Video tutorial:
https://www.bilibili.com/video/BV1xW411Y7Qd?p=4 (series)
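1. MultiLabelBinarizer: multi-label encoding
Multi-label binarization: sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)
The classes_ attribute: if the classes parameter is set, classes_ equals that value; otherwise the label values are collected from the training data passed to fit and stored in sorted order.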
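The sparse_output parameter in the signature above is not used in the examples that follow, so here is a minimal sketch of what it changes. It assumes scipy is installed (scikit-learn depends on it); the data and variable names are illustrative only.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative data: three samples with label ids in 0..4
y = [[2, 3, 4], [2], [0, 1, 3]]

# With sparse_output=True the result is a SciPy sparse matrix
m = MultiLabelBinarizer(sparse_output=True)
encoded = m.fit_transform(y)
print(sparse.issparse(encoded))

# Convert to a dense array to inspect the indicator rows
dense = encoded.toarray()
print(dense)
print(m.classes_)
```

A sparse result is mainly useful when the label set is large and each sample carries only a few labels.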
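Example 1: y is a list of label-id lists
from sklearn.preprocessing import MultiLabelBinarizer

def main():
    # Samples with a single label and samples with several labels are mixed
    y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
    m = MultiLabelBinarizer()
    print(m.fit_transform(y))  # one indicator row per sample
    print(m.classes_)          # column order of the indicator matrix

if __name__ == '__main__':
    main()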
'''
[[0 0 1 1 1]
[0 0 1 0 0]
[1 1 0 1 0]
[1 1 1 1 1]
[1 1 1 0 0]]
[0 1 2 3 4]
'''
Example 2: y is a list of lists holding the labels themselves (strings)
from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    m = MultiLabelBinarizer()
    print(m.fit_transform(y))
    print(m.classes_)  # string labels, sorted

if __name__ == '__main__':
    main()
'''
[[0 0 1 1 1]
[0 0 1 0 0]
[1 1 0 1 0]
[1 1 1 1 1]
[1 1 1 0 0]]
['0' '1' '2' '3' '4']
'''
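Example 3: setting classes fixes the label set (labels not in classes are ignored, and scikit-learn issues a warning about them)
classes also determines the column position of each label: with classes=[2,3,4,5,6,1], label 2 gets index 0, label 3 gets index 1, and so on.
from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    # '0' is not in classes, so it is dropped; '5' and '6' never occur, so their columns stay 0
    m = MultiLabelBinarizer(classes=['2', '3', '4', '5', '6', '1'])
    print(m.fit_transform(y))
    print(m.classes_)

if __name__ == '__main__':
    main()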
'''
[[1 1 1 0 0 0]
[1 0 0 0 0 0]
[0 1 0 0 0 1]
[1 1 1 0 0 1]
[1 0 0 0 0 1]]
['2' '3' '4' '5' '6' '1']
'''
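To make the "ignored labels" behavior of Example 3 visible, the following sketch captures the warning with the standard warnings module. The smaller data set and the variable names are assumptions chosen for illustration; the exact warning text may differ between scikit-learn versions, so it is printed rather than assumed.

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative data: '0' is deliberately missing from classes
y = [['2', '3'], ['0', '1']]
m = MultiLabelBinarizer(classes=['2', '3', '1'])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    encoded = m.fit_transform(y)

print(encoded)  # columns follow the order given in classes
for w in caught:
    print(w.message)  # reports the label(s) that were ignored
```

The label '0' contributes no column, so the second row only marks '1'.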
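Example 4: using m.transform()
Once the binarizer has been fitted on y, transform can be called directly on new data a. Here a contains only the labels '3' and '4', so the result is [[0 0 0 1 1]].
from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    m = MultiLabelBinarizer()
    m.fit(y)  # learn the label set from y only
    a = [['3', '4']]
    print(m.transform(a))  # encode new data with the fitted label set

if __name__ == '__main__':
    main()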
'''
[[0 0 0 1 1]]
'''
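As a complement to Example 4, a fitted binarizer can also map indicator rows back to label sets with inverse_transform. This is a minimal sketch reusing the same data; the variable names are illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
m = MultiLabelBinarizer()
m.fit(y)

# Encode new data, then decode it back to labels
encoded = m.transform([['3', '4']])
decoded = m.inverse_transform(encoded)  # a list with one tuple of labels per row
print(encoded)
print(decoded)
```

This round trip is handy for turning a classifier's predicted indicator matrix back into readable labels.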