Summary of commonly used sklearn features (20200915)

Video tutorial:

https://www.bilibili.com/video/BV1xW411Y7Qd?p=4 (series)

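1. MultiLabelBinarizer: multi-label encoding

Multi-label binarization: sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)

The classes_ attribute: if the classes parameter is set, classes_ equals that value; otherwise the label values are collected from the training set, in sorted order.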
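The sparse_output parameter in the signature above is not used in the examples that follow, so here is a minimal sketch of what it changes. The helper name binarize_sparse is made up for illustration; with sparse_output=True the result should be a SciPy sparse matrix rather than a dense NumPy array.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

def binarize_sparse(y):
    """Fit on y and return (sparse indicator matrix, learned classes).

    binarize_sparse is a hypothetical helper, not part of sklearn.
    """
    m = MultiLabelBinarizer(sparse_output=True)
    encoded = m.fit_transform(y)
    return encoded, m.classes_

y = [[2, 3, 4], [2], [0, 1, 3]]
encoded, classes = binarize_sparse(y)

print(sparse.issparse(encoded))  # whether the result is a SciPy sparse matrix
print(encoded.toarray())         # dense view, one row per sample
print(classes)                   # labels collected from y
```

A sparse result is mainly useful when there are many distinct labels and each sample carries only a few of them.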

Example 1: y is a list of label ids

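from sklearn.preprocessing import MultiLabelBinarizer

def main():
    # Single-label and multi-label samples are mixed in the same list.
    y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
    m = MultiLabelBinarizer()
    print(m.fit_transform(y))  # one row per sample, one column per class
    print(m.classes_)          # classes learned from y

if __name__ == '__main__':
    main()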

'''
[[0 0 1 1 1]
 [0 0 1 0 0]
 [1 1 0 1 0]
 [1 1 1 1 1]
 [1 1 1 0 0]]
[0 1 2 3 4]
'''

Example 2: y is a list of the labels themselves

from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    m = MultiLabelBinarizer()
    print(m.fit_transform(y))
    print(m.classes_)

if __name__ == '__main__':
    main()

'''
[[0 0 1 1 1]
 [0 0 1 0 0]
 [1 1 0 1 0]
 [1 1 1 1 1]
 [1 1 1 0 0]]
['0' '1' '2' '3' '4']
'''

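Example 3: setting classes fixes the label set (labels not listed in classes are not recorded)

classes also determines the column position of each label: with classes=[2,3,4,5,6,1], label 2 maps to column index 0, label 3 to column index 1, and so on.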

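from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    # Columns follow the order given in classes; label '0' is not listed, so it is dropped.
    m = MultiLabelBinarizer(classes=['2', '3', '4', '5', '6', '1'])
    print(m.fit_transform(y))
    print(m.classes_)

if __name__ == '__main__':
    main()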
'''
[[1 1 1 0 0 0]
 [1 0 0 0 0 0]
 [0 1 0 0 0 1]
 [1 1 1 0 0 1]
 [1 0 0 0 0 1]]
['2' '3' '4' '5' '6' '1']
'''
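Label '0' is silently absent from the output above. As a hedged sketch of how to notice this in practice: recent scikit-learn versions, as far as I know, emit a warning when a label falls outside classes, and the standard warnings module can capture it. The helper name encode_with_fixed_classes is made up for illustration.

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

def encode_with_fixed_classes(y, classes):
    """Encode y against a fixed class list and collect any warning messages.

    encode_with_fixed_classes is a hypothetical helper, not part of sklearn.
    """
    m = MultiLabelBinarizer(classes=classes)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        encoded = m.fit_transform(y)
    return encoded, [str(w.message) for w in caught]

y = [['0', '1', '3'], ['2']]
encoded, messages = encode_with_fixed_classes(y, ['2', '3', '4', '5', '6', '1'])

print(encoded)   # '0' contributes no column
print(messages)  # warning text, if the installed version emits one
```

Checking the captured messages is one way to catch typos or unexpected labels before they disappear from the encoded matrix.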

Example 4: using m.transform()

Here the binarizer is first fitted on y, and transform is then applied directly to new data a. Since a contains only the labels '3' and '4', the result is [[0 0 0 1 1]].

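from sklearn.preprocessing import MultiLabelBinarizer

def main():
    y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
    m = MultiLabelBinarizer()
    m.fit(y)  # learn the classes '0'..'4' from y
    a = [['3', '4']]
    print(m.transform(a))  # encode new data with the classes learned above

if __name__ == '__main__':
    main()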

'''
[[0 0 0 1 1]]
'''
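To make the fit/transform split concrete, the mapping can be sketched in plain Python. This is not sklearn's implementation, only an illustration of the idea under simple assumptions; fit_classes and binarize are hypothetical helper names.

```python
def fit_classes(y):
    """Collect the distinct labels seen in y, in sorted order."""
    return sorted({label for labels in y for label in labels})

def binarize(samples, classes):
    """Return one 0/1 row per sample, with one column per entry in classes."""
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for labels in samples:
        row = [0] * len(classes)
        for label in labels:
            if label in index:  # labels outside classes are skipped
                row[index[label]] = 1
        rows.append(row)
    return rows

y = [['2', '3', '4'], ['2'], ['0', '1', '3'], ['0', '1', '2', '3', '4'], ['0', '1', '2']]
classes = fit_classes(y)               # the "fit" step
print(classes)                         # ['0', '1', '2', '3', '4']
print(binarize([['3', '4']], classes)) # the "transform" step: [[0, 0, 0, 1, 1]]
```

The column layout is decided once during fitting, which is why new data can be transformed without refitting.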


Reposted from blog.csdn.net/caicai0001000/article/details/107593269