Method 1: One-Hot Encoding with Scikit-Learn
scikit-learn's LabelBinarizer class makes it easy to convert your targets (labels) into one-hot encoded vectors. For example:
import numpy as np
from sklearn import preprocessing
# Example labels
labels = np.array([1,5,3,2,1,4,2,1,3])
# Create the encoder 创建编码器
lb = preprocessing.LabelBinarizer()
# Here the encoder finds the classes and assigns a one-hot vector to each
lb.fit(labels)
# And finally, transform the labels into one-hot encoded vectors
lb.transform(labels)
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0]])
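To see which column stands for which label, it can help to inspect the fitted encoder. The following is a minimal sketch, reusing the example labels above; the variable names `one_hot` and `recovered` are only illustrative. It relies on the `classes_` attribute and the `inverse_transform` method of `LabelBinarizer`.

```python
import numpy as np
from sklearn import preprocessing

# Same example labels as above
labels = np.array([1, 5, 3, 2, 1, 4, 2, 1, 3])

lb = preprocessing.LabelBinarizer()
# fit_transform combines the fit and transform steps
one_hot = lb.fit_transform(labels)

# classes_ holds the sorted unique labels; column i corresponds to classes_[i]
print(lb.classes_)

# inverse_transform maps the one-hot rows back to the original labels
recovered = lb.inverse_transform(one_hot)
print(recovered)
```

Because the columns follow the sorted unique labels, the label 1 lands in the first column here even though the labels do not start at 0.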
Method 2: Using OneHotEncoder from sklearn.preprocessing
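import numpy as np
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
# Fit on the values 0..5 as a single column, so the encoder learns six categories
encoder.fit(np.arange(6).reshape(-1, 1))

def one_hot_encode(x):
    # transform returns a sparse matrix; toarray() converts it to a dense array
    return encoder.transform(np.array(x).reshape(-1, 1)).toarray()

labels = [1, 5, 3, 2, 1, 4, 2, 1, 3]
a = one_hot_encode(labels)
print(a)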
[[0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0.]
 [0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0.]]
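Note that there is a problem here when the original labels do not start from 0: the encoder was fitted on the values 0 through 5, so the output has six columns, and the first one (for 0) is never set. To be continued!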
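One possible workaround, offered only as a sketch and not necessarily the fix the author has in mind, is to fit the encoder on the labels themselves instead of on a fixed range. The encoder then learns only the categories that occur, which can be checked through its `categories_` attribute. The variable name `encoded` is illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects a 2-D input: one row per sample, one column per feature
labels = np.array([1, 5, 3, 2, 1, 4, 2, 1, 3]).reshape(-1, 1)

encoder = OneHotEncoder()
# Fit on the actual labels so that only the observed values become categories
encoded = encoder.fit_transform(labels).toarray()

# categories_ is a list with one array of learned categories per input column
print(encoder.categories_)
print(encoded.shape)
```

With this approach the output has one column per distinct label (five here), regardless of where the labels start. The trade-off is that a label not seen during fitting raises an error at transform time unless `handle_unknown='ignore'` is passed to the encoder.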