L1 and L2 norms in sklearn's Normalizer

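Normalizer performs normalization (scaling each sample to unit norm), which is different from data transformations such as z-score, log transforms, and exponential transforms.
L1 normalization divides every element of a sample by that sample's L1 norm (the sum of the absolute values of its elements).
L2 normalization divides every element of a sample by that sample's L2 norm (the square root of the sum of its squared elements).
Bag-of-words features are usually normalized with the L1 norm.
Fisher vector features are usually normalized with the L2 norm.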

>>> import numpy as np
>>> from sklearn.preprocessing import Normalizer
>>> a = np.array([[10,4,5,2], [1,4,5,7]])
>>> norm1 = Normalizer(norm='l1')
>>> norm1.fit_transform(a)
array([[ 0.47619048,  0.19047619,  0.23809524,  0.0952381 ],
       [ 0.05882353,  0.23529412,  0.29411765,  0.41176471]])
For example, a[0][0] = 10/(10+4+5+2) = 0.476..., so each row sums to 1 (all entries here are non-negative).
>>> norm2 = Normalizer(norm='l2')
>>> norm2.fit_transform(a)
array([[ 0.8304548 ,  0.33218192,  0.4152274 ,  0.16609096],
       [ 0.10482848,  0.41931393,  0.52414242,  0.73379939]])
For example, 10/np.sqrt(100+16+25+4) = 0.830..., so each row has an L2 norm of 1.
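
The arithmetic above can be checked by hand. The following is a minimal sketch using only NumPy, not sklearn's own implementation; the variable names (l1_norms, a_l1, and so on) are made up for illustration, and it assumes no row is all zeros (otherwise the division would fail).

```python
import numpy as np

a = np.array([[10, 4, 5, 2], [1, 4, 5, 7]], dtype=float)

# L1 norm of each row: sum of absolute values, kept as a column for broadcasting
l1_norms = np.abs(a).sum(axis=1, keepdims=True)
a_l1 = a / l1_norms

# L2 norm of each row: square root of the sum of squares
l2_norms = np.sqrt((a ** 2).sum(axis=1, keepdims=True))
a_l2 = a / l2_norms

print(a_l1)
print(a_l2)
print(a_l1.sum(axis=1))              # row sums after L1 normalization
print(np.linalg.norm(a_l2, axis=1))  # row lengths after L2 normalization
```

Both operations use axis=1, that is, they work inside each row (sample) and never mix values from different samples.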

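sklearn also provides another commonly used normalization method, z-score standardization: subtract the mean and divide by the standard deviation (not the variance).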

>>> import numpy as np
>>> from sklearn import preprocessing
>>> a = np.array([[10,4,5,2], [1,4,5,7]], dtype=float)
>>> scaler = preprocessing.StandardScaler().fit(a)
>>> scaler.mean_
array([ 5.5,  4. ,  5. ,  4.5])
>>> scaler.var_  # variance
array([ 20.25,   0.  ,   0.  ,   6.25])
>>> scaler.scale_  # standard deviation; 1 where the variance is 0
array([ 4.5,  1. ,  1. ,  2.5])
>>> scaler.transform(a)
array([[ 1.,  0.,  0., -1.],
       [-1.,  0.,  0.,  1.]])

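Note that the mean and variance here are computed per column. The L1 and L2 normalization above operates within a sample (per row), whereas z-score operates across samples: for each feature, take the mean of that feature over all samples, subtract it, and divide by the standard deviation.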
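
To make the per-column behaviour concrete, here is a rough NumPy-only sketch that reproduces the numbers above. It is not StandardScaler's actual code; the names mean, var, std, scale, and z are illustrative, and replacing a zero standard deviation with 1 is an assumption made to mirror the scale_ output shown earlier.

```python
import numpy as np

a = np.array([[10, 4, 5, 2], [1, 4, 5, 7]], dtype=float)

mean = a.mean(axis=0)  # per-column (per-feature) mean
var = a.var(axis=0)    # per-column population variance (ddof=0)
std = np.sqrt(var)     # per-column standard deviation

# Assumption mirroring scale_ above: use 1 where the standard deviation is 0
scale = np.where(std == 0, 1.0, std)

# z-score: subtract the column mean, divide by the column standard deviation
z = (a - mean) / scale

print(mean)
print(var)
print(scale)
print(z)
```

Here every reduction uses axis=0, so each statistic is taken over all samples for one feature, in contrast to the axis=1 (per-sample) reductions behind the L1 and L2 norms.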
