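The so-called regularization effect is this:
mathematically, the solution takes on certain properties of the penalty (correction) term.
In plain language, what is regularization?
It makes the solution set obtained with the Lagrange multiplier method for extrema, which we learned as undergraduates, have certain characteristics.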
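Written out, the penalized objective looks roughly like this. The symbols L_0 (the unregularized loss), \lambda (the penalty strength) and \Omega (the penalty term) are notation introduced here for illustration; they do not come from the original text.

```latex
\min_{w}\; L_0(w) + \lambda\,\Omega(w), \qquad \lambda \ge 0,
\qquad
\Omega(w) =
\begin{cases}
\lVert w \rVert_1 = \sum_i |w_i| & \text{(L1)} \\[4pt]
\lVert w \rVert_2^2 = \sum_i w_i^2 & \text{(L2)}
\end{cases}
```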
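L1: (the exponent term of the Laplace distribution)
The result is relatively sparse (weights are either close to 0 or fairly large).
The benefit is faster feature learning, because many entries of W are driven toward 0,
but the regularization effect may not be very pronounced;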
L2: (the exponent term of the Gaussian distribution)
L2 shrinks W for unimportant features, but does not make it 0.
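How should we choose between L1 and L2?
In general, the regularization term is chosen according to the prior distribution assumed for the weights (in fact the Gaussian and Laplace densities look fairly similar: both are symmetric and peak at 0, though the Laplace peak is sharper)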
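The link between priors and penalties can be sketched through MAP estimation. This assumes an independent zero-mean prior on every weight; D (the data), b and \sigma (the scale parameters of the two priors) are notation introduced here, not taken from the original text.

```latex
\begin{aligned}
\hat{w}_{\mathrm{MAP}} &= \arg\min_{w}\,\bigl[-\log p(D \mid w) - \log p(w)\bigr] \\[4pt]
\text{Laplace prior: } p(w_i) = \tfrac{1}{2b}\, e^{-|w_i|/b}
  &\;\Rightarrow\; -\log p(w) = \tfrac{1}{b}\sum_i |w_i| + \text{const} \\[4pt]
\text{Gaussian prior: } p(w_i) = \tfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-w_i^2/(2\sigma^2)}
  &\;\Rightarrow\; -\log p(w) = \tfrac{1}{2\sigma^2}\sum_i w_i^2 + \text{const}
\end{aligned}
```

So the negative log of a Laplace prior is an L1 penalty and the negative log of a Gaussian prior is an L2 penalty, which is what "the exponent term" refers to.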
Google's take is:
L1 regularization can’t help with multicollinearity.
L2 regularization can’t help with feature selection.
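In plain language:
when you want to extract rules (that is, keep only a few features), prefer L1;
when you want linear combinations of features, prefer L2.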
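A toy illustration of the two statements, as a sketch only. The data (two identical feature columns), the penalty strength lam=1.0 and the helper names ridge_fit and lasso_fit are all made up for this example; the lasso solver is a plain cyclic coordinate descent started from zero, not any particular library's implementation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, clipping at 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_fit(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1, from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            x_j = X[:, j]
            # Residual with feature j's current contribution added back.
            residual = y - X @ w + x_j * w[j]
            w[j] = soft_threshold(x_j @ residual, lam) / (x_j @ x_j)
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x])   # two perfectly collinear features
y = 2.0 * x

w_ridge = ridge_fit(X, y, lam=1.0)
w_lasso = lasso_fit(X, y, lam=1.0)
print("ridge:", w_ridge)
print("lasso:", w_lasso)
```

In this setup the L2 fit splits the weight evenly between the two copies, while the L1 fit puts essentially everything on the first copy and leaves the second at (numerically) zero. Which copy wins depends only on the update order, which is one way to read "L1 can't help with multicollinearity".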
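Why does L1 produce sparsity?
Almost no blog post online explains this clearly.
Remember the Lagrange multipliers from undergraduate courses?
Here we use a figure from
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models
to illustrate: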
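The ellipses in the figure above are the contours of the original, unregularized loss function;
the green region is the constraint, and the solution ends up on the boundary of the green region.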
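One point above was not stated precisely:
what is actually used here is the "generalized Lagrangian" (which handles inequality constraints),
whereas what we learned as undergraduates was the "narrow" Lagrangian (which handles equality constraints).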
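For the L1 case, the constrained problem and its generalized Lagrangian can be written as below. The budget t, the multiplier \mu and the loss symbol L_0 are notation introduced here for illustration.

```latex
\begin{aligned}
&\min_{w}\; L_0(w) \quad \text{s.t.} \quad \lVert w \rVert_1 \le t \\[4pt]
&\mathcal{L}(w, \mu) = L_0(w) + \mu\,\bigl(\lVert w \rVert_1 - t\bigr), \qquad \mu \ge 0 \\[4pt]
&\text{KKT conditions: } \quad
0 \in \nabla L_0(w^*) + \mu^*\,\partial \lVert w^* \rVert_1, \quad
\lVert w^* \rVert_1 \le t, \quad
\mu^* \ge 0, \quad
\mu^*\bigl(\lVert w^* \rVert_1 - t\bigr) = 0
\end{aligned}
```

The last condition (complementary slackness) says that whenever the constraint actually matters (\mu^* > 0), the solution sits on the boundary \lVert w^* \rVert_1 = t.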
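So can L2 produce a sparse solution? It can, but the probability is small, because the constraint region is a circle, which has no corners on the axes.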
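The difference is easiest to see in one dimension, where both penalized problems have closed-form solutions. This is a minimal sketch under a simplifying assumption: the unregularized loss for each weight is the quadratic 0.5*(w - a)**2, and the values in a and lam are made up.

```python
import numpy as np

def l1_minimizer(a, lam):
    """argmin_w 0.5*(w - a)**2 + lam*|w|, i.e. soft-thresholding."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def l2_minimizer(a, lam):
    """argmin_w 0.5*(w - a)**2 + 0.5*lam*w**2, i.e. proportional shrinkage."""
    return a / (1.0 + lam)

a = np.array([-2.0, -0.3, 0.05, 0.4, 1.5])  # unregularized optima (made-up values)
lam = 0.5

w_l1 = l1_minimizer(a, lam)
w_l2 = l2_minimizer(a, lam)
print("L1:", w_l1)
print("L2:", w_l2)
print("exact zeros  L1:", int(np.sum(w_l1 == 0)), " L2:", int(np.sum(w_l2 == 0)))
```

With L1, every weight whose unregularized optimum satisfies |a| <= lam lands exactly on 0. With L2, every weight is scaled by the same factor 1/(1 + lam), so it only reaches 0 if it was already 0.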
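OK, enough talk. Where is the code?
The code can be taken from the third experiment in Chapter 4 of "Deep Learning with Python".
The network structure is 10000 x 16 x 16 x 1.
To get results quickly, set epochs=1.
With L1 regularization, the weight output is as follows: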
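A rough sketch of what such an experiment could look like follows. It rests on several assumptions that are not stated in the text: that the experiment is the IMDB sentiment classifier (which the 10000-16-16-1 shape suggests), that the penalty strength is 0.001, that only the two hidden layers are regularized, and that the optimizer is rmsprop with batch size 512. The function names build_model and run_experiment are made up for this example.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode lists of word indices into a (n_samples, dimension) array."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

def build_model(kind, strength=0.001):
    """Build the 10000-16-16-1 network with an L1 or L2 kernel penalty.

    kind is "l1" or "l2"; strength=0.001 is an assumed value.
    """
    from tensorflow import keras  # imported lazily; requires TensorFlow

    regularizer_cls = {"l1": keras.regularizers.L1,
                       "l2": keras.regularizers.L2}[kind]
    model = keras.Sequential([
        keras.Input(shape=(10000,)),
        keras.layers.Dense(16, activation="relu",
                           kernel_regularizer=regularizer_cls(strength)),
        keras.layers.Dense(16, activation="relu",
                           kernel_regularizer=regularizer_cls(strength)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

def run_experiment(kind):
    """Train for one epoch on IMDB and return the list of weight arrays."""
    from tensorflow import keras

    (train_data, train_labels), _ = keras.datasets.imdb.load_data(num_words=10000)
    x_train = vectorize_sequences(train_data)
    y_train = np.asarray(train_labels).astype("float32")
    model = build_model(kind)
    model.fit(x_train, y_train, epochs=1, batch_size=512)
    return model.get_weights()
```

Calling print("Output weights", run_experiment("l1")) and then the same with "l2" would print lists of six arrays (kernel and bias for each of the three layers), in the same layout as the dumps in this post. The exact numbers will differ from run to run.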
Output weights [array([[-4.27828636e-05, -1.00246782e-03, -2.79264990e-04, ...,
-3.85033316e-04, -3.40257306e-04, -3.55066732e-08],
[ 2.15981118e-02, 3.79165774e-03, -6.72453083e-03, ...,
2.53116563e-02, 4.29332331e-02, -1.85270631e-03],
[ 2.85863448e-02, 1.66764148e-02, -6.34254003e-03, ...,
1.40961567e-02, 2.53925007e-02, 1.04373496e-04],
...,
[-3.83841514e-04, 5.83783374e-04, -6.23644795e-04, ...,
-1.19826291e-04, 1.29003369e-04, -5.33740851e-04],
[-6.40181359e-04, 6.27052214e-04, -6.12081552e-04, ...,
-9.73617774e-04, -3.70911177e-04, 1.00261578e-03],
[-6.49964553e-04, 2.80193461e-04, -4.07341809e-04, ...,
9.82345082e-04, -7.55024375e-04, 7.67573947e-05]], dtype=float32), array([ 0.01352753, 0.00530624, 0.01403685, -0.00621689, 0.00346374,
0.01224227, -0.01186973, 0.00608102, 0.00767745, 0.02525727,
0.00554247, 0.00680919, 0.00823556, 0.02523253, 0.02550968,
-0.00733239], dtype=float32), array([[ 2.97359854e-01, 6.56183460e-04, -4.73043948e-01,
-1.17567085e-01, -3.42536233e-02, -3.03927213e-01,
4.69646633e-01, -3.84592921e-01, 1.52946264e-01,
-1.82628393e-01, 3.07190239e-01, 1.88732699e-01,
-3.68719488e-01, -3.30251426e-01, -3.66007872e-02,
4.12766099e-01],
[ 1.22636884e-01, 9.78616104e-02, -2.44927496e-01,
-3.78500260e-02, -3.29815060e-01, 4.54631686e-01,
1.32869394e-03, -2.15873808e-01, 1.01626828e-01,
-1.52611211e-01, -3.60170454e-01, -3.46550457e-02,
3.55746113e-02, 3.10409755e-01, 3.07094425e-01,
-3.89622569e-01],
[ 4.18785542e-01, 6.49755746e-02, 2.65271336e-01,
-1.81596532e-01, -2.55371511e-01, -8.37184638e-02,
4.29974437e-01, -1.55283764e-01, -3.39388162e-01,
-3.18841726e-01, -4.97105066e-03, -2.07916439e-01,
-1.47543848e-01, -8.37940574e-02, 3.37905467e-01,
3.27208400e-01],
[ 1.04914896e-01, -3.00677449e-01, 2.32164890e-01,
1.62189871e-01, -1.11904912e-01, -1.14806369e-02,
-3.23227465e-01, -1.23150116e-02, 2.32810229e-01,
2.10369080e-01, 1.51899308e-01, 2.40044445e-01,
1.14793181e-01, 5.89926494e-04, -8.19776803e-02,
7.19810778e-04],
[-3.87102336e-01, 3.51326197e-01, 9.02353227e-02,
-2.63564795e-01, -3.27613801e-01, -2.86400300e-02,
-1.87998384e-01, 3.43739748e-01, 2.73346812e-01,
2.66616821e-01, 3.51429433e-02, 4.56109941e-01,
5.48761450e-02, -3.60661447e-01, -3.88115913e-01,
2.51187414e-01],
[-2.37962544e-01, 1.66401789e-01, -3.98593396e-01,
1.65419161e-01, 3.33086133e-01, 4.77736555e-02,
2.00005323e-01, -2.52376407e-01, -2.90598810e-01,
-1.85996607e-01, -2.25491524e-02, 1.13793194e-01,
1.65100321e-01, -6.65912463e-04, 5.77541031e-02,
-3.25353086e-01],
[-1.40810832e-01, -2.48465851e-01, -1.19345643e-01,
-8.56471481e-04, -2.67849237e-01, -1.44852057e-01,
-9.15314704e-02, 1.34784952e-01, -1.29481718e-01,
-1.04500920e-01, -1.77888229e-01, -1.47721738e-01,
-2.19401658e-01, -2.23744530e-02, -2.98361719e-01,
1.45486742e-01],
[ 1.69071302e-01, -3.72374713e-01, 2.83467352e-01,
-1.03206985e-01, 3.67821902e-01, -1.43115878e-01,
1.25592351e-01, -3.89090292e-02, -2.01085940e-01,
1.77833766e-01, -2.91119248e-01, 3.61348659e-01,
-3.43382619e-02, -3.96245480e-01, 3.98543626e-01,
4.63600516e-01],
[ 4.63620007e-01, -2.45612651e-01, 3.48520666e-01,
1.46613419e-01, 1.65358827e-01, -2.95230269e-01,
4.20761257e-01, -2.00932339e-01, -1.33652672e-01,
2.92670336e-02, -1.22803524e-01, 2.40687251e-01,
3.18130404e-01, -6.91166497e-04, -3.25402856e-01,
-1.30906135e-01],
[ 1.62128076e-01, -1.19411573e-01, 3.45981359e-01,
-3.86496191e-04, 4.05329019e-01, 1.49058387e-01,
4.43916738e-01, 5.18011861e-02, -3.05147499e-01,
-3.65549386e-01, -2.54479855e-01, -1.22571457e-02,
1.56393483e-01, 5.07648513e-02, 2.26654470e-01,
-3.36109191e-01],
[-3.12535584e-01, -2.30290424e-02, 6.98565692e-02,
-1.50468856e-01, -2.78825819e-01, 9.92865711e-02,
-3.34635884e-01, 3.57187033e-01, 2.54794866e-01,
1.91722021e-01, 5.36262877e-02, 1.83799900e-02,
-5.85136586e-04, 3.57504547e-01, -2.61918098e-01,
2.01858550e-01],
[ 1.80302829e-01, 3.65201116e-01, 2.03263357e-01,
1.17282532e-01, 1.65266767e-01, -4.04994518e-01,
-3.51655126e-01, 3.97830069e-01, -7.66607746e-02,
-9.62971300e-02, 1.73393369e-01, -2.00297937e-01,
7.74533255e-04, -1.40481442e-01, 2.14320533e-02,
3.77951324e-01],
[-2.46189404e-02, 2.93494880e-01, 3.59376967e-01,
3.20476014e-04, 3.01101089e-01, 3.21090758e-01,
-3.75274122e-01, 9.95393726e-04, 2.46108666e-01,
2.64105260e-01, -1.19236402e-01, 3.77319247e-01,
6.48521120e-04, 3.39984924e-01, 2.55425870e-01,
2.54246205e-01],
[ 5.12674786e-02, -1.02096912e-03, -3.70046735e-01,
-3.52790147e-01, -1.98903963e-01, -1.82327494e-01,
3.54469061e-01, 1.71051875e-01, 3.73468578e-01,
1.66834593e-01, 2.45054252e-07, 4.23564501e-02,
9.42573650e-04, -1.89804733e-02, 4.39227995e-04,
9.95820481e-03],
[ 2.57087111e-01, -2.79899389e-01, 2.73097128e-01,
-3.69274199e-01, 1.06317475e-01, 3.90571177e-01,
1.57478735e-01, 2.42957503e-01, 4.03050303e-01,
-3.74355882e-01, -2.04208896e-01, 1.89841297e-02,
-3.78889889e-01, 2.43642956e-01, 2.69247919e-01,
1.17503397e-01],
[ 1.01470791e-01, -2.11673021e-01, -1.81737795e-01,
3.25044870e-01, -1.60212040e-01, 2.00224802e-01,
5.87655418e-03, -3.10205370e-01, 2.09311340e-02,
-2.71605730e-01, -3.22293550e-01, 7.38748312e-02,
6.16738871e-02, -8.31133649e-02, -1.60038099e-02,
-7.09989516e-04]], dtype=float32), array([ 1.9303737e-02, -2.9504906e-02, -2.6506849e-02, 1.6427160e-03,
2.9263936e-02, 5.9134695e-03, 2.8158128e-02, 2.4705507e-02,
1.2207930e-02, -4.5786786e-05, 4.7801528e-05, -2.5754545e-02,
6.4422688e-03, 2.7185101e-02, 2.4097716e-02, 3.2040365e-02],
dtype=float32), array([[ 0.33899024],
[ 0.09035949],
[-0.40873846],
[ 0.44341677],
[ 0.55033386],
[-0.48756945],
[ 0.35685787],
[-0.34343976],
[-0.5151725 ],
[-0.15856893],
[ 0.01221188],
[-0.31893036],
[-0.2622632 ],
[-0.29004556],
[ 0.08608972],
[ 0.29274988]], dtype=float32), array([0.02345355], dtype=float32)]
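We can see that many weights are on the order of e-4, i.e. far smaller than 0.1,
so what does the sparsity of L1 actually mean?
It is not, as often claimed online, that many weights are exactly 0,
but that many weights are close to 0.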
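Rather than eyeballing the dump, the "close to 0" versus "exactly 0" distinction can be counted directly. A minimal sketch: the helper names and the 1e-3 threshold are arbitrary choices for illustration, and the demo list is a tiny made-up stand-in for the list of arrays printed above.

```python
import numpy as np

def near_zero_fraction(weights, threshold=1e-3):
    """Fraction of entries with |w| < threshold, one value per array."""
    return [float(np.mean(np.abs(w) < threshold)) for w in weights]

def exact_zero_fraction(weights):
    """Fraction of entries that are exactly 0, one value per array."""
    return [float(np.mean(w == 0.0)) for w in weights]

# Tiny made-up stand-in for a list of layer weight arrays.
demo = [
    np.array([[5e-4, 0.3], [-2e-4, 0.0]], dtype=np.float32),
    np.array([0.01, -0.02], dtype=np.float32),
]
near = near_zero_fraction(demo)
exact = exact_zero_fraction(demo)
print("near zero:", near)
print("exactly zero:", exact)
```

Running the same two functions on the L1 and L2 weight lists gives a per-layer number to compare instead of a visual impression.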
Then, using the same code, switch to L2 regularization and look at the output weights:
Output weights [array([[ 5.4510499e-19, -1.1375406e-14, 1.2810104e-09, ...,
6.6694220e-13, 1.7195195e-21, 1.1844387e-18],
[ 2.2681307e-02, 2.8639721e-02, -5.0795679e-03, ...,
3.2248314e-02, 3.5097659e-02, 1.9943751e-02],
[ 2.9682485e-02, 3.8339857e-02, -3.8643787e-03, ...,
-7.3956484e-03, 1.5394368e-02, 6.4378373e-02],
...,
[-2.2290610e-03, -5.2631828e-03, -1.1894613e-02, ...,
-3.2023021e-03, 9.8316949e-03, 4.5698625e-03],
[-6.3545518e-03, -2.8249058e-03, -1.4715969e-02, ...,
3.9712125e-03, -2.1713993e-03, 6.4099208e-05],
[ 4.3839724e-03, 7.1036047e-04, -6.1749844e-03, ...,
3.3779065e-03, 3.6998792e-04, -2.5457949e-03]], dtype=float32), array([0.01308027, 0.02083343, 0.01824512, 0.01143504, 0.00499259,
0.01486909, 0.01095271, 0.00404395, 0.02463059, 0.00872665,
0.00801992, 0.00815683, 0.01039271, 0.01561781, 0.01411563,
0.04340347], dtype=float32), array([[ 2.93407321e-01, -1.98400989e-02, -4.40114766e-01,
-1.11424305e-01, -7.31087476e-02, -3.04403722e-01,
4.64353442e-01, -3.29900682e-01, 1.63630053e-01,
-1.84131727e-01, 3.08276862e-01, 1.94891691e-01,
-4.41589683e-01, -3.05707157e-01, -4.41319868e-02,
4.10211772e-01],
[ 1.44314080e-01, 1.14107765e-01, -3.18847924e-01,
-8.83864984e-02, -2.84857243e-01, 4.43122834e-01,
3.62090170e-02, -2.15172529e-01, 9.76731330e-02,
-1.61776304e-01, -3.61122280e-01, -5.60393780e-02,
8.91952366e-02, 3.17961156e-01, 3.24503183e-01,
-3.71593475e-01],
[ 4.25948858e-01, 7.04302564e-02, 2.81940609e-01,
-1.77070782e-01, -2.74864286e-01, -1.06579565e-01,
4.36641574e-01, -1.12034686e-01, -3.45022917e-01,
-3.19837213e-01, -1.43970661e-02, -2.16923535e-01,
-2.31076464e-01, -8.27731341e-02, 3.60185146e-01,
3.36787492e-01],
[ 1.15281835e-01, -3.01896662e-01, 2.39346668e-01,
1.83167055e-01, -1.16130240e-01, -2.12356411e-02,
-3.89987141e-01, -2.43074540e-02, 3.24033946e-01,
2.12604478e-01, 1.53906882e-01, 3.26046437e-01,
2.10126624e-01, 8.62302035e-02, -1.64832115e-01,
1.51150580e-02],
[-3.90144706e-01, 3.52188319e-01, 2.51630321e-02,
-3.43495667e-01, -2.54216045e-01, -3.87258083e-02,
-1.94808662e-01, 2.56020427e-01, 2.74487942e-01,
2.68538356e-01, 4.05583121e-02, 4.54750240e-01,
1.47770867e-01, -3.62259477e-01, -3.83709610e-01,
2.63715029e-01],
[-2.85299212e-01, 1.69331729e-01, -3.38647544e-01,
2.34549761e-01, 2.80789793e-01, 8.91473368e-02,
1.77124396e-01, -2.13072211e-01, -2.61840492e-01,
-1.87434465e-01, -2.91305147e-02, 1.61795199e-01,
2.20224589e-01, 2.58004293e-02, 5.37811071e-02,
-3.69709074e-01],
[-1.43994346e-01, -2.49894559e-01, -2.04025045e-01,
9.80285257e-02, -2.77742773e-01, -2.26973757e-01,
-8.67364928e-02, 1.37876272e-01, -2.02594623e-01,
-1.06818587e-01, -1.79687336e-01, -2.55178958e-01,
-2.99385250e-01, -5.59285395e-02, -2.94983864e-01,
2.53982246e-01],
[ 1.84192955e-01, -3.73058200e-01, 2.97143936e-01,
-1.03991538e-01, 2.90101379e-01, -1.68928549e-01,
1.39721408e-01, -1.06037809e-02, -2.18328491e-01,
1.80070952e-01, -2.92274624e-01, 3.65887821e-01,
-1.29578754e-01, -3.81524444e-01, 3.92549694e-01,
4.74777371e-01],
[ 4.72008854e-01, -2.47345746e-01, 3.59616429e-01,
2.52750129e-01, 1.19450204e-01, -3.10099781e-01,
4.30306435e-01, -1.62968546e-01, -1.40832901e-01,
3.73812504e-02, -1.25185639e-01, 2.44938642e-01,
3.32485586e-01, 3.53299305e-02, -3.30613941e-01,
-1.35950983e-01],
[ 1.85852900e-01, -9.83352214e-02, 3.41978699e-01,
8.67165811e-03, 3.46845627e-01, 1.29153579e-01,
4.64443177e-01, 1.15139224e-02, -3.25556934e-01,
-3.66379201e-01, -2.55769938e-01, -2.08554789e-02,
1.51786834e-01, 5.86780272e-02, 2.55553305e-01,
-3.25718910e-01],
[-3.11689675e-01, -2.17617527e-02, 7.72571983e-03,
-2.31704265e-01, -2.14245647e-01, 9.61495414e-02,
-3.27444166e-01, 2.67990142e-01, 2.50948817e-01,
1.95652723e-01, 5.79224415e-02, 9.13931709e-03,
2.56390348e-02, 3.59594494e-01, -2.66508758e-01,
2.20254958e-01],
[ 1.94646284e-01, 3.66057843e-01, 2.14457333e-01,
2.24085152e-01, 1.05342574e-01, -4.27612185e-01,
-3.49920005e-01, 3.75871032e-01, -1.05346240e-01,
-9.78173241e-02, 1.75176308e-01, -2.12537095e-01,
-8.19739625e-02, -1.40039310e-01, 7.61785209e-02,
3.92508149e-01],
[-4.72818352e-02, 2.94093102e-01, 2.90557832e-01,
1.40494900e-04, 2.79832035e-01, 3.35276634e-01,
-3.79788160e-01, -7.71822548e-03, 2.58355170e-01,
2.66037405e-01, -1.21681616e-01, 3.90908629e-01,
6.70772269e-02, 3.55733871e-01, 2.57470012e-01,
2.55136728e-01],
[ 3.27138193e-02, -2.80077597e-06, -3.42442542e-01,
-4.03933018e-01, -1.91840082e-01, -1.61959320e-01,
3.39495540e-01, 1.39288634e-01, 4.23164964e-01,
1.70850694e-01, -1.47289789e-08, 9.98789370e-02,
7.75708482e-02, -7.56203895e-03, -3.06240357e-02,
-6.50095474e-03],
[ 2.32675731e-01, -2.35386729e-01, 2.33843103e-01,
-4.32860196e-01, 7.58191794e-02, 4.14948165e-01,
1.41167402e-01, 1.70569643e-01, 4.25751954e-01,
-3.75422835e-01, -2.05775797e-01, 6.12163655e-02,
-3.75813574e-01, 2.67257035e-01, 2.52756864e-01,
1.00201644e-01],
[ 1.61049694e-01, -2.18448177e-01, -2.63321877e-01,
4.01718348e-01, -1.76443890e-01, 2.42350683e-01,
7.09695518e-02, -3.05766135e-01, 6.77920505e-02,
-2.73409277e-01, -3.23337317e-01, 1.05839022e-01,
1.21519454e-01, -8.19710717e-02, -6.41441718e-02,
1.70101207e-02]], dtype=float32), array([ 0.01596233, -0.00421972, -0.0345025 , 0.03105383, 0.00776252,
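We can see that many values here are on the order of e-2, but with L2 regularization you see hardly any on the order of e-4. In other words,
L1 leads to weight sparsity "more easily" than L2.
Note:
L1 is not the only thing that can lead to sparsity.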