Adding regularization will often help to prevent the overfitting problem (the high-variance problem).
1. Logistic regression
Recall the optimization objective used during training:

$$\min_{w,b} J(w,b), \quad w \in \mathbb{R}^{n_x},\ b \in \mathbb{R} \tag{1-1}$$
where

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) \tag{1-2}$$
L2 regularization (most commonly used):

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m}\left\|w\right\|_2^2 \tag{1-3}$$

where

$$\left\|w\right\|_2^2 = \sum_{j=1}^{n_x} w_j^2 = w^T w \tag{1-4}$$
Why do we regularize just the parameter w? Because w is usually a high-dimensional parameter vector while b is a scalar; almost all the parameters are in w rather than in b.
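A minimal numpy sketch of the L2-regularized logistic regression cost in (1-3)/(1-4); the function name `compute_cost_l2` and the assumed shapes of `X`, `y`, `w` are illustrative choices, not code from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost_l2(w, b, X, y, lambd):
    """Cross-entropy cost plus (lambda/2m) * ||w||_2^2, eq. (1-3)/(1-4).
    Assumed shapes: X (n_x, m), y (1, m), w (n_x, 1); b is a scalar."""
    m = X.shape[1]
    y_hat = sigmoid(np.dot(w.T, X) + b)                    # predictions, shape (1, m)
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))  # only w is regularized, not b
    return cross_entropy + l2_penalty
```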
L1 regularization
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{m}\left\|w\right\|_1 \tag{1-5}$$

where

$$\left\|w\right\|_1 = \sum_{j=1}^{n_x} \left|w_j\right| \tag{1-6}$$
With L1 regularization, w will end up being sparse; in other words, the w vector will have a lot of zeros in it. This can help with compressing the model a little.
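As a rough illustration of the sparsity point, here is a hypothetical numpy snippet that computes the L1 penalty from (1-5)/(1-6) and measures how many entries of w are exactly zero; the values of `w`, `lambd`, and `m` are made up.

```python
import numpy as np

def l1_penalty(w, lambd, m):
    """(lambda/m) * ||w||_1, the L1 term in eq. (1-5)."""
    return (lambd / m) * np.sum(np.abs(w))

w = np.array([[0.0], [0.31], [0.0], [-0.07], [0.0]])  # hypothetical weights after L1-regularized training
sparsity = np.mean(w == 0)                            # fraction of weights that are exactly zero
print(l1_penalty(w, lambd=0.1, m=100), sparsity)      # penalty and sparsity (0.6 here)
```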
2. Neural network "Frobenius norm"
In a neural network, the regularized cost function is

$$J\left(w^{[1]}, b^{[1]}, \ldots, w^{[L]}, b^{[L]}\right) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m}\sum_{l=1}^{L} \left\|w^{[l]}\right\|_F^2 \tag{2-1}$$

where

$$\left\|w^{[l]}\right\|_F^2 = \sum_{i=1}^{n^{[l-1]}} \sum_{j=1}^{n^{[l]}} \left(w_{ij}^{[l]}\right)^2 \tag{2-2}$$
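A small sketch of how the Frobenius-norm penalty in (2-1)/(2-2) could be accumulated over all layers; the `parameters` dictionary keyed as `'W1'`, ..., `'WL'` is an assumed layout, not prescribed by the notes.

```python
import numpy as np

def frobenius_penalty(parameters, lambd, m, L):
    """(lambda/2m) * sum over layers of ||W[l]||_F^2, eq. (2-1)/(2-2)."""
    total = 0.0
    for l in range(1, L + 1):
        W = parameters['W' + str(l)]    # weight matrix of layer l, shape (n[l], n[l-1])
        total += np.sum(np.square(W))   # squared Frobenius norm of that layer
    return (lambd / (2 * m)) * total
```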
L2 regularization is also called weight decay:

$$\begin{aligned}
dw^{[l]} &= (\text{from backprop}) + \frac{\lambda}{m} w^{[l]} \\
w^{[l]} &:= w^{[l]} - \alpha \, dw^{[l]} = \left(1 - \frac{\alpha\lambda}{m}\right) w^{[l]} - \alpha \, (\text{from backprop})
\end{aligned} \tag{2-3}$$
This keeps the weights w from growing too large and thus helps avoid overfitting.
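A one-function sketch of the weight-decay update in (2-3); `dW_backprop` stands for the gradient of the unregularized cost, and the function name is made up for illustration.

```python
import numpy as np

def update_with_weight_decay(W, dW_backprop, alpha, lambd, m):
    """Gradient step with the L2 term added, eq. (2-3)."""
    dW = dW_backprop + (lambd / m) * W  # regularization adds (lambda/m) * W to the gradient
    return W - alpha * dW               # same as (1 - alpha*lambd/m) * W - alpha * dW_backprop
```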
3. Inverted dropout
For each training example, a different random subset of units can be dropped.

Inverted dropout (dropout must be applied in both the forward and backward passes):

$$\begin{aligned}
\text{d3} &= \text{np.random.rand(a3.shape[0], a3.shape[1])} < \text{keep.prob} \\
\text{a3} &= \text{np.multiply(a3, d3)} \quad \text{\# a3 * d3, element-wise multiplication} \\
\text{a3} &\mathrel{/=} \text{keep.prob} \quad \text{\# in order to not reduce the expected value of a3 (inverted dropout)} \\
z^{[4]} &= w^{[4]} a^{[3]} + b^{[4]} \\
z^{[4]} &\mathrel{/=} \text{keep.prob}
\end{aligned} \tag{3-1}$$
By dividing by keep.prob, the inverted dropout technique ensures that the expected value of a3 remains the same. This makes test time easier because there is less of a scaling problem: dropout is not used at test time.
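The steps of (3-1) wrapped into a reusable function, as a sketch; the function name, the generic activation matrix `a`, and the `training` flag are assumptions for illustration. At test time the activations pass through unchanged, matching the note above.

```python
import numpy as np

def dropout_forward(a, keep_prob, training=True):
    """Inverted dropout on an activation matrix a, as in eq. (3-1)."""
    if not training:
        return a                                # no dropout (and no scaling) at test time
    d = np.random.rand(*a.shape) < keep_prob    # boolean mask: keep each unit with prob keep_prob
    a = np.multiply(a, d)                       # zero out the dropped units
    a = a / keep_prob                           # rescale so the expected value of a is unchanged
    return a
```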