1:Regression-Case Study
Why does the loss function only regularize w and not b?
Because regularization is meant to make the learned function smoother, and b is just a constant offset (by itself it plots as a horizontal line): it shifts the function up or down without changing its shape, so it has almost no effect on smoothness.
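Written out (a standard formulation, assuming a linear model y = b + w·x; not copied from the course slides), the regularized loss penalizes only the slope:

```latex
L(w, b) = \sum_n \left( \hat{y}_n - (b + w \cdot x_n) \right)^2 + \lambda w^2
```

The smoothness of the learned function is governed by w (how much the output changes when the input changes); b does not appear in that sensitivity, so penalizing it would not make the function any smoother.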
1-Regression Demo
This trick is explained in detail under Adagrad: a small learning rate needs many iterations to reach the optimum, while a large learning rate may oscillate wildly and also fail to converge. One tuning trick is to give w and b their own customized learning rates.
lr = 1
lr_b = 0
lr_w = 0
# ... (inside the gradient-descent loop, after b_grad and w_grad are computed)
lr_b = lr_b + b_grad ** 2
lr_w = lr_w + w_grad ** 2
# ...
# update parameters, each with its own adaptive learning rate
b = b - lr / np.sqrt(lr_b) * b_grad
w = w - lr / np.sqrt(lr_w) * w_grad
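Filled out, the snippet becomes a runnable demo. A minimal sketch, assuming a made-up toy dataset (the course demo uses different data); the data here roughly follows y = 2x + 1:

```python
import numpy as np

# Hypothetical toy data (not the course's dataset), roughly y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 5.0, 7.2, 8.9, 11.1])

b, w = 0.0, 0.0          # initial parameters
lr = 1.0                 # base learning rate
lr_b, lr_w = 0.0, 0.0    # accumulated squared gradients (Adagrad)

for _ in range(100000):
    # gradients of the squared-error loss sum_n (y_n - (b + w*x_n))^2
    err = y - (b + w * x)
    b_grad = -2.0 * np.sum(err)
    w_grad = -2.0 * np.sum(err * x)

    # customized learning rate per parameter: divide lr by the root
    # of that parameter's accumulated squared gradients
    lr_b += b_grad ** 2
    lr_w += w_grad ** 2
    b -= lr / np.sqrt(lr_b) * b_grad
    w -= lr / np.sqrt(lr_w) * w_grad

print(w, b)
```

Even with the deliberately large base rate lr = 1, the per-parameter denominators tame the steps, and the run converges close to the least-squares fit.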
2:Where does the error come from?
Error comes from two sources: error due to "bias" and error due to "variance".
A simple model corresponds to a small model set; this small set may not contain the true target model at all, so it has large bias but small variance.
A complex model corresponds to a large model set; this large set is more likely to contain the target model, so it has small bias but large variance.
If the error mainly comes from large variance, the model is overfitting;
if the error mainly comes from large bias, the model is underfitting.
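This trade-off can be simulated. A minimal sketch, assuming a hypothetical cubic target and comparing a degree-1 (simple) model set against a degree-5 (complex) one across many resampled training sets; all functions and numbers here are illustrative, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical true target model
    return x ** 3 - x

x_test = np.linspace(-1, 1, 50)

def predictions(degree, n_trials=200, n_points=15, noise=0.3):
    """Fit the same model set on many independently sampled training sets."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(-1, 1, n_points)
        y = f(x) + rng.normal(0.0, noise, n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    return np.array(preds)

results = {}
for degree in (1, 5):
    preds = predictions(degree)
    # bias^2: how far the average fitted curve is from the target;
    # variance: how much individual fits scatter around that average
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2 {bias_sq:.4f}, variance {variance:.4f}")
```

The degree-1 model set cannot contain the cubic target, so its average fit stays far from it (large bias) while individual fits barely scatter; the degree-5 set contains it (small bias) but each fit chases the noise (large variance).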
What to do with large bias?
1. Diagnosis:
(1) If your model cannot even fit the training examples, then you have large bias.----> Underfitting.
(2) If you can fit the training data, but large error on testing data, then you probably have large variance. ----> Overfitting.
2. For bias, redesign your model:
(1) Add more features as input;
(2) Use a more complex model.
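As a sketch of both remedies on made-up data (the target function here is an assumption for illustration): moving from a degree-1 to a degree-3 model adds the features x² and x³ as inputs, and the training error that large bias leaves behind disappears:

```python
import numpy as np

# hypothetical data from a nonlinear target; a straight line underfits it
x = np.linspace(-1.0, 1.0, 20)
y = x ** 3 - x

train_err = {}
for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)   # richer model = more input features
    pred = np.polyval(coeffs, x)
    train_err[degree] = np.mean((pred - y) ** 2)
    print(degree, train_err[degree])
```

This matches the diagnosis rule above: the simple model cannot even fit the training examples, and enlarging the model set fixes that.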
What to do with large variance?
1. More data (very effective, but not always practical). You can also create training data yourself, e.g. by flipping images or adding noise.
2. Regularization (encourage small parameters so the curve becomes smoother). But regularization shrinks the model set, which may then no longer contain the target model, so it can hurt bias.
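Point 2 can be seen directly in the one-feature case. A sketch using the closed-form ridge solution (with an unpenalized intercept, matching section 1) on made-up data; larger λ shrinks w and flattens the line, but pushes the fit away from the data:

```python
import numpy as np

# hypothetical data, roughly y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 5.0, 7.2, 8.9, 11.1])

def ridge_slope(lam):
    """Minimize sum (y - (b + w*x))^2 + lam * w^2; b is not penalized."""
    xc = x - x.mean()
    yc = y - y.mean()
    return np.sum(xc * yc) / (np.sum(xc ** 2) + lam)

w_plain = ridge_slope(0.0)    # ordinary least-squares slope
w_reg = ridge_slope(100.0)    # heavily regularized: much flatter line
print(w_plain, w_reg)
```

With heavy regularization the slope collapses toward zero: the smoothed model set no longer contains anything close to the true line, which is exactly how regularization hurts bias.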