The role of weight decay (L2 regularization)

1. The role of weight decay (L2 regularization)

Function: Weight decay (L2 regularization) helps prevent the model from overfitting.
Thinking: The L2 regularization term pushes the weights w toward smaller values, but why do smaller weights reduce overfitting?
Principle:
(1) From the viewpoint of model complexity: smaller weights w mean, in a sense, a lower-complexity network, which tends to generalize better (this preference for simpler models is often called Occam's razor). In practice it has also been observed that training with L2 regularization frequently gives better results than training without it.
(2) Mathematical explanation: when a model overfits, the coefficients of the fitted function are often very large. Why? Overfitting means the fitted function tries to pass through every data point, so the resulting curve oscillates strongly: in some small intervals the function value changes drastically, which means the absolute value of the derivative is very large there. Since the inputs themselves can be large or small, such large derivatives can only arise when the coefficients are large enough. Regularization constrains the norm of the parameters so that it cannot grow too large, and therefore reduces overfitting to some extent.
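To make the point about large coefficients concrete, here is a small NumPy sketch (an illustration added here, not code from the original post): a degree-9 polynomial is fit to 10 noisy points once by plain least squares and once with an L2 (ridge) penalty, and the norms of the learned coefficients are compared. The data and the value of `lam` are arbitrary choices for the demonstration.

```python
import numpy as np

# Fit a degree-9 polynomial to 10 noisy samples of sin(pi*x), with and without
# an L2 penalty, and compare the size of the learned coefficient vectors.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 10)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Design matrix with columns x^0, x^1, ..., x^9
X = np.vander(x, N=10, increasing=True)

# Plain least squares: w = argmin ||Xw - y||^2  (tends to overfit, large w)
w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge / L2-regularized closed form: w = (X^T X + lam*I)^(-1) X^T y
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with L2 regularization:", np.linalg.norm(w_ridge))
```

Typically the unregularized coefficients come out considerably larger, which matches the intuition above: the penalty keeps the parameter norm small and the fitted curve smooth.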

Note that only the weights w are penalized; the bias b is not. With the L2 penalty the cost function becomes

C = C₀ + (λ / 2n) · Σ w²,

where C₀ is the original, unregularized cost, λ is the regularization strength, and n is the number of training samples.

w update process: taking the derivative and applying the usual gradient-descent step gives

w → w − η·∂C₀/∂w − (ηλ/n)·w = (1 − ηλ/n)·w − η·∂C₀/∂w.

If there were no penalty, the coefficient in front of w would simply be 1.

Because η, λ, and n are all positive, the factor 1 − ηλ/n is less than 1, so its effect is to shrink w at every step; this is the origin of the name "weight decay". Of course, once the subsequent derivative term is taken into account, the final value of w may still increase or decrease. The larger λ is, the more strongly the model's fit is suppressed.
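A minimal NumPy sketch of this update rule (illustrative only; `eta`, `lam`, and `n` mirror η, λ, and n in the text, and `grad_C0` stands for ∂C₀/∂w):

```python
import numpy as np

def sgd_step(w, grad_C0, eta=0.1, lam=0.0, n=100):
    # Weight-decay form of the update: w <- (1 - eta*lam/n) * w - eta * dC0/dw.
    # With lam = 0 the coefficient in front of w is exactly 1 (no decay).
    return (1.0 - eta * lam / n) * w - eta * grad_C0

w = np.array([2.0, -3.0, 0.5])
g = np.array([0.1, -0.2, 0.05])   # pretend gradient of the unregularized loss

print("no decay:  ", sgd_step(w, g, lam=0.0))
print("with decay:", sgd_step(w, g, lam=5.0))  # w is first shrunk by (1 - eta*lam/n)
```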

**Summary:** Comparing the training loss and accuracy of the model with and without regularization, we can see that after adding regularization the loss decreases more slowly and the accuracy rises more slowly. The loss and accuracy curves of the unregularized model fluctuate more (their variance is larger), while the curves of the regularized model are noticeably smoother, and the larger the regularization weight λ, the smoother they become. This is exactly the penalty effect of regularization on the model: by constraining the weights, regularization makes the model's behavior smoother and thereby effectively mitigates overfitting.
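As a sketch of how such a comparison might be set up in practice (using PyTorch; the model, data, and λ value here are placeholders, not from the original post), the L2 penalty can be added to the loss over the weight matrices only, leaving the biases unpenalized as noted above:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lambd = 1e-3  # regularization strength; larger values smooth the curves more

x = torch.randn(128, 20)            # placeholder inputs
y = torch.randint(0, 2, (128,))     # placeholder labels

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(x)
    # L2 penalty over parameters named "weight" only (biases are excluded)
    l2 = sum((p ** 2).sum() for name, p in model.named_parameters() if "weight" in name)
    loss = criterion(logits, y) + lambd * l2
    loss.backward()
    optimizer.step()
```

Alternatively, passing `weight_decay=` to `torch.optim.SGD` applies the decay inside the optimizer, though that variant penalizes every parameter it updates, biases included.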
