1. Weight Decay
The red points are the true y values. If w is allowed to fluctuate widely, the fit is the blue curve, which overfits; if the range of w is restricted, the fit is the smoother green curve.
The intersection of the yellow circle (the penalty term) and the green circle (the loss l) is a balance point. If w moves further toward the origin from there, the reduction in the penalty term is not enough to make up for the increase in l. Overall, the penalty pulls w toward the origin, so the optimum w* becomes smaller and the complexity of the model becomes lower.
Common choices for the weight decay coefficient are 0.001, 0.01, and 0.1; try several values and compare.
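The balance-point argument can be written down compactly. The block below is a sketch in the usual L2-penalty notation; the symbols λ (weight decay coefficient) and η (learning rate) are assumed here and do not appear in the notes above.

```latex
% Penalized objective: loss plus an L2 penalty on the weights
\min_{w,\,b} \; \ell(w, b) + \frac{\lambda}{2}\,\lVert w \rVert^{2}

% Gradient of the penalized objective with respect to w
\frac{\partial}{\partial w}\Bigl(\ell(w, b) + \frac{\lambda}{2}\lVert w \rVert^{2}\Bigr)
  = \frac{\partial \ell(w, b)}{\partial w} + \lambda w

% One gradient step: w is first shrunk by (1 - \eta\lambda), then updated as usual
w_{t+1} = (1 - \eta\lambda)\, w_{t} - \eta\, \frac{\partial \ell(w_{t}, b_{t})}{\partial w_{t}}
```

When ηλ < 1, every step multiplies w by a factor smaller than one before applying the ordinary gradient, which is where the name "weight decay" comes from.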
2. Implementation from scratch
The simpler the data and the more complex the model, the easier it is to overfit.
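To make this concrete, here is a minimal NumPy sketch, not the code from the referenced video. It assumes a linear model with 200 inputs trained on only 20 samples, and all names (`make_data`, `train`, `lambd`) and constants are illustrative. The value `lambd = 3.0` is deliberately large so that the effect is visible on such a tiny problem.

```python
import numpy as np

def make_data(rng, n, true_w, true_b, noise=0.01):
    """Synthetic linear data: y = X @ true_w + true_b + Gaussian noise."""
    X = rng.normal(size=(n, true_w.shape[0]))
    y = X @ true_w + true_b + rng.normal(scale=noise, size=n)
    return X, y

def half_mse(X, y, w, b):
    """Mean of squared error / 2 (the loss whose gradient is used in train)."""
    return float(np.mean((X @ w + b - y) ** 2) / 2)

def train(X, y, lambd, lr=0.003, num_steps=400, seed=0):
    """Full-batch gradient descent on half-MSE + (lambd / 2) * ||w||^2."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])  # random initial weights
    b = 0.0
    for _ in range(num_steps):
        err = X @ w + b - y
        grad_w = X.T @ err / len(y) + lambd * w  # penalty adds lambd * w
        grad_b = err.mean()                      # the bias is not penalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Assumed setup: few samples, many inputs, so overfitting is easy.
rng = np.random.default_rng(42)
num_inputs = 200
true_w = np.full(num_inputs, 0.01)
true_b = 0.05
X_train, y_train = make_data(rng, 20, true_w, true_b)
X_test, y_test = make_data(rng, 100, true_w, true_b)

results = {}
for lambd in (0.0, 3.0):
    w, b = train(X_train, y_train, lambd)
    results[lambd] = (float(np.linalg.norm(w)), half_mse(X_test, y_test, w, b))
    print(f"lambd={lambd}: ||w||={results[lambd][0]:.3f}, "
          f"test loss={results[lambd][1]:.4f}")
```

With `lambd = 0.0` the weight norm stays close to its initial value and the test loss is high; with weight decay the norm shrinks and the test loss drops.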
3. Concise implementation in PyTorch
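In PyTorch the penalty does not have to be coded by hand: `torch.optim.SGD` accepts a `weight_decay` argument, which can also be set per parameter group. The sketch below assumes the same kind of synthetic setup as a from-scratch version (20 samples, 200 inputs); the function name `train_concise` and all constants are illustrative, not taken from the referenced video.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumed synthetic data: 20 samples, 200 inputs, small true weights.
num_inputs = 200
X = torch.randn(20, num_inputs)
y = X @ torch.full((num_inputs,), 0.01) + 0.05 + 0.01 * torch.randn(20)

def train_concise(wd, lr=0.003, num_steps=400):
    """Train a linear model with SGD and return the final weight norm."""
    net = nn.Linear(num_inputs, 1)
    nn.init.normal_(net.weight)  # standard normal initial weights
    loss_fn = nn.MSELoss()
    # Apply weight decay to the weight only; the bias group uses the default (0).
    optimizer = torch.optim.SGD(
        [
            {"params": [net.weight], "weight_decay": wd},
            {"params": [net.bias]},
        ],
        lr=lr,
    )
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = loss_fn(net(X).squeeze(-1), y)
        loss.backward()
        optimizer.step()
    return net.weight.norm().item()

norm_no_decay = train_concise(0.0)
norm_decay = train_concise(3.0)
print(f"||w|| without decay: {norm_no_decay:.3f}")
print(f"||w|| with decay:    {norm_decay:.3f}")
```

Putting the bias in its own parameter group is a design choice: only the weights are penalized, which matches the penalty on w described in section 1.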
Reference:
https://www.bilibili.com/video/BV1UK4y1o7dy?p=1