1. Softmax regression
2. Loss function
2.1 L2 Loss (mean squared error)
Blue line: the loss l = 0.5 * (y - y')^2 as a function of the prediction y', with the true value fixed at y = 0; it is a quadratic (a parabola).
Green line: the likelihood e^-l, which is a Gaussian distribution.
Orange line: the gradient of the loss, y' - y, which is a linear function of y'.
During gradient descent we update the parameters in the direction of the negative gradient, so the derivative determines the update step. When the true value y is far from the prediction y', the gradient y' - y is large and the step is large; conversely, when the prediction is close to the true value, the gradient and the step are small.
The disadvantage: when we are far from the optimum, we do not necessarily want the gradient to be that large, since it drives overly aggressive parameter updates (outliers are penalized heavily).
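The behavior above can be sketched in a few lines. This is an illustrative implementation of the squared loss and its gradient with respect to the prediction, not code from the course:

```python
def l2_loss(y, y_hat):
    """Squared loss: the penalty grows quadratically with the error."""
    return 0.5 * (y - y_hat) ** 2

def l2_grad(y, y_hat):
    """Gradient of l2_loss w.r.t. the prediction y_hat: linear in the error."""
    return y_hat - y

# Far from the target the gradient (and hence the update step) is large;
# close to the target it shrinks toward 0.
print(l2_grad(0.0, 10.0))  # 10.0 -> big step
print(l2_grad(0.0, 0.1))   # 0.1  -> small step
```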
2.2 L1 Loss (absolute value loss)
Blue line: the loss l = |y - y'| as a function of the prediction y', with the true value fixed at y = 0; it is V-shaped.
Green line: the likelihood e^-l, which still has a peak at 0 but is sharper there than a Gaussian (a Laplace-type distribution).
Orange line: the gradient of the loss, a step function: -1 when y' < y and +1 when y' > y.
To avoid updating the parameters too aggressively when far from the optimum, an absolute value loss can be used instead.
With y = 0:
- When y' > 0, the derivative is 1;
- When y' < 0, the derivative is -1;
- When y' = 0, the loss is not differentiable; the subgradient can be any value in [-1, 1].
The gradient magnitude is constant no matter how far apart the true and predicted values are, so the advantage is better stability in the early stage of optimization. The drawback: near the end of optimization, as y - y' approaches 0, the gradient does not shrink but jumps between -1 and +1, so the iterates oscillate and the solution is hard to refine; this corresponds to the sharp kink at 0 (the pointed peak of the green curve).
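The case analysis above can be written out directly. A minimal sketch (the choice of 0 at the kink is one valid subgradient, picked for illustration):

```python
def l1_loss(y, y_hat):
    """Absolute value loss: the penalty grows linearly with the error."""
    return abs(y - y_hat)

def l1_grad(y, y_hat):
    """(Sub)gradient of l1_loss w.r.t. y_hat: magnitude 1 everywhere except the kink."""
    if y_hat > y:
        return 1.0
    if y_hat < y:
        return -1.0
    return 0.0  # at y_hat == y any value in [-1, 1] is a valid subgradient

# The step size is the same whether the error is huge or tiny:
print(l1_grad(0.0, 100.0))  # 1.0
print(l1_grad(0.0, 0.01))   # 1.0
```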
2.3 Huber’s Robust Loss
Huber's Robust Loss combines the benefits of L2 Loss and L1 Loss.
Blue line: the loss as a function of the prediction y', with the true value fixed at y = 0; it is quadratic near 0 and linear farther away.
Green line: the likelihood e^-l, which resembles a Gaussian distribution but is smoother at 0 than the L1 likelihood.
Orange line: the gradient of the loss: when the prediction is far from the true value, we are in the absolute-error regime and the gradient is a constant; when the prediction is close to the true value, we are in the squared-error regime and the gradient is linear, shrinking smoothly toward 0.
3. Image Classification Dataset
Reference:
https://www.bilibili.com/video/BV1K64y1Q7wu?p=1