smooth l1 loss & l1 loss & l2 loss

Quoted from: https://www.zhihu.com/question/58200555/answer/621174180

Smooth L1 loss was designed to limit the gradient in two ways:

  1. When the difference between the predicted box and the ground truth is large, the gradient does not become too large;
  2. When the difference between the predicted box and the ground truth is small, the gradient is small enough.

Consider the following loss functions, where x is the elementwise difference between the predicted box and the ground truth:

$$L_2(x) = x^2 \tag{1}$$

$$L_1(x) = |x| \tag{2}$$

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \tag{3}$$

Their derivatives with respect to x are:

$$\frac{\mathrm{d}L_2(x)}{\mathrm{d}x} = 2x \tag{4}$$

$$\frac{\mathrm{d}L_1(x)}{\mathrm{d}x} = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{otherwise} \end{cases} \tag{5}$$

$$\frac{\mathrm{d}\,\mathrm{smooth}_{L_1}(x)}{\mathrm{d}x} = \begin{cases} x & \text{if } |x| < 1 \\ \pm 1 & \text{otherwise} \end{cases} \tag{6}$$
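To make the definitions concrete, here is a minimal NumPy sketch of the three losses as written in (1)-(3); the function names are my own, not from the original post:

```python
import numpy as np

def l2_loss(x):
    # Equation (1): quadratic penalty, grows fast for large errors
    return x ** 2

def l1_loss(x):
    # Equation (2): absolute penalty, gradient magnitude always 1
    return np.abs(x)

def smooth_l1_loss(x):
    # Equation (3): quadratic near zero, linear once |x| >= 1
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # elementwise prediction - ground truth
print(smooth_l1_loss(x))                     # [2.5, 0.125, 0.0, 0.125, 2.5]
```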
According to (4), as x grows the derivative of the L2 loss with respect to x grows as well. Early in training, when the difference between the prediction and the ground truth is large, the gradient of the loss with respect to the prediction is therefore very large and training becomes unstable.

According to (5), the derivative of the L1 loss with respect to x is constant. Late in training, when the difference between the prediction and the ground truth is already small, the absolute value of the gradient of the L1 loss with respect to the prediction is still 1. If the learning rate is kept unchanged, the loss will oscillate around some value and it is hard to converge to higher accuracy.

Finally, according to (6), when x is small the gradient of smooth L1 with respect to x is also small, and when x is large the absolute value of the gradient is capped at 1, so it never becomes large enough to wreck the network parameters. Smooth L1 thus avoids the drawbacks of both the L1 and L2 losses. The three functions look like this:

[Figure: curves of the L2, L1, and smooth L1 losses]

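The behaviour described in (4)-(6) is easy to check numerically; a small sketch of the three derivatives (helper names are mine):

```python
import numpy as np

def l2_grad(x):
    # Derivative (4): 2x, unbounded as |x| grows
    return 2 * x

def l1_grad(x):
    # Derivative (5): sign(x), magnitude stays 1 even for tiny errors
    return np.sign(x)

def smooth_l1_grad(x):
    # Derivative (6): x near zero, clipped to +/-1 once |x| >= 1
    return np.where(np.abs(x) < 1, x, np.sign(x))

for x in (10.0, 0.01):
    print(x, l2_grad(x), l1_grad(x), smooth_l1_grad(x))
# x=10.0 : L2 grad 20.0 (huge), L1 grad 1.0,          smooth L1 grad 1.0 (capped)
# x=0.01 : L2 grad 0.02,        L1 grad 1.0 (still 1), smooth L1 grad 0.01 (shrinks)
```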
 In summary, smooth L1 loss combines the advantages of L1 loss and L2 loss.
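In practice most deep learning frameworks already ship this loss. A minimal usage sketch, assuming a recent PyTorch version where nn.SmoothL1Loss accepts a beta argument (with beta=1.0 it reproduces equation (3)):

```python
import torch
import torch.nn as nn

criterion = nn.SmoothL1Loss(beta=1.0)   # beta=1.0 matches equation (3)

pred   = torch.tensor([[1.2, 0.4, 5.0, 3.0]], requires_grad=True)
target = torch.tensor([[1.0, 0.5, 2.0, 3.0]])

loss = criterion(pred, target)           # mean over elements by default
loss.backward()
print(loss.item(), pred.grad)            # per-element gradient magnitude is capped
```

The beta threshold simply controls where the loss switches from the quadratic to the linear regime; larger beta behaves more like L2 near zero.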

 


Origin www.cnblogs.com/lyp1010/p/11881174.html