Eighteen loss functions of PyTorch

Reprinted from: https://blog.csdn.net/u011995719/article/details/85107524

Please indicate the source

This article is excerpted from the "PyTorch Model Training Practical Tutorial"; for the full PDF, see: https://github.com/tensor-yu/PyTorch_Tutorial


What we call optimization means optimizing the network weights so that the value of the loss function becomes smaller. But does a smaller loss always mean higher classification/regression accuracy of the model? And how should one choose among the many loss functions? Read on to learn about the eighteen loss functions provided by PyTorch.

Please run the supporting code alongside this article. It contains detailed comments and manual calculations that help in understanding how each loss function works.
Supporting code for this section: /Code/3_optimizer/3_1_lossFunction

1.L1Loss

class torch.nn.L1Loss(size_average=None, reduce=None)
Note: the official documentation still shows a reduction='elementwise_mean' parameter, but that parameter has been removed from the code implementation.
Function:
Compute the absolute value of the difference between output and target; optionally return a tensor of the same shape or a scalar.
Calculation formula:
loss(x, y) = |x - y|, computed element-wise.
Parameters:
reduce (bool) - whether to reduce the result to a scalar; the default is True.
size_average (bool) - effective when reduce=True. When True, the returned loss is the mean of the per-element losses; when False, it is their sum.
Example:
/Code/3_optimizer/3_1_lossFunction/1_L1Loss.py
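Below is a minimal sketch (not the tutorial's supporting script; the tensor values are made up) showing the effect of the reduction choice, using the current reduction= argument, which subsumes size_average/reduce:

```python
import torch
import torch.nn as nn

# Assumed toy values, not from the tutorial's script.
output = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 2.0, 2.0])

# reduction='none' returns the per-element |x - y|; 'mean' averages them.
print(nn.L1Loss(reduction='none')(output, target))  # tensor([0.5000, 0.0000, 1.0000])
print(nn.L1Loss(reduction='mean')(output, target))  # tensor(0.5000)
```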

2.MSELoss

class torch.nn.MSELoss(size_average=None, reduce=None, reduction='elementwise_mean')
Note: the official documentation still shows a reduction='elementwise_mean' parameter, but that parameter has been removed from the code implementation.
Function:
Compute the squared difference between output and target; optionally return a tensor of the same shape or a scalar.
Calculation formula:
loss(x, y) = (x - y)^2, computed element-wise.
Parameters:
reduce (bool) - whether to reduce the result to a scalar; the default is True.
size_average (bool) - effective when reduce=True. When True, the returned loss is the mean of the per-element losses; when False, it is their sum.
Example:
/Code/3_optimizer/3_1_lossFunction/2_MSELoss.py
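A minimal sketch analogous to the L1 example, again with made-up values:

```python
import torch
import torch.nn as nn

output = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, 2.0, 2.0])

# Per-element squared error, then the summed scalar.
print(nn.MSELoss(reduction='none')(output, target))  # tensor([0.2500, 0.0000, 1.0000])
print(nn.MSELoss(reduction='sum')(output, target))   # tensor(1.2500)
```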

3.CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='elementwise_mean')
Function:
The input is first passed through a softmax activation function, and then the cross-entropy loss with respect to the target is computed. In other words, this method combines nn.LogSoftmax() and nn.NLLLoss(); the cross-entropy loss in the strict sense would be nn.NLLLoss() alone.

Supplement: a few words about the cross-entropy loss function.
Cross-entropy loss is also known as log-likelihood loss or log loss; in binary classification it can also be called logistic (regression) loss. The expression of the cross-entropy loss function is L = -Σ(y_i * log(x_i)). What PyTorch implements here is not the cross-entropy loss in the strict sense: the input is first passed through the softmax activation function to "normalize" the vector into a probability distribution, and only then is the strict cross-entropy loss with respect to the target computed.
In multi-class tasks, the combination softmax activation function + cross-entropy loss is used very often, because cross entropy describes the difference between two probability distributions, while the output of a neural network is a raw vector, not a probability distribution. The softmax activation function is therefore needed to "normalize" the vector into a probability distribution before the cross-entropy loss can be computed.
Looking back at PyTorch's CrossEntropyLoss(): as the official documentation says, it combines nn.LogSoftmax() and nn.NLLLoss(), where nn.LogSoftmax() plays the role of the activation function and nn.NLLLoss() the role of the loss function. Taken together, couldn't it simply be called the softmax + cross-entropy loss function?

Calculation formula:
loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )

Parameters:
weight(Tensor)- sets a weight for the loss of each class; often used for class-imbalance problems. weight must be a float tensor whose length equals the number of classes C, i.e., a weight must be provided for every class. The weighted calculation formula is:
loss(x, class) = weight[class] * ( -x[class] + log( Σ_j exp(x[j]) ) )
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean over samples; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
ignore_index(int)- ignore a certain class: its loss is not computed (it contributes 0), and when size_average is used, samples of that class are also excluded from the denominator of the average.
Example:
/Code/3_optimizer/3_1_lossFunction/3_CroosEntropyLoss.py
Supplement:
The output does not have to be a vector; it can also be an image, i.e., per-pixel classification of an image. An example of this is given for NLLLoss(); it is very useful in image segmentation.
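A minimal sketch (made-up logits, not the tutorial's script) confirming that CrossEntropyLoss is LogSoftmax followed by NLLLoss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[-1.233, 2.657, 0.534]])  # raw network output, no softmax applied
target = torch.tensor([1])                       # true class index

ce  = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
print(ce, nll)  # the two values are identical
```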

4.NLLLoss

class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='elementwise_mean')
Function:
Its effect is hard to put into words; just look at the calculation formula: loss(input, class) = -input[class]. For example, in a three-class task with input = [-1.233, 2.657, 0.534] and true label 2 (class=2), the loss is -0.534: take the output on the corresponding class and negate it! One can feel a bit misled by the name NLLLoss.
Practical application:
It is often used in multi-class tasks, but the input must be passed through log_softmax before being fed to NLLLoss(), i.e., converted into a probability distribution whose logarithm is then taken. These steps are exactly what CrossEntropyLoss performs internally, so if you do not want the last layer of your network to be a log_softmax layer, you can use CrossEntropyLoss as a complete replacement for this function.
Parameters:
weight(Tensor)- sets a weight for the loss of each class; often used for class-imbalance problems. weight must be a float tensor whose length equals the number of classes C, i.e., a weight must be provided for every class.
size_average(bool)- valid when reduce=True. When True, the returned loss is the weighted average (divided by the sum of the weights); when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
ignore_index(int)- ignore a certain class: its loss is not computed (it contributes 0), and when size_average is used, samples of that class are also excluded from the denominator of the average.
Example:
/Code/3_optimizer/3_1_lossFunction/4_NLLLoss.py
Special attention:
When weight is provided, reduce=True and size_average=True, the calculation formula is:
loss = Σ_n ( -weight[class_n] * input_n[class_n] ) / Σ_n weight[class_n]
For example, with input = [[0.6, 0.2, 0.2], [0.4, 1.2, 0.4]], target = [0, 1], weight = [0.6, 0.2, 0.2]:
l1 = -0.6 * 0.6 = -0.36
l2 = -1.2 * 0.2 = -0.24
loss = -0.36/(0.6+0.2) + -0.24/(0.6+0.2) = -0.75
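A minimal sketch reproducing the weighted-mean example above (the values are exactly those in the text):

```python
import torch
import torch.nn as nn

inp = torch.tensor([[0.6, 0.2, 0.2],
                    [0.4, 1.2, 0.4]])
target = torch.tensor([0, 1])
weight = torch.tensor([0.6, 0.2, 0.2])

# The default 'mean' reduction divides by the sum of the selected class weights.
print(nn.NLLLoss(weight=weight)(inp, target))  # tensor(-0.7500)
```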

5.PoissonNLLLoss

class torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='elementwise_mean')
Function:
Used for tasks where the target follows a Poisson distribution.
Calculation formula:
target ~ Poisson(input); loss(input, target) = input - target * log(input) + log(target!)
Parameters:
log_input(bool)- when True, the calculation formula is loss(input, target) = exp(input) - target * input; when False, it is loss(input, target) = input - target * log(input + eps).
full(bool)- whether to compute the full loss, i.e., whether to include the Stirling approximation of the factorial term: target * log(target) - target + 0.5 * log(2π * target).
eps(float)- used when log_input=False to prevent computing log(0); it is the small correction term added inside the logarithm, i.e., loss(input, target) = input - target * log(input + eps).
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
Example:
/Code/3_optimizer/3_1_lossFunction/5_PoissonNLLLoss.py
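A minimal sketch under the default log_input=True, where the input is interpreted as log(λ); torch.poisson is used here only to generate made-up count targets:

```python
import torch
import torch.nn as nn

log_rate = torch.randn(4)                    # assumed model output: log of the Poisson rate
target = torch.poisson(torch.ones(4) * 3.0)  # made-up Poisson-distributed counts

loss = nn.PoissonNLLLoss(log_input=True)(log_rate, target)
manual = (torch.exp(log_rate) - target * log_rate).mean()
print(loss, manual)  # identical: exp(input) - target * input, averaged
```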

6.KLDivLoss

class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Calculate the KL divergence (Kullback–Leibler divergence) between input and target.
Calculation formula:
l_n = y_n * ( log(y_n) - x_n )
(a later piece of the supporting code computes this by hand and confirms the formula; note that no logarithm is taken of x_n because the input is expected to already be given as log-probabilities)

Supplement: KL divergence
(Kullback-Leibler divergence), also called relative entropy, is used to describe the difference between two probability distributions. Calculation formula (discrete case):
D(P||Q) = Σ_x P(x) * log( P(x) / Q(x) )
Where p represents the true distribution, q represents the fitting distribution of p, and D(P||Q) represents the information loss generated when the probability distribution q is used to fit the real distribution p. The information loss here can be understood as loss. The lower the loss, the closer the fitted distribution q is to the true distribution p. At the same time, this formula can be viewed from another angle, that is, the expected value of the logarithmic difference between p and q on p is calculated.
Pay special attention: D(p||q) ≠ D(q||p); the KL divergence is not symmetric, so it cannot be called a KL "distance".
Viewed from the perspective of information theory, the relationship between the three quantities is: Information Entropy = Cross Entropy - Relative Entropy. In machine learning, when the training data are fixed, minimizing the relative entropy D(p||q) is therefore equivalent to minimizing the cross entropy H(p, q).

Parameters:
size_average(bool)- valid when reduce=True. When True, the returned loss is the element-wise mean (averaged over all elements, not over samples); when False, it is the sum of the losses over every dimension of every sample.
reduce(bool)- whether to reduce the result to a scalar; the default is True.

Precautions for use:
To obtain the true KL divergence, the following settings are required:

  1. reduce = True; size_average = False
  2. the computed loss must then be divided by the batch size (averaged over the batch)

Example:
/Code/3_optimizer/3_1_lossFunction/6_KLDivLoss.py
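A minimal sketch (made-up distributions) showing the expected input format: the input is already a log-probability, the target is a probability; reduction='batchmean' performs the divide-by-batch-size step mentioned above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

p = F.softmax(torch.randn(2, 5), dim=1)           # "true" distribution (probabilities)
log_q = F.log_softmax(torch.randn(2, 5), dim=1)   # fitted distribution, already in log form

loss = nn.KLDivLoss(reduction='batchmean')(log_q, p)
manual = (p * (p.log() - log_q)).sum() / p.size(0)  # sum of y*(log y - x), divided by batch size
print(loss, manual)  # identical
```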

7.BCELoss

class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
The cross-entropy calculation for binary classification tasks. This function can be regarded as a special case of nn.CrossEntropyLoss: the number of classes is restricted to two, and y must be in {0, 1}. Note also that the input should be in the form of a probability so that the definition of cross entropy applies; the input to BCELoss is therefore usually the output of a sigmoid activation layer, as in the official example. This loss function is frequently used in autoencoders.
Calculation formula:
loss(x, y) = -w * [ y * log(x) + (1 - y) * log(1 - x) ]
Parameters:
weight(Tensor)- sets a weight for the loss of each class; often used for class-imbalance problems.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
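A minimal sketch (made-up values): the input is pushed through a sigmoid first, and the result matches the formula above written out by hand:

```python
import torch
import torch.nn as nn

logits = torch.randn(3)              # assumed raw network outputs
prob = torch.sigmoid(logits)         # BCELoss expects probabilities in (0, 1)
target = torch.tensor([1.0, 0.0, 1.0])

loss = nn.BCELoss()(prob, target)
manual = -(target * prob.log() + (1 - target) * (1 - prob).log()).mean()
print(loss, manual)  # identical
```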

8.BCEWithLogitsLoss

class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='elementwise_mean', pos_weight=None)
Function:
Combines Sigmoid with BCELoss, much as CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss(). That is, the input is passed through the Sigmoid activation function and turned into a probability before the binary cross entropy is computed.
Calculation formula:
l_n = -[ y_n * log σ(x_n) + (1 - y_n) * log(1 - σ(x_n)) ]
σ() represents the Sigmoid function.
In particular, when pos_weight is set:
l_n = -[ pos_weight * y_n * log σ(x_n) + (1 - y_n) * log(1 - σ(x_n)) ]
Parameters:
weight(Tensor)- sets a weight for each sample in the batch; if given, it has to be a Tensor of size "nbatch".
pos_weight- the weight of positive examples; with pos_weight > 1 recall is increased, with pos_weight < 1 precision is increased, so it can be used to trade off recall against precision. It must be a vector whose length equals the number of classes.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
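A minimal sketch (made-up values) showing that the fused version applied to raw logits matches sigmoid + BCELoss, while being numerically more stable:

```python
import torch
import torch.nn as nn

logits = torch.randn(3)
target = torch.tensor([1.0, 0.0, 1.0])

loss_fused = nn.BCEWithLogitsLoss()(logits, target)
loss_split = nn.BCELoss()(torch.sigmoid(logits), target)
print(loss_fused, loss_split)  # equal up to floating-point error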

9.MarginRankingLoss

class torch.nn.MarginRankingLoss(margin=0, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Computes a ranking loss between two inputs: when the required gap (margin) between them is violated, the loss is positive; when the gap is satisfied, the loss is 0.
Calculation formula:
loss(x1, x2, y) = max(0, -y * (x1 - x2) + margin)
When y == 1 and x1 is larger than x2 (by at least the margin), there is no loss; conversely, when y == -1 and x1 is smaller than x2, there is no loss.
Parameters:
margin(float)- the required gap between x1 and x2.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
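A minimal sketch (made-up scores): with y = 1 the first element of x1 already beats x2 by more than the margin, the second does not:

```python
import torch
import torch.nn as nn

x1 = torch.tensor([1.0, 0.2])
x2 = torch.tensor([0.5, 0.8])
y = torch.tensor([1.0, 1.0])   # y=1: x1 should rank above x2

loss = nn.MarginRankingLoss(margin=0.1, reduction='none')(x1, x2, y)
print(loss)  # tensor([0.0000, 0.7000]) = max(0, -y*(x1-x2) + margin)
```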

10.HingeEmbeddingLoss

class torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Hard to summarize in one sentence; it is an extension of the hinge loss, mainly used to measure whether two inputs are similar, and is "used for learning nonlinear embeddings or semi-supervised learning".
Calculation formula:
l_n = x_n, if y_n == 1
l_n = max(0, margin - x_n), if y_n == -1
Parameters:
margin(float)- the tolerance margin; the default value is 1.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
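A minimal sketch (made-up distances x): for y = 1 the loss is simply x, for y = -1 the hinge max(0, margin - x) is applied:

```python
import torch
import torch.nn as nn

x = torch.tensor([0.3, 0.3])    # e.g. pairwise distances produced by a network
y = torch.tensor([1.0, -1.0])   # 1: similar pair, -1: dissimilar pair

loss = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')(x, y)
print(loss)  # tensor([0.3000, 0.7000])
```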

11.MultiLabelMarginLoss

class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Used for classification tasks in which one sample belongs to several classes. For example, in a four-class task, sample x belongs to class 0 and class 1, but not to class 2 or class 3.
Calculation formula:
loss(x, y) = Σ_{ij} max(0, 1 - (x[y[j]] - x[i])) / x.size(0)
x[y[j]] is the output value for a class the sample belongs to, and x[i] is the output value for a class it does not belong to.

Parameters:
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
Input: (C) or (N, C) where N is the batch size and C is the number of classes.
Target: (C) or (N, C), same shape as the input.
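A minimal sketch for the four-class case described above: the sample belongs to classes 3 and 0; the target lists the positive class indices first and pads the rest with -1 (the values follow the official docs example):

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[3, 0, -1, -1]])   # positive classes 3 and 0, then -1 padding

loss = nn.MultiLabelMarginLoss()(x, y)
print(loss)  # 0.8500 = sum of the four hinge terms divided by the number of classes
```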

12.SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Compute the smooth L1 loss, a special case of the Huber loss (with the parameter δ fixed to 1).
Supplement:
Huber Loss is often used in regression problems. Its biggest feature is that it is insensitive to outliers and noise and has strong robustness.
The formula is:
L_δ(e) = 0.5 * e^2, if |e| ≤ δ
L_δ(e) = δ * (|e| - 0.5 * δ), otherwise
In other words, when the absolute error is no larger than δ the L2 loss is used; when it is larger than δ the L1 loss is used.
Back to SmoothL1Loss, this is Huber Loss when δ=1.
The calculation formula is:
z_i = 0.5 * (x_i - y_i)^2, if |x_i - y_i| < 1
z_i = |x_i - y_i| - 0.5, otherwise
loss(x, y) = (1/n) * Σ_i z_i
(the original article shows this as the red curve in a comparison plot against the L2 loss; the figure is not reproduced here)
Parameters:
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
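A minimal sketch (made-up errors) showing the two regimes: below an absolute error of 1 the loss is quadratic, above it the loss grows linearly:

```python
import torch
import torch.nn as nn

output = torch.tensor([0.0, 0.0])
target = torch.tensor([0.5, 3.0])

loss = nn.SmoothL1Loss(reduction='none')(output, target)
print(loss)  # tensor([0.1250, 2.5000]) -> 0.5*0.5^2 and |3.0| - 0.5
```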

13.SoftMarginLoss

class torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Creates a criterion that optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1). (I do not yet fully understand when to use it in practice; readers who know are welcome to add to this!)
Calculation formula:
loss(x, y) = Σ_i log(1 + exp(-y[i] * x[i])) / x.nelement()
Parameters:
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
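A minimal sketch (made-up values) checking the formula above against the built-in criterion:

```python
import torch
import torch.nn as nn

x = torch.tensor([2.0, -0.5])
y = torch.tensor([1.0, -1.0])   # targets are +1 / -1

loss = nn.SoftMarginLoss()(x, y)
manual = torch.log(1 + torch.exp(-y * x)).mean()
print(loss, manual)  # identical
```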

14.MultiLabelSoftMarginLoss

class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
The multi-label version of SoftMarginLoss, i.e., a multi-label one-versus-all loss based on max-entropy.
Calculation formula:
loss(x, y) = -(1/C) * Σ_i [ y[i] * log( 1 / (1 + exp(-x[i])) ) + (1 - y[i]) * log( exp(-x[i]) / (1 + exp(-x[i])) ) ]
Parameters:
weight(Tensor)- sets a weight for the loss of each class. weight must be a float tensor whose length equals the number of classes C, i.e., a weight must be provided for every class.
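A minimal sketch (made-up logits and 0/1 label vectors); for comparison, the same quantity is also written as a per-class binary cross entropy averaged over classes:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 4)                    # logits: 2 samples, 4 classes
y = torch.tensor([[1., 0., 1., 0.],
                  [0., 0., 1., 1.]])     # one 0/1 entry per class

loss = nn.MultiLabelSoftMarginLoss()(x, y)
manual = nn.BCEWithLogitsLoss(reduction='none')(x, y).mean(dim=1).mean()
print(loss, manual)  # equal up to floating-point error
```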

15.CosineEmbeddingLoss

class torch.nn.CosineEmbeddingLoss(margin=0, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Uses the cosine function to measure whether two inputs are similar; "used for learning nonlinear embeddings or semi-supervised learning".
Calculation formula:
loss(x, y) = 1 - cos(x1, x2), if y == 1
loss(x, y) = max(0, cos(x1, x2) - margin), if y == -1
Parameters:
margin(float)- valid range [-1, 1]; a value in [0, 0.5] is recommended.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
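A minimal sketch (random made-up embeddings) checking the piecewise formula above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
y = torch.tensor([1, -1, 1, -1])   # 1: similar pair, -1: dissimilar pair

loss = nn.CosineEmbeddingLoss(margin=0.2)(x1, x2, y)
cos = F.cosine_similarity(x1, x2)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - 0.2, min=0)).mean()
print(loss, manual)  # identical
```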

16.MultiMarginLoss

class torch.nn.MultiMarginLoss(p=1, margin=1, weight=None, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Compute the hinge loss for multi-class classification.
Calculation formula:
loss(x, y) = Σ_{i ≠ y} w[y] * max(0, margin - x[y] + x[i])^p / x.size(0)
where 0 ≤ y ≤ x.size(1)-1, i ranges over the classes with i ≠ y, p is 1 or 2, and w[y] is the weight of each class.
Parameters:
p(int)- the default value is 1; only 1 or 2 may be chosen.
margin(float)- the default value is 1.
weight(Tensor)- sets a weight for the loss of each class. weight must be a float tensor whose length equals the number of classes C, i.e., a weight must be provided for every class.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
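A minimal sketch (values borrowed from the official docs example): the true class is 3, and the three hinge terms against classes 0, 1, 2 are summed and divided by the number of classes:

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([3])

loss = nn.MultiMarginLoss()(x, y)
print(loss)  # 0.3250 = (0.3 + 0.4 + 0.6) / 4
```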

17.TripletMarginLoss

class torch.nn.TripletMarginLoss(margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='elementwise_mean')
Function:
Compute the triplet loss, commonly used in face verification.
As illustrated in the original article's figure of Anchor, Negative and Positive samples (not reproduced here), the goal is to make the distance between the Anchor and the Positive as small as possible, and the distance between the Anchor and the Negative as large as possible.
In formula form, the distance between the Anchor and the Positive, plus a threshold, should be smaller than the distance between the Anchor and the Negative:
d(a_i, p_i) + margin < d(a_i, n_i)
Calculation formula:
L(a, p, n) = max{ d(a_i, p_i) - d(a_i, n_i) + margin, 0 }, where d(x_i, y_i) = || x_i - y_i ||_p

Parameters:
margin(float)- the default value is 1.
p(int)- the degree of the distance norm; the default value is 2.
swap(bool)- the distance swap, described in detail in the paper "Learning shallow convolutional feature descriptors with triplet losses" by V. Balntas, E. Riba et al. Default: False.
size_average(bool)- valid when reduce=True. When True, the returned loss is the mean; when False, it is the sum of the per-sample losses.
reduce(bool)- whether to reduce the result to a scalar; the default is True.
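A minimal sketch (random made-up embeddings) with the same loss written out by hand; the built-in version adds a tiny eps inside the distance, so the two values agree only up to that term:

```python
import torch
import torch.nn as nn

anchor   = torch.randn(8, 128)
positive = torch.randn(8, 128)
negative = torch.randn(8, 128)

loss = nn.TripletMarginLoss(margin=1.0, p=2)(anchor, positive, negative)
manual = torch.clamp(
    (anchor - positive).norm(p=2, dim=1)
    - (anchor - negative).norm(p=2, dim=1) + 1.0,
    min=0.0,
).mean()
print(loss, manual)  # nearly identical
```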

18.CTCLoss

class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
Function:
The Connectionist Temporal Classification loss. It is mainly used for classifying sequence data, in particular when the labels and the network outputs are not aligned (the alignment problem).
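A minimal sketch with assumed shapes (T=50 time steps, batch N=4, C=20 classes with class 0 reserved for the blank), in the style of the official docs example:

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 4, 20, 30
log_probs = torch.randn(T, N, C).log_softmax(dim=2)      # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # made-up label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss)
```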

Reference: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks


Please indicate the source for reprinting: https://blog.csdn.net/u011995719/article/details/85107524

