PyTorch loss function summary

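Note: many loss functions take two boolean parameters, size_average and reduce. Loss functions usually operate on a whole batch at once. Before reduction, the loss therefore has one entry per sample. For element-wise losses such as L1Loss, it has one entry per element and the same shape as the input.

1. If reduce = False, size_average is ignored and the unreduced loss is returned as a tensor;
2. If reduce = True, the loss is reduced to a scalar:

2.1 if size_average = True, it returns loss.mean();
2.2 if size_average = False, it returns loss.sum().

By default, reduce=True and size_average=True. The examples below use the unreduced form, because it makes the original definition of each loss easier to see. In newer PyTorch versions this form is written reduction='none' rather than reduce=False. Both flags are deprecated in newer versions in favour of the single reduction argument ('none', 'mean' or 'sum').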
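In newer PyTorch versions the two flags are replaced by a single `reduction` argument. The sketch below shows the correspondence using L1Loss on random data; the seed and the (3, 4) shape are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(3, 4)
y = torch.randn(3, 4)

# reduction='none'  ~ reduce=False: one loss value per element
per_elem = nn.L1Loss(reduction='none')(x, y)
# reduction='mean'  ~ reduce=True, size_average=True (the default)
mean_loss = nn.L1Loss(reduction='mean')(x, y)
# reduction='sum'   ~ reduce=True, size_average=False
sum_loss = nn.L1Loss(reduction='sum')(x, y)

print(per_elem.shape)
print(torch.allclose(mean_loss, per_elem.mean()), torch.allclose(sum_loss, per_elem.sum()))
```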

torch.nn.L1Loss

$loss(x_{i},y_{i})=|x_{i}-y_{i}|$
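x and y can be vectors or matrices, but they must have the same shape. The loss also has the same shape as x and y, and the subscript i denotes the i-th element.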

```python
import torch

# reduction='none' returns the unreduced loss (reduce=False, size_average=False in older versions)
loss_fn = torch.nn.L1Loss(reduction='none')
input = torch.randn(3, 4)   # Variable is no longer needed; tensors support autograd directly
target = torch.randn(3, 4)
loss = loss_fn(input, target)
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())
```
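To confirm that L1Loss really is the element-wise $|x_i - y_i|$, its unreduced output can be compared with a direct computation. This is a minimal check with `reduction='none'`, the newer spelling of reduce=False, on random data:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4)
y = torch.randn(3, 4)

loss = torch.nn.L1Loss(reduction='none')(x, y)
manual = (x - y).abs()   # |x_i - y_i| computed element by element

print(torch.allclose(loss, manual), loss.shape == x.shape)
```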

torch.nn.SmoothL1Loss

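Closely related to the Huber loss (identical to it when the threshold is 1). It is a squared loss when the error lies in (-1, 1) and an L1 loss otherwise.
$loss(x_{i},y_{i})=\left\{\begin{matrix} 0.5(x_{i}-y_{i})^{2}, & |x_{i}-y_{i}|<1\\ |x_{i}-y_{i}|-0.5, & \text{otherwise} \end{matrix}\right.$
This loss is less sensitive to outliers than MSELoss, and in some cases it prevents exploding gradients (see Fast R-CNN). Like L1Loss, it is an element-wise operation, and the subscript i denotes the i-th element.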

```python
import torch

# reduction='none' returns the unreduced, element-wise loss
loss_fn = torch.nn.SmoothL1Loss(reduction='none')
input = torch.randn(3, 4)
target = torch.randn(3, 4)
loss = loss_fn(input, target)
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())
```
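The piecewise definition and the robustness to outliers can both be checked numerically. The sketch below compares SmoothL1Loss with the formula on hand-picked differences that hit both branches. It then compares its gradient with that of MSELoss. It assumes the default threshold `beta=1.0`.

```python
import torch
import torch.nn as nn

# Differences chosen to cover both branches of the piecewise formula
diff = torch.tensor([-3.0, -0.5, 0.0, 0.4, 2.0])
x = diff.clone().requires_grad_(True)
y = torch.zeros_like(diff)

smooth = nn.SmoothL1Loss(reduction='none')(x, y)  # beta defaults to 1.0
manual = torch.where(diff.abs() < 1, 0.5 * diff ** 2, diff.abs() - 0.5)
print(torch.allclose(smooth, manual))

# Gradient w.r.t. x: SmoothL1's gradient is bounded by 1 in magnitude,
# while MSE's gradient 2*(x - y) grows with the size of the error
smooth.sum().backward()
g_smooth = x.grad.clone()
x.grad = None
nn.MSELoss(reduction='none')(x, y).sum().backward()
g_mse = x.grad.clone()
print(g_smooth)
print(g_mse)
```

For the outlier with difference -3, the SmoothL1 gradient stays at -1, while the MSE gradient is -6.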

torch.nn.MSELoss

Mean squared error loss, computed element-wise like nn.L1Loss:
$loss(x_{i},y_{i})=(x_{i}-y_{i})^{2}$

```python
import torch

# reduction='none' returns the unreduced, element-wise loss
loss_fn = torch.nn.MSELoss(reduction='none')
input = torch.randn(3, 4)
target = torch.randn(3, 4)
loss = loss_fn(input, target)
print(input); print(target); print(loss)
print(input.size(), target.size(), loss.size())
```
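A short check of the formula also shows one detail of the default reduction. For element-wise losses, `'mean'` averages over every element of the tensor, not only over the batch dimension. This is a minimal sketch on random (3, 4) data:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4)
y = torch.randn(3, 4)

per_elem = torch.nn.MSELoss(reduction='none')(x, y)
print(torch.allclose(per_elem, (x - y) ** 2))

# Default reduction ('mean') divides by all 12 elements, not by the batch size 3
default = torch.nn.MSELoss()(x, y)
print(torch.allclose(default, ((x - y) ** 2).sum() / 12))
```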

torch.nn.BCELoss

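Binary cross-entropy, used for two-class problems. A Sigmoid must be applied before this loss so that its inputs are probabilities.
The discrete cross-entropy is defined as $H(p,q)=- \sum_{i}p_{i}\log q_{i}$, where p and q are vectors that are both probability distributions. In binary classification there are only positive and negative examples, and their probabilities sum to 1. Predicting a single probability is therefore enough, and the definition simplifies to:
$loss(x_{i},y_{i})=-w_{i}[y_{i}\log x_{i}+(1-y_{i})\log(1-x_{i})]$
Here x and y can be vectors or matrices, and i is an index. $x_{i}$ is the predicted probability that sample i is positive, $y_{i}$ is its label (0 or 1), and $w_{i}$ is the weight of that term. loss, x, y and w all have the same shape.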

```python
import torch

# reduction='none' returns the unreduced, element-wise loss
loss_fn = torch.nn.BCELoss(reduction='none')
input = torch.randn(3, 4)
target = torch.empty(3, 4).random_(2)          # random 0/1 labels stored as floats
loss = loss_fn(torch.sigmoid(input), target)   # torch.sigmoid replaces the deprecated F.sigmoid
print(input); print(target); print(loss)
```
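The simplified formula can be checked against BCELoss directly. PyTorch also provides `BCEWithLogitsLoss`, which takes the raw scores and fuses the Sigmoid into the loss, and which is generally more numerically stable. This is a minimal sketch on random data with all weights $w_i = 1$:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 4)
target = torch.randint(0, 2, (3, 4)).float()   # 0/1 labels, same dtype as the input

p = torch.sigmoid(logits)
bce = nn.BCELoss(reduction='none')(p, target)

# -[y*log(x) + (1 - y)*log(1 - x)] with w_i = 1
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))
print(torch.allclose(bce, manual, atol=1e-6))

# Same loss computed from the raw scores, without an explicit sigmoid
fused = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
print(torch.allclose(bce, fused, atol=1e-6))
```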

The weight has the same shape as x and y. When positive and negative samples are imbalanced, one extra line is needed to build it:

```python
import torch

class_weight = torch.tensor([1.0, 10.0])  # positives are rare here, so they get a larger weight
target = torch.empty(3, 4).random_(2)
weight = class_weight[target.long()]      # (3, 4): each element's weight looked up from its label
loss_fn = torch.nn.BCELoss(weight=weight, reduction='none')
```

torch.nn.CrossEntropyLoss

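Cross-entropy loss for multi-class classification. No softmax layer is needed before it, because it combines LogSoftmax and NLLLoss internally. Note that the target is expected to contain class indices of type torch.LongTensor; PyTorch 1.10+ also accepts class probabilities. Since the task is not multi-label, each target is effectively one-hot: exactly one position is 1 and all others are 0. Substituting this into the cross-entropy formula gives the simplified form:
$loss(x,label)=-w_{label}\log\frac{e^{x_{label}}}{\sum_{j=0}^{C-1} e^{x_{j}}}=w_{label}\left[-x_{label}+\log\sum_{j=0}^{C-1} e^{x_{j}}\right]$
Here $x\in\mathbb{R}^{C}$ is the activation before Softmax, with one entry per class, and $label\in[0,C-1]$ is a scalar holding the corresponding class index. Input and target therefore have different shapes. C is the number of classes. $w\in\mathbb{R}^{C}$ is a vector of per-class weights; classes with few samples can be given a larger weight.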

```python
import torch

weight = torch.tensor([1.0, 2.0, 1.0, 1.0, 10.0])   # one weight per class (C = 5)
loss_fn = torch.nn.CrossEntropyLoss(weight=weight, reduction='none')
input = torch.randn(3, 5)              # (batch_size, C), raw scores without softmax
target = torch.randint(0, 5, (3,))     # class indices; dtype is torch.long (LongTensor)
loss = loss_fn(input, target)
print(input); print(target); print(loss)
```
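The sketch below checks the simplified formula $w_{label}[-x_{label}+\log\sum_j e^{x_j}]$ row by row on random data. It also shows a detail that is easy to miss: with a `weight` and `reduction='mean'`, PyTorch divides by the sum of the weights of the selected labels, not by the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
weight = torch.tensor([1.0, 2.0, 1.0, 1.0, 10.0])
x = torch.randn(3, 5)                 # raw scores, shape (batch_size, C)
label = torch.randint(0, 5, (3,))     # class indices, dtype int64

per_sample = nn.CrossEntropyLoss(weight=weight, reduction='none')(x, label)

# w_label * (-x_label + log sum_j exp(x_j)), one value per row
w = weight[label]
x_label = x.gather(1, label.unsqueeze(1)).squeeze(1)
manual = w * (-x_label + torch.logsumexp(x, dim=1))
print(torch.allclose(per_sample, manual, atol=1e-6))

# Weighted mean: sum of weighted losses divided by the sum of the selected weights
mean_loss = nn.CrossEntropyLoss(weight=weight)(x, label)
print(torch.allclose(mean_loss, per_sample.sum() / w.sum()))
```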

torch.nn.KLDivLoss

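Computes the KL divergence. KL divergence is commonly used to measure the distance between two distributions. It is useful when performing direct regression over the space of output distributions.
KL divergence, also called relative entropy, measures how far apart two distributions are; the more similar they are, the closer it is to zero. Strictly speaking it is not a true distance, since it is not symmetric. The input is expected to be log-probabilities, and the target must have the same shape as the input.
$loss(x,y)=\frac{1}{N}\sum_{i=1}^{N}y_{i}(\log y_{i}-x_{i})$
Note that $x_{i}$ here is a log-probability: $x_{i}$ denotes the input and $y_{i}$ the target, which is a probability. The factor $1/N$ corresponds to the default 'mean' reduction over all N elements.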
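This section has no example, so here is a minimal sketch. The input is produced with `log_softmax` and the target with `softmax`; the random scores are illustrative. The sketch checks the element-wise term $y_i(\log y_i - x_i)$ and shows `reduction='batchmean'`, which divides the summed loss by the batch size and so gives the KL divergence per sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = F.log_softmax(torch.randn(3, 5), dim=1)   # input: log-probabilities
y = F.softmax(torch.randn(3, 5), dim=1)       # target: probabilities, same shape

per_elem = nn.KLDivLoss(reduction='none')(x, y)
manual = y * (torch.log(y) - x)
print(torch.allclose(per_elem, manual, atol=1e-6))

# 'batchmean' sums over all elements and divides by the batch size (3 here)
batchmean = nn.KLDivLoss(reduction='batchmean')(x, y)
print(torch.allclose(batchmean, manual.sum() / x.size(0), atol=1e-6))
```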

torch.nn.CosineEmbeddingLoss

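A cosine-similarity loss: it pulls two vectors $x_1, x_2$ together when the label y = 1 and pushes them apart when y = −1. Note that both vectors receive gradients.
$loss(x_{1},x_{2},y)=\left\{\begin{matrix} 1-\cos(x_{1},x_{2}), & \text{if } y=1\\ \max(0,\cos(x_{1},x_{2})-margin), & \text{if } y=-1 \end{matrix}\right.$
margin can take values in [−1, 1], but values between 0 and 0.5 are recommended.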
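This section also has no example. The minimal sketch below uses random 8-dimensional vectors, a margin of 0.2 and alternating labels, all chosen arbitrarily. It checks CosineEmbeddingLoss against the piecewise formula and confirms that both inputs receive gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x1 = torch.randn(4, 8, requires_grad=True)
x2 = torch.randn(4, 8, requires_grad=True)
y = torch.tensor([1.0, -1.0, 1.0, -1.0])   # 1: pull together, -1: push apart
margin = 0.2

loss = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(x1, x2, y)

# Piecewise formula: 1 - cos for y == 1, max(0, cos - margin) for y == -1
cos = F.cosine_similarity(x1, x2, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))
print(torch.allclose(loss, manual, atol=1e-6))

# Both inputs are part of the graph and receive gradients
loss.sum().backward()
print(x1.grad is not None, x2.grad is not None)
```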

There are many more loss functions besides these.

For details, see:
https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-nn/#loss-functions
https://blog.csdn.net/zhangxb35/article/details/72464152?utm_source=blogxgwz0