This article supplements the earlier theoretical study of ReID loss functions. It works through Triplet loss (the triplet loss function) from the code side, including how Triplet hard mining (TriHard) finds the hardest positive and negative samples.
score, feat = model(img)
Here score has shape [8, 751]: 8 is the batch_size and 751 is the number of identity classes. The printed values are raw logits (i.e., not yet passed through softmax):
tensor([[-1.3245e-02, -2.3512e-02, -1.8136e-02, ..., 3.5739e-05,
-2.3923e-02, 1.6440e-02],
[-1.4504e-02, -1.9184e-02, -2.0377e-02, ..., 4.1178e-03,
-2.3450e-02, 1.9983e-02],
[-1.0439e-02, -1.7005e-02, -1.7642e-02, ..., 2.7252e-03,
-2.0623e-02, 1.6570e-02],
...,
[-1.6970e-02, -1.6319e-02, -1.7090e-02, ..., 3.0741e-03,
-2.4214e-02, 1.7379e-02],
[-1.0949e-02, -2.1738e-02, -1.7383e-02, ..., 2.5572e-03,
-2.6021e-02, 1.7932e-02],
[-1.7376e-02, -2.4241e-02, -1.1886e-02, ..., 3.6473e-03,
-2.8584e-02, 2.1102e-02]], device='cuda:0', grad_fn=<MmBackward>)
feat has shape [8, 512]: 8 is the batch size and 512 is the dimensionality of the model's feature output. It looks like this:
tensor([[1.0856, 1.1822, 1.0917, ..., 0.8223, 1.0282, 1.0947],
[1.0087, 0.8157, 0.9607, ..., 0.9637, 0.9081, 1.1780],
[0.8724, 0.9109, 0.8726, ..., 0.8570, 0.9679, 0.8008],
...,
[0.7935, 0.9719, 1.0124, ..., 0.7361, 1.1128, 0.8532],
[1.0150, 0.9681, 0.9360, ..., 1.0709, 0.9801, 1.0442],
[1.0555, 1.1638, 0.8265, ..., 1.2366, 1.1049, 1.0189]],
device='cuda:0', grad_fn=<ViewBackward>)
target has shape [8] and holds the person IDs:
tensor([119, 119, 119, 119, 714, 714, 714, 714], device='cuda:0')
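The rest of the walkthrough can be followed without the full model by standing in a dummy batch of the same shapes (the values below are hypothetical; the tensors printed above come from a real forward pass):

import torch

batch_size, num_classes, feat_dim = 8, 751, 512
score = torch.randn(batch_size, num_classes)  # stand-in for the raw logits
feat = torch.randn(batch_size, feat_dim)      # stand-in for the features
target = torch.tensor([119, 119, 119, 119, 714, 714, 714, 714])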
TripletLoss
Computing the distance matrix
Here we compute the Euclidean distance.
x and y are the feature matrices, each of shape [batch_size, 512].
m and n are both the batch size, 8 here.
xx is the a² term: pow squares every element, and the 1 in sum means summing over dimension 1 (the axis of size 512), so torch.pow(x, 2).sum(1, keepdim=True) has shape [8, 1], one sum per row; expand(m, n) then broadcasts it to [8, 8].
torch.pow(x,2).sum(1,True):
tensor([[539.8054],
[457.5028],
[474.0969],
[557.4810],
[778.4174],
[474.9417],
[503.8409],
[625.3312]], device='cuda:0', grad_fn=<SumBackward1>)
torch.pow(x, 2).sum(1, keepdim=True).expand(m, n):
tensor([[539.8054, 539.8054, 539.8054, 539.8054, 539.8054, 539.8054, 539.8054, 539.8054],
        [457.5028, 457.5028, 457.5028, 457.5028, 457.5028, 457.5028, 457.5028, 457.5028],
        [474.0969, 474.0969, 474.0969, 474.0969, 474.0969, 474.0969, 474.0969, 474.0969],
        [557.4810, 557.4810, 557.4810, 557.4810, 557.4810, 557.4810, 557.4810, 557.4810],
        [778.4174, 778.4174, 778.4174, 778.4174, 778.4174, 778.4174, 778.4174, 778.4174],
        [474.9417, 474.9417, 474.9417, 474.9417, 474.9417, 474.9417, 474.9417, 474.9417],
        [503.8409, 503.8409, 503.8409, 503.8409, 503.8409, 503.8409, 503.8409, 503.8409],
        [625.3312, 625.3312, 625.3312, 625.3312, 625.3312, 625.3312, 625.3312, 625.3312]],
       device='cuda:0', grad_fn=<ExpandBackward>)
yy is computed the same way, except for a transpose so that the b² terms run down the columns.
dist = xx + yy then gives a² + b². Since ||a - b||² = a² + b² - 2ab, the only missing piece is the -2ab cross term, which the addmm_ call supplies.
Note the difference between addmm_ and addmm: the trailing underscore marks the in-place version, which accumulates directly into dist.
The final dist is the distance matrix we want.
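As a quick illustration of the in-place semantics (a sketch with hypothetical tensors):

import torch

x = torch.rand(8, 512)
dist = torch.rand(8, 8)
# Out-of-place: returns a new tensor, dist itself is unchanged.
out = torch.addmm(dist, x, x.t(), beta=1, alpha=-2)
# In-place (trailing underscore): overwrites dist with 1*dist + (-2) * (x @ x.t()).
dist.addmm_(x, x.t(), beta=1, alpha=-2)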
def euclidean_dist(x, y):
    """
    Args:
      x: pytorch Variable, with shape [m, d]
      y: pytorch Variable, with shape [n, d]
    Returns:
      dist: pytorch Variable, with shape [m, n]
    """
    m, n = x.size(0), y.size(0)
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)      # a^2
    yy = torch.pow(y, 2).sum(1, keepdim=True).expand(n, m).t()  # b^2
    dist = xx + yy                                              # a^2 + b^2
    # a^2 + b^2 - 2ab; keyword form of the older, now-removed
    # dist.addmm_(1, -2, x, y.t()) signature
    dist.addmm_(x, y.t(), beta=1, alpha=-2)
    # clamp for numerical stability: keeps sqrt away from exact 0,
    # where the gradient would blow up
    dist = dist.clamp(min=1e-12).sqrt()
    return dist
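A quick sanity check (a sketch, assuming the feat tensor from above): the off-diagonal entries should agree with torch.cdist, while the diagonal is floored at sqrt(1e-12) = 1e-6 by the clamp, which is exactly the 1.0000e-06 visible on the diagonal below.

d1 = euclidean_dist(feat, feat)
d2 = torch.cdist(feat, feat)              # reference implementation
print(torch.allclose(d1, d2, atol=1e-4))  # True up to the clamp and rounding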
The resulting distance matrix has shape [batch_size, batch_size]:
tensor([[1.0000e-06, 4.3200e+00, 4.1502e+00, 3.7251e+00, 7.3499e+00, 3.9080e+00,
3.6081e+00, 4.4757e+00],
[4.3200e+00, 7.8125e-03, 3.5319e+00, 4.8321e+00, 9.8512e+00, 3.2775e+00,
3.6503e+00, 6.6219e+00],
[4.1502e+00, 3.5319e+00, 1.0000e-06, 4.3095e+00, 9.1650e+00, 3.4574e+00,
3.3446e+00, 5.8928e+00],
[3.7251e+00, 4.8321e+00, 4.3095e+00, 1.0000e-06, 6.7865e+00, 4.4078e+00,
3.6200e+00, 4.1992e+00],
[7.3499e+00, 9.8512e+00, 9.1650e+00, 6.7865e+00, 1.0000e-06, 9.0147e+00,
8.0675e+00, 5.2237e+00],
[3.9080e+00, 3.2775e+00, 3.4574e+00, 4.4078e+00, 9.0147e+00, 1.0000e-06,
3.1999e+00, 5.9490e+00],
[3.6081e+00, 3.6503e+00, 3.3446e+00, 3.6200e+00, 8.0675e+00, 3.1999e+00,
1.0000e-06, 5.0834e+00],
[4.4757e+00, 6.6219e+00, 5.8928e+00, 4.1992e+00, 5.2237e+00, 5.9490e+00,
5.0834e+00, 1.1049e-02]], device='cuda:0', grad_fn=<SqrtBackward>)
Hard example mining
dist_mat: the distance matrix.
labels: the ID labels.
return_inds: whether to return the indices as well.
N in the code is the batch size.
labels.expand(N, N) tiles (repeats) labels to shape [8, 8].
labels.expand(N,N):
tensor([[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714],
[119, 119, 119, 119, 714, 714, 714, 714]], device='cuda:0')
labels.expand(N, N).t() is the transposed version:
tensor([[119, 119, 119, 119, 119, 119, 119, 119],
[119, 119, 119, 119, 119, 119, 119, 119],
[119, 119, 119, 119, 119, 119, 119, 119],
[119, 119, 119, 119, 119, 119, 119, 119],
[714, 714, 714, 714, 714, 714, 714, 714],
[714, 714, 714, 714, 714, 714, 714, 714],
[714, 714, 714, 714, 714, 714, 714, 714],
[714, 714, 714, 714, 714, 714, 714, 714]], device='cuda:0')
labels.expand(N, N).eq(labels.expand(N, N).t()): eq compares element-wise, testing whether corresponding positions are equal. True marks a pair with the same ID (the same person), i.e., a positive pair; False marks different people, i.e., a negative pair.
tensor([[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False],
[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True]],
device='cuda:0')
tensor.ne() is the element-wise not-equal test, the complement of the mask above: the same ID gives False, different IDs give True.
tensor([[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True],
[False, False, False, False, True, True, True, True],
[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False],
[ True, True, True, True, False, False, False, False]],
device='cuda:0')
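Since ne is the exact complement of eq, the two masks are related by logical negation:

print(torch.equal(is_neg, ~is_pos))  # True: every pair is either positive or negative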
dist_ap, relative_p_inds = torch.max(
    dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
dist_mat is the distance matrix computed above. Masking it with is_pos keeps only the positive-pair distances; taking the row-wise max then gives each anchor's largest positive distance along with its index. The largest distance marks the hardest positive (a sketch of the masking-and-view trick follows the dist_ap output below).
dist_ap:
tensor([[4.3200],
[4.8321],
[4.3095],
[4.8321],
[9.0147],
[9.0147],
[8.0675],
[5.9490]], device='cuda:0', grad_fn=<MaxBackward0>)
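Boolean masking flattens the matrix, so view(N, -1) recovers one row per anchor only because every identity contributes the same number of samples to the batch (as the docstring in the full function below notes). A minimal sketch with a hypothetical 2-IDs-by-2-samples batch:

import torch

labels_demo = torch.tensor([0, 0, 1, 1])
is_pos_demo = labels_demo.expand(4, 4).eq(labels_demo.expand(4, 4).t())
d_demo = torch.rand(4, 4)
print(d_demo[is_pos_demo].shape)              # torch.Size([8]): masking flattens
print(d_demo[is_pos_demo].view(4, -1).shape)  # torch.Size([4, 2]): 2 positives per anchor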
Likewise, the minimum of the negative-pair distances gives the hardest negative.
dist_an, relative_n_inds = torch.min(
    dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)
dist_an:
tensor([[3.6081],
[3.2775],
[3.3446],
[3.6200],
[6.7865],
[3.2775],
[3.3446],
[4.1992]], device='cuda:0', grad_fn=<MinBackward0>)
y = dist_an.new().resize_as_(dist_an).fill_(1)
dist_an.new() creates an empty tensor with the same dtype and device as dist_an (it has no contents yet).
resize_as_ then gives it the same shape as dist_an, and fill_(1) fills it with ones; on recent PyTorch, torch.ones_like(dist_an) is the one-line equivalent.
Now compute the loss from three arguments: dist_an holds the hardest-negative distances and dist_ap the hardest-positive distances. The triplet loss here is computed with nn.MarginRankingLoss.
loss = self.ranking_loss(dist_an, dist_ap, y)
loss=tensor(2.6602, device='cuda:0', grad_fn=<MeanBackward0>)
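With y = 1, MarginRankingLoss(dist_an, dist_ap, y) reduces to mean(max(0, dist_ap - dist_an + margin)). The walkthrough does not state the configured margin, but the printed numbers imply margin = 0.3: every dist_ap above exceeds its matching dist_an, so the hinge never clips and the loss is mean(dist_ap - dist_an) + margin = 2.3602 + 0.3 = 2.6602. The value can be reproduced by hand (a sketch, assuming margin = 0.3):

manual_loss = torch.clamp(dist_ap - dist_an + 0.3, min=0).mean()
print(manual_loss)  # tensor(2.6602, device='cuda:0', ...)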
def hard_example_mining(dist_mat, labels, return_inds=False):
    """For each anchor, find the hardest positive and negative sample.
    Args:
      dist_mat: pytorch Variable, pair wise distance between samples, shape [N, N]
      labels: pytorch LongTensor, with shape [N]
      return_inds: whether to return the indices. Save time if `False`(?)
    Returns:
      dist_ap: pytorch Variable, distance(anchor, positive); shape [N]
      dist_an: pytorch Variable, distance(anchor, negative); shape [N]
      p_inds: pytorch LongTensor, with shape [N];
        indices of selected hard positive samples; 0 <= p_inds[i] <= N - 1
      n_inds: pytorch LongTensor, with shape [N];
        indices of selected hard negative samples; 0 <= n_inds[i] <= N - 1
    NOTE: Only consider the case in which all labels have same num of samples,
      thus we can cope with all anchors in parallel.
    """
    assert len(dist_mat.size()) == 2
    assert dist_mat.size(0) == dist_mat.size(1)
    N = dist_mat.size(0)
    # shape [N, N]
    is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
    is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())
    # `dist_ap` means distance(anchor, positive)
    # both `dist_ap` and `relative_p_inds` with shape [N, 1]
    dist_ap, relative_p_inds = torch.max(
        dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
    # `dist_an` means distance(anchor, negative)
    # both `dist_an` and `relative_n_inds` with shape [N, 1]
    dist_an, relative_n_inds = torch.min(
        dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)
    # shape [N]
    dist_ap = dist_ap.squeeze(1)
    dist_an = dist_an.squeeze(1)
    if return_inds:
        # shape [N, N]
        ind = (labels.new().resize_as_(labels)
               .copy_(torch.arange(0, N).long())
               .unsqueeze(0).expand(N, N))
        # shape [N, 1]
        p_inds = torch.gather(
            ind[is_pos].contiguous().view(N, -1), 1, relative_p_inds.data)
        n_inds = torch.gather(
            ind[is_neg].contiguous().view(N, -1), 1, relative_n_inds.data)
        # shape [N]
        p_inds = p_inds.squeeze(1)
        n_inds = n_inds.squeeze(1)
        return dist_ap, dist_an, p_inds, n_inds
    return dist_ap, dist_an
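A sketch of the return_inds path, reusing dist_mat and target from above:

dist_ap, dist_an, p_inds, n_inds = hard_example_mining(
    dist_mat, target, return_inds=True)
# p_inds[i] / n_inds[i] are the absolute in-batch indices of anchor i's
# hardest positive / hardest negative sample.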
Full code:
import torch
from torch import nn


class TripletLoss(object):
    """Modified from Tong Xiao's open-reid (https://github.com/Cysu/open-reid).
    Related Triplet Loss theory can be found in paper 'In Defense of the Triplet
    Loss for Person Re-Identification'."""

    # TriHard loss: max(0, Dap - Dan + margin); margin is the alpha in the
    # formula, and ranking_loss computes the hinge term.
    def __init__(self, margin=None):
        self.margin = margin
        if margin is not None:
            # Ranking loss: loss(x1, x2, y) = max(0, -y * (x1 - x2) + margin).
            # y is the target ordering, in {-1, 1}: y = 1 means x1 should rank
            # above x2, y = -1 the reverse. The loss is 0 when the pair is
            # ordered correctly and y * (x1 - x2) > margin.
            self.ranking_loss = nn.MarginRankingLoss(margin=margin)
        else:
            # When margin is None, use SoftMarginLoss instead
            self.ranking_loss = nn.SoftMarginLoss()

    def __call__(self, global_feat, labels, normalize_feature=False):
        if normalize_feature:
            global_feat = normalize(global_feat, axis=-1)
        # Compute the Euclidean distance matrix
        dist_mat = euclidean_dist(global_feat, global_feat)
        # TriHard mining: find the hardest positive and the hardest negative
        dist_ap, dist_an = hard_example_mining(
            dist_mat, labels)
        y = dist_an.new().resize_as_(dist_an).fill_(1)
        if self.margin is not None:
            # dist_an: anchor-negative distances
            # dist_ap: anchor-positive distances
            # loss = max(0, -y * (dist_an - dist_ap) + margin)
            # Unlike TripletMarginLoss, MarginRankingLoss here takes the
            # precomputed d(a, p) and d(a, n) rather than the raw anchor,
            # positive and negative embeddings.
            loss = self.ranking_loss(dist_an, dist_ap, y)
        else:
            loss = self.ranking_loss(dist_an - dist_ap, y)
        return loss, dist_ap, dist_an
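The class calls a normalize helper that is not shown here; a minimal sketch in the style of the open-reid baseline (an assumption, not copied from the original repo):

def normalize(x, axis=-1):
    """Normalize x to unit length along the given axis (hypothetical helper)."""
    x = 1. * x / (torch.norm(x, 2, axis, keepdim=True).expand_as(x) + 1e-12)
    return x

Tying everything together with the batch from the beginning (margin = 0.3 as inferred above):

criterion = TripletLoss(margin=0.3)
loss, dist_ap, dist_an = criterion(feat, target)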