CVPR-2019

Table of Contents

1 Background and Motivation
2 Advantages / Contributions
3 Standard Baseline
4 Method
5 Experiments
6 Conclusion (own)

1 Background and Motivation

Much of the related work on ReID is built on a relatively weak baseline, and many improvements were mainly from training tricks rather than methods themselves.

The authors collect and evaluate some effective training tricks for person ReID and propose a strong (SOTA), fairly standardized baseline.

2 Advantages / Contributions

Uses only the global feature (rather than concatenating multi-branch features).

Achieves 94.5% rank-1 and 85.9% mAP on Market1501.

3 Standard Baseline

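A batch contains P identities with K images each. The backbone extracts ReID features (for example, 1024-dimensional), and an FC layer on top computes the ID prediction logits that decide which person is in the image.

Triplet loss pulls features of the same person together and pushes features of different people apart.

ID loss teaches the network to predict the identity of the person in the image.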
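
The P×K batch construction described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' sampler: the name `sample_pk_batch` is hypothetical, and skipping identities that have fewer than K images is a simplification chosen for this example.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, p, k, rng=random):
    """Return P*K dataset indices: P identities, K images per identity.

    labels[i] is the person ID of image i. Identities with fewer than
    K images are skipped (an assumption made to keep the sketch short).
    """
    index_by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        index_by_id[pid].append(idx)

    # Only identities with at least K images can fill a K-sized group.
    eligible = sorted(pid for pid, idxs in index_by_id.items() if len(idxs) >= k)
    chosen_ids = rng.sample(eligible, p)

    batch = []
    for pid in chosen_ids:
        batch.extend(rng.sample(index_by_id[pid], k))
    return batch
```

With this layout every anchor in the batch has K-1 positives and (P-1)·K negatives available for the triplet loss.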

4 Method

Six tricks are added on top of the standard baseline.

4.1 Warmup Learning Rate

The first 10 epochs are spent warming up (gradually increasing) the learning rate, after which the learning rate is gradually reduced.
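
A warmup schedule of this kind might look like the sketch below. Only the 10 warmup epochs come from this note. The base learning rate, the step-decay milestones, and the decay factor are assumed values for illustration, and `warmup_lr` is a hypothetical helper name.

```python
def warmup_lr(epoch, base_lr=3.5e-4, warmup_epochs=10,
              milestones=(40, 70), gamma=0.1):
    """Learning rate for a 1-indexed epoch.

    Linear warmup to base_lr over `warmup_epochs`, then step decay by
    `gamma` after each milestone. All defaults except warmup_epochs
    are assumptions for this example.
    """
    if epoch <= warmup_epochs:
        # Ramp linearly: epoch 1 -> base_lr / warmup_epochs, last warmup epoch -> base_lr.
        return base_lr * epoch / warmup_epochs
    # Count how many milestones have already been passed.
    decays = sum(1 for m in milestones if epoch > m)
    return base_lr * gamma ** decays
```

Starting from a small learning rate avoids large, noisy updates while the randomly initialized classifier layer is still far from a reasonable solution.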

4.2 Random Erasing Augmentation

The erased rectangle is sampled under the following constraints:

0.3 < aspect ratio < 3.33

0.02 < area ratio (erased area / image area) < 0.4
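
The sampling constraints above can be illustrated with a small pure-Python sketch that operates on a 2D list of pixel values. The aspect-ratio and area-ratio ranges are taken from this note. The erase probability `p`, the constant `fill` value, the retry limit, and the name `random_erase` are assumptions for the example.

```python
import math
import random


def random_erase(image, p=0.5, area_range=(0.02, 0.4),
                 ratio_range=(0.3, 3.33), fill=0.0, max_tries=100, rng=random):
    """Return a copy of `image` (H x W list of lists) with one rectangle erased.

    With probability 1 - p the image is returned unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # never modify the input in place
    if rng.random() >= p:
        return out

    for _ in range(max_tries):
        # Sample target area and aspect ratio (height / width) of the rectangle.
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(*ratio_range)
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h <= height and 0 < w <= width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    out[y][x] = fill
            return out
    return out  # no rectangle fit within max_tries; leave the image as is
```

Because of integer rounding, the erased area can fall slightly outside the nominal 2% to 40% range on small images.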

4.3 Label Smoothing

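$$q_i = \begin{cases} 1 - \frac{N-1}{N}\varepsilon & \text{if } i = y \\ \frac{\varepsilon}{N} & \text{otherwise} \end{cases}$$

Here $y$ is the ground-truth ID, $N$ is the number of identities, and $\varepsilon$ is set to 0.1.

For the underlying theory, see [Inception-v3] "Rethinking the Inception Architecture for Computer Vision".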
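
As a rough sketch, label smoothing replaces the one-hot target with a distribution that puts $1 - \frac{N-1}{N}\varepsilon$ on the true class and $\frac{\varepsilon}{N}$ on every other class, and then takes the usual cross entropy. The helper names below are hypothetical, and the code works on a single sample in plain Python for clarity.

```python
import math


def smoothed_targets(num_classes, label, eps=0.1):
    """Smoothed target distribution q for one sample."""
    q = [eps / num_classes] * num_classes
    q[label] = 1.0 - (num_classes - 1) / num_classes * eps
    return q


def label_smoothing_ce(logits, label, eps=0.1):
    """Cross entropy between smoothed targets and softmax(logits)."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_p = [z - log_z for z in logits]
    q = smoothed_targets(len(logits), label, eps)
    return -sum(qi * lp for qi, lp in zip(q, log_p))
```

Setting `eps=0.0` recovers the ordinary cross-entropy ID loss, which makes it easy to compare the two.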

4.4 Last Stride

The stride of the backbone's last stage is changed to 1, which preserves a higher-resolution feature map.
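
The effect on resolution is simple arithmetic. The sketch below assumes a ResNet-style backbone whose downsampling steps have strides (2, 2, 1, 2, 2, 2) and a 256×128 input. Neither is stated in this note, and `feature_map_size` is a hypothetical helper.

```python
import math


def feature_map_size(input_hw, strides):
    """Spatial size after a sequence of downsampling steps.

    Assumes each step uses "same"-style padding, so the output size is
    ceil(input / stride).
    """
    h, w = input_hw
    for s in strides:
        h, w = math.ceil(h / s), math.ceil(w / s)
    return h, w


# Assumed strides: stem conv, max-pool, then four residual stages.
default_strides = (2, 2, 1, 2, 2, 2)
last_stride_one = (2, 2, 1, 2, 2, 1)

print(feature_map_size((256, 128), default_strides))   # (8, 4)
print(feature_map_size((256, 128), last_stride_one))   # (16, 8)
```

Under these assumptions the final feature map has four times as many spatial positions, at the price of more computation in the last stage.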

4.5 BNNeck

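[Figure 6: sample distributions in the embedding space under (a) ID loss and (b) triplet loss]

ID loss optimizes cosine distance: it looks for separating hyperplanes (the yellow dashed lines in Figure 6(a)).

Triplet loss optimizes Euclidean distance (Figure 6(b)): samples of the same class become compact, and the distance between classes grows.

If the two are optimized jointly on the same feature, a possible phenomenon is that one loss is reduced, while the other loss is oscillating or even increased.

The authors' solution is to change the distribution of what the ID loss sees. A batch normalization (BN) layer is inserted before the classifier FC, so the triplet loss is computed on the feature before BN and the ID loss on the normalized feature after BN, which makes the joint optimization easier.

The FC layer in BNNeck has no bias, which constrains the ID-loss hyperplanes to pass through the origin of the coordinate axes.

The reasoning is the same as for a line: y = kx passes through the origin, while y = kx + b (b ≠ 0) does not.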
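
A forward pass through a BNNeck-style head can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: batch normalization is written in training mode without learned scale and shift, features are lists of floats, and all function names are hypothetical.

```python
import math


def batch_norm(features, eps=1e-5):
    """Normalize each feature dimension over the batch (no learned affine)."""
    n, dim = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in features) / n
                 for d in range(dim)]
    return [[(f[d] - means[d]) / math.sqrt(variances[d] + eps)
             for d in range(dim)] for f in features]


def linear_no_bias(features, weight):
    """Classifier FC without bias; weight has shape num_classes x dim."""
    return [[sum(w_d * x_d for w_d, x_d in zip(w, f)) for w in weight]
            for f in features]


def bnneck_forward(f_t, weight):
    """Return (f_t, f_i, logits).

    f_t    feature before BN, used for the triplet loss
    f_i    feature after BN, fed to the classifier
    logits ID prediction logits, used for the ID loss
    """
    f_i = batch_norm(f_t)
    logits = linear_no_bias(f_i, weight)
    return f_t, f_i, logits
```

Because the classifier has no bias, a zero feature always maps to zero logits, so every decision hyperplane contains the origin.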

4.6 Center Loss

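$$L_{Tri} = [d_p - d_n + \alpha]_+$$

In the triplet loss, $d_p$ and $d_n$ are feature distances of positive pair and negative pair. $\alpha$ is the margin of triplet loss, and $[x]_+$ is equivalent to $\max(0, x)$. For more details, see the post "Triplet-Loss原理及其实现、应用" (Triplet Loss: Principles, Implementation, and Applications).

This form of the loss has a drawback: it depends only on the difference $d_p - d_n$, so $d_p$, $d_n$ equal to 0.3 and 0.1 give the same loss as 1.3 and 1.1.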
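
The drawback is easy to check numerically with the triplet loss $[d_p - d_n + \alpha]_+$. In the sketch below the margin value 0.3 is an assumption chosen for illustration, and `triplet_loss` is a hypothetical helper.

```python
def triplet_loss(d_p, d_n, margin=0.3):
    """Hinge-style triplet loss: max(0, d_p - d_n + margin)."""
    return max(0.0, d_p - d_n + margin)


# Both pairs have d_p - d_n = 0.2, so the loss cannot tell them apart,
# even though the second pair is far less compact in absolute terms.
tight = triplet_loss(0.3, 0.1)
loose = triplet_loss(1.3, 1.1)
print(abs(tight - loose) < 1e-9)  # True
```

Nothing in this loss penalizes a large positive distance by itself, which is the gap a clustering term can fill.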

Triplet loss is determined by two person IDs sampled randomly. It is difficult to ensure that $d_p$ < $d_n$ in the whole training dataset.

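The authors introduce center loss to make up for this drawback of triplet loss. It has the following form:

$$L_C = \frac{1}{2}\sum_{j=1}^{B} \left\| f_{t_j} - c_{y_j} \right\|_2^2$$

where $y_j$ is the label of the $j$-th image in the mini-batch, $f_{t_j}$ is its feature, $c_{y_j}$ denotes the $y_j$-th class center of deep features, and $B$ is the batch size. Minimizing it pulls features of the same person as close together as possible.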
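
Center loss sums, over the batch, half the squared Euclidean distance between each feature and the center of its class. A minimal plain-Python sketch follows. The name `center_loss` and the list-based data layout are assumptions, and updating the centers during training is left out.

```python
def center_loss(features, labels, centers):
    """0.5 * sum_j ||f_j - c_{y_j}||^2 over the batch.

    features: list of B feature vectors
    labels:   list of B integer class IDs
    centers:  list of class centers, indexed by class ID
    """
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((x - c) ** 2 for x, c in zip(f, centers[y]))
    return 0.5 * total


features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
labels = [0, 1, 0]
centers = [[0.0, 0.0], [0.0, 1.0]]
print(center_loss(features, labels, centers))  # 4.5
```

Unlike the triplet loss, this term depends on absolute distances, so it keeps shrinking each class cluster even when the triplet margin is already satisfied.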

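The overall loss after this change is

$$L = L_{ID} + L_{Tri} + \beta L_C$$

where $L_{ID}$ is the cross-entropy loss and $\beta$ is the weight of the center loss.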

5 Experiments

5.1 Datasets

Market1501 and DukeMTMC

5.2 Influences of Each Trick (Same domain)

All six tricks improve performance.

5.3 Influences of Each Trick (Cross domain)

REA hurts performance in the cross-domain setting. The authors' explanation:

We infer that REA masking the regions of training images lets the model learn more knowledge in the training domain.

5.4 Comparison of State-of-the-Arts

The results compare very favorably with state-of-the-art methods.

5.5 Analysis of BNNeck

The feature supervised by ID loss works better with cosine distance.

5.6 Influences of the Number of Batch Size

Batch size has little effect.

We infer that large K helps to mine hard positive pairs while large P helps to mining hard negative pairs.
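
One way to see why K and P matter is batch-hard mining, where each anchor is paired with its farthest positive and closest negative inside the batch. This note does not say which mining strategy the paper uses, so batch-hard mining, the margin 0.3, and the function names below are all assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Mean triplet loss with the hardest positive and negative per anchor."""
    losses = []
    for i, (f_a, y_a) in enumerate(zip(features, labels)):
        # Distances to other samples of the same identity (K - 1 candidates).
        pos = [euclidean(f_a, f) for j, (f, y) in enumerate(zip(features, labels))
               if y == y_a and j != i]
        # Distances to samples of other identities ((P - 1) * K candidates).
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

A larger K gives each anchor more positives to pick the hardest one from, while a larger P gives it more negatives, which matches the authors' inference.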

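5.7 Influences of Image Size

Conclusion: image size has little effect.

6 Conclusion (own)

BNNeck normalizes the feature distribution, which makes it easier to find separating hyperplanes when ID loss and triplet loss are used together.
Random Erasing Augmentation is a simple and satisfying trick.
Triplet loss only cares about the difference between $d_p$ and $d_n$ and ignores their absolute values, so center loss is added to triplet loss to make features of the same class cluster more tightly.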

[SB-ReID] "Bag of Tricks and A Strong Baseline for Deep Person Re-identification"
