1 Background and Motivation

With the development of technology, face verification ("Is this Zhang San?", as opposed to face recognition, which asks "Who is this?") is increasingly used in mobile and embedded applications such as device unlock, application login, and mobile payment.

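When applications move from servers and PCs to mobile and embedded devices, the less computation a face recognition model consumes, the better. Starting from this observation and building on existing light-weight networks for common visual recognition (MobileNetV1, MobileNetV2, ShuffleNet), the authors propose MobileFaceNets, a light-weight network designed specifically for face verification.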

2 Related Work

1) General-purpose light-weight networks

Drawback: not so accurate for face verification when trained from scratch

2) Light-weight networks designed specifically for face verification

3) Light-weight networks obtained by compression through knowledge distillation

3 Advantages / Contributions

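Proposes MobileFaceNets, a light-weight network designed specifically for face verification, and introduces a global depthwise convolution layer to replace the global average pooling layer, with a significant improvement in efficiency
On the small LFW and AgeDB datasets, accuracy holds up while efficiency improves markedly; on the large MegaFace dataset, accuracy ..., efficiency ...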

4 Method

4.1 The Weakness of Common Mobile Networks for Face Verification


A typical deep face verification pipeline includes preprocessing face images, extracting face features by a trained deep model, and matching two faces by their features’ similarity or distance
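The matching step at the end of this pipeline can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the metric (the sentence above allows either a similarity or a distance). The threshold and the 3-dimensional toy vectors are hypothetical; real MobileFaceNets features are 128-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_person(feat_a, feat_b, threshold=0.5):
    """Verification decision: same identity if similarity >= threshold.

    threshold=0.5 is a hypothetical value; in practice it would be tuned
    on a validation set.
    """
    return cosine_similarity(feat_a, feat_b) >= threshold


# Toy features (hypothetical): a and b point in similar directions, c does not.
a = [0.1, 0.9, 0.2]
b = [0.12, 0.85, 0.25]
c = [0.9, -0.1, 0.3]
print(is_same_person(a, b))  # True
print(is_same_person(a, c))  # False
```

Verification is therefore a yes/no decision on a pair of features, unlike recognition, which compares one feature against a whole gallery.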

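First, an existing face detection network from prior work (which also outputs 5 facial landmarks) finds the face, and the face is aligned to 112×112 according to those landmarks. The aligned face is then fed into the authors' MobileFaceNets, whose output is a 128-dimensional feature vector.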
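The alignment step can be sketched as a least-squares 2-D similarity transform (rotation, uniform scale, translation) that maps the 5 detected landmarks onto a fixed 5-point template inside the 112×112 crop. This is an assumption about how the alignment is done, since the note does not specify the transform, and the template coordinates below are illustrative. Only the transform estimation is shown; warping the pixels would be left to an image library.

```python
import math

# Illustrative 5-point template in a 112x112 crop: left eye, right eye,
# nose tip, left mouth corner, right mouth corner. These coordinates are an
# assumption for this sketch, not values given in the note.
TEMPLATE_112 = [
    (38.29, 51.70), (73.53, 51.50), (56.03, 71.74),
    (41.55, 92.37), (70.73, 92.20),
]


def estimate_similarity(src, dst):
    """Least-squares 2-D similarity transform mapping src points onto dst.

    Model (rotation + uniform scale + translation, no reflection):
        u = a*x - b*y + tx,   v = b*x + a*y + ty
    Returns (a, b, tx, ty).
    """
    n = len(src)
    mx = sum(x for x, _ in src) / n
    my = sum(y for _, y in src) / n
    mu = sum(u for u, _ in dst) / n
    mv = sum(v for _, v in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv
        num_a += xc * uc + yc * vc
        num_b += xc * vc - yc * uc
        den += xc * xc + yc * yc
    a, b = num_a / den, num_b / den
    # Translation maps the src centroid onto the dst centroid.
    tx = mu - (a * mx - b * my)
    ty = mv - (b * mx + a * my)
    return a, b, tx, ty


def apply_similarity(params, pts):
    """Apply (a, b, tx, ty) to a list of (x, y) points."""
    a, b, tx, ty = params
    return [(a * x - b * y + tx, b * x + a * y + ty) for x, y in pts]


# Hypothetical detector output: the template rotated 10 degrees,
# scaled by 1.8 and shifted, as if the face were larger and tilted.
theta = math.radians(10)
detected = apply_similarity(
    (1.8 * math.cos(theta), 1.8 * math.sin(theta), 30.0, 45.0), TEMPLATE_112)

params = estimate_similarity(detected, TEMPLATE_112)
aligned = apply_similarity(params, detected)
print([(round(u, 2), round(v, 2)) for u, v in aligned])
```

Because the fake landmarks differ from the template by an exact similarity transform, the fit recovers it exactly and the aligned points land back on the template; with real detections the fit is only approximate.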

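Although every unit (spatial position) of the last feature map has the same theoretical receptive field size, its effective receptive field differs depending on the position.

Take RF1 and RF2 as an example: RF1 belongs to a unit at the corner of the feature map and RF2 to the unit at its center. RF2's effective receptive field is larger than RF1's, so it carries more information.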

The theoretical support for this comes from "Understanding the Effective Receptive Field in Deep Convolutional Neural Networks", whose key observation is that

pixels at the center of receptive field have a much larger impact on an output and the distribution of impact within a receptive field on the output is nearly Gaussian
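The Gaussian-shaped impact can be reproduced in a simplified setting: a 1-D linear network (no nonlinearities) in which every layer uses the same uniform kernel. There, each input pixel's impact on one output unit is the kernel convolved with itself once per layer, which tends toward a Gaussian by the central limit theorem. The sketch assumes 10 layers with kernel size 3; both numbers are arbitrary choices for illustration.

```python
def convolve_1d(signal, kernel):
    """Full 1-D convolution of two lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def erf_profile(num_layers, kernel_size=3):
    """Impact of each input pixel on one output unit after stacking
    `num_layers` uniform-weight convolutions in a linear 1-D network."""
    kernel = [1.0 / kernel_size] * kernel_size
    impact = [1.0]
    for _ in range(num_layers):
        impact = convolve_1d(impact, kernel)
    return impact


profile = erf_profile(num_layers=10)
center = len(profile) // 2
# Theoretical RF size: 10 * (3 - 1) + 1 = 21 input pixels.
print(len(profile))
# How much more the center pixel matters than the outermost pixel.
print(profile[center] / profile[0])
```

The theoretical receptive field spans 21 pixels, yet the center pixel's impact is 8953 times that of the outermost one, so the effective receptive field is concentrated near the center and is much narrower than the theoretical one.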

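Applying global average pooling to the last feature map amounts to "equal importance": the feature at every spatial position is weighted the same. For face recognition this is not a good fit.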
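To make "equal importance" concrete, here is a minimal pure-Python global average pooling; the list-of-lists layout (channels, then rows, then columns) is an assumption for illustration. The same activation placed at the center or at a corner of a 3×3 map pools to exactly the same value, so GAP cannot distinguish the two.

```python
def global_average_pool(fmap):
    """fmap: C channels, each an H x W list of lists.

    Returns one value per channel; every spatial position gets the same
    weight 1 / (H * W), i.e. "equal importance".
    """
    pooled = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        pooled.append(sum(sum(row) for row in channel) / (h * w))
    return pooled


# Same activation, once at the center and once at a corner of a 3x3 map.
center_hot = [[[0, 0, 0], [0, 9, 0], [0, 0, 0]]]
corner_hot = [[[9, 0, 0], [0, 0, 0], [0, 0, 0]]]
print(global_average_pool(center_hot))  # [1.0]
print(global_average_pool(corner_hot))  # [1.0]
```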

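Replacing global average pooling with an FC layer avoids the equal-importance problem. However, in face verification the last feature map typically has 1280 channels and the output feature has length 128, so the FC layer has 7×7×1280×128 weights, which easily reaches about 8 million (8,028,160). That is unacceptable for a light-weight network.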
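The parameter count follows directly from flattening the 7×7×1280 map and connecting every element to each of the 128 outputs. The helper below is a hypothetical name for this arithmetic and, like the figure in the text, excludes bias terms.

```python
def fc_weight_count(h, w, c, d):
    """Weights (bias excluded) of an FC layer from a flattened h x w x c map to d outputs."""
    return h * w * c * d


n = fc_weight_count(7, 7, 1280, 128)
print(n)                  # 8028160
print(f"{n / 1e6:.2f}M")  # 8.03M
```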

4.2 Global Depthwise Convolution

To treat different units (spatial positions) of FMap-end (the last feature map) with different importance, we replace the global average pooling layer with a global depthwise convolution layer (GDConv)

Concretely, GDConv is a depthwise conv (see [Xception] "Xception: Deep Learning with Depthwise Separable Convolutions" or [MobileNet] "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications") with:

kernel size equal to the size of FMap-end (7×7)

padding 0

stride 1
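A minimal sketch of GDConv under the settings listed above (kernel size equal to the feature map size, padding 0, stride 1), in pure Python with a list-of-lists layout chosen for illustration. With one kernel per channel, each channel m collapses to a single value G_m = Σ_{i,j} K_{i,j,m} · F_{i,j,m}. The kernel values in the usage example are hand-set to weight the center more heavily and are hypothetical; in the network they are learned.

```python
def global_depthwise_conv(fmap, kernels):
    """Global depthwise convolution (GDConv).

    fmap:    C channels, each an H x W list of lists (e.g. FMap-end).
    kernels: C kernels, each H x W; kernel size equals the feature map size,
             padding is 0 and stride is 1, so every channel collapses to 1 value.
    Returns C values: G[m] = sum over i, j of K[m][i][j] * F[m][i][j].
    """
    if len(fmap) != len(kernels):
        raise ValueError("depthwise: need exactly one kernel per channel")
    out = []
    for f, k in zip(fmap, kernels):
        if len(f) != len(k) or len(f[0]) != len(k[0]):
            raise ValueError("kernel size must equal the feature map size")
        # Weighted sum over all spatial positions of this channel only.
        out.append(sum(kv * fv
                       for krow, frow in zip(k, f)
                       for kv, fv in zip(krow, frow)))
    return out


def gdconv_weight_count(h, w, c):
    """One h x w kernel per channel, no cross-channel weights (bias excluded)."""
    return h * w * c


# Same activation at the center vs. at a corner of a 3x3, single-channel map.
center_hot = [[[0, 0, 0], [0, 9, 0], [0, 0, 0]]]
corner_hot = [[[9, 0, 0], [0, 0, 0], [0, 0, 0]]]

# Hypothetical "learned" kernel that weights the center more than the corners.
center_heavy = [[[0.05, 0.1, 0.05], [0.1, 0.4, 0.1], [0.05, 0.1, 0.05]]]
print(global_depthwise_conv(center_hot, center_heavy))
print(global_depthwise_conv(corner_hot, center_heavy))

# A uniform 1/(H*W) kernel turns GDConv back into global average pooling.
uniform = [[[1 / 9] * 3 for _ in range(3)]]
print(global_depthwise_conv(center_hot, uniform))

print(gdconv_weight_count(7, 7, 1280))
```

With the center-heavy kernel, the center activation yields 3.6 while the corner one yields 0.45, so position now matters; with a uniform kernel of 1/(H·W) the layer reduces to global average pooling. For the 7×7×1280 FMap-end discussed above, GDConv needs only 7×7×1280 = 62,720 weights, compared with 8,028,160 for the FC layer.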