Paper: Interspecies Knowledge Transfer for Facial Keypoint Detection
Code: https://github.com/menoRashid/animal_human_kp
Authors and organizations:
Summary:
We propose a method for localizing facial keypoints on animals by transferring knowledge gained from human faces. Directly fine-tuning a network trained on human facial keypoints to animal faces is sub-optimal, because animal and human faces look very different. Instead, we adapt the animal images to the pre-trained human keypoint detection model by correcting for the difference in face shape between animals and humans. First, we use an unsupervised shape matching method to find the most similar human face images for each input animal image. We use these matches to train a warping network that warps each input animal face to look more like a human face. The warping network is then jointly fine-tuned, on animal data, with a pre-trained human facial keypoint detection network. We present state-of-the-art results on horse and sheep facial keypoint detection, with significant improvement over simple fine-tuning, especially when training data is scarce. In addition, we present a new dataset of 3717 horse face images with facial keypoint annotations.
1. Introduction
Facial keypoint detection is a key prerequisite for face alignment and registration, and it affects facial expression analysis, face tracking, and graphics methods that manipulate or transform faces. Although human facial keypoint detection is a relatively mature field of study, animal facial keypoint detection is relatively unexplored. It is nonetheless useful: for example, veterinary studies have shown that horses, mice, sheep, and cats display distinct facial expressions when in pain, so a facial keypoint detector could help detect pain in animals. In this paper, we focus on facial keypoint detection for horses and sheep.
Convolutional neural networks (CNNs) perform well on human facial keypoint detection, which makes them a good choice for animal keypoint detection. Unfortunately, training a neural network from scratch requires a large amount of annotated data, which is time-consuming and costly to collect. When training data is insufficient, a pre-trained CNN can instead be fine-tuned. However, how well a pre-trained network generalizes is limited by the amount of data available for fine-tuning and by how related the two tasks are. For example, previous work has shown that a network trained on man-made objects has limited ability to adapt to natural objects, and that additional pre-training data helps only when it is related to the target task. We have a large amount of annotated human facial keypoint data, but not enough annotated animal keypoint data to train a neural network. At the same time, because human and animal faces differ in structure, directly fine-tuning may not give good results. In this paper, we address the problem by transferring knowledge between human and animal face data. How, then, can we achieve this with CNNs?
Our key idea is to adapt the training data to the pre-trained network so that the network can be fine-tuned more effectively, rather than adapting the pre-trained network to the new training data. By mapping the new data so that it better matches the data of the pre-training task, we can take a human facial keypoint detection network and then fine-tune it to detect keypoints on animal faces. Specifically, the idea is to warp each animal image to look more human-like, and then use the warped images to fine-tune a network pre-trained to detect human facial keypoints.
Intuitively, by making the animal face look more human-like, we correct for the difference in shape, so that during fine-tuning the network only needs to adapt to the difference in appearance. For example, the distance between the corners of a horse's mouth is generally much smaller than the distance between its eyes, whereas for humans the two distances are roughly similar (a difference in shape). In addition, horses have fur and humans do not (a difference in appearance). The warping network adjusts for the shape difference, for instance by stretching the horse's mouth wider, and during fine-tuning the keypoint detection network learns to adjust for the difference in appearance.
2. Related work
3. Approach
We adapt a pre-trained human facial keypoint detector to animals, taking into account the differences between the two domains. For training, we assume access to keypoint-annotated animal faces, keypoint-annotated human faces, and the corresponding pre-trained human keypoint detector. For testing, we assume that an animal face detector is available (i.e., we focus only on facial keypoint detection, not face detection). Our approach consists of three main steps: finding, for each animal face, the nearest-neighbor human faces with a similar pose; using these nearest neighbors to train an animal-to-human warping network; and using the warped (human-like) animal images to fine-tune the pre-trained human keypoint detector into an animal keypoint detector.
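The first step, nearest-neighbor matching, can be illustrated with a minimal sketch. These notes do not spell out the matching criterion, so the distance used here (mean point-to-point distance after removing translation and scale) is only an assumed stand-in, and the names `normalize_shape`, `shape_distance`, and `nearest_humans` are hypothetical. It also assumes that animal and human faces are annotated with the same keypoints in the same order.

```python
import math


def normalize_shape(points):
    """Center keypoints at their mean and scale them to unit RMS size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if scale == 0:
        raise ValueError("degenerate shape: all keypoints coincide")
    return [(x / scale, y / scale) for x, y in centered]


def shape_distance(a, b):
    """Mean distance between corresponding normalized keypoints.

    Assumed stand-in for the matching criterion, not the paper's own.
    """
    na, nb = normalize_shape(a), normalize_shape(b)
    return sum(math.dist(p, q) for p, q in zip(na, nb)) / len(na)


def nearest_humans(animal_kp, human_kps, k=5):
    """Return indices of the k human faces closest to the animal face."""
    order = sorted(
        range(len(human_kps)),
        key=lambda i: shape_distance(animal_kp, human_kps[i]),
    )
    return order[:k]
```

Because translation and scale are removed before comparing, two faces that differ only in position or size in the image are treated as identical. Rotation is deliberately kept, since a tilted or turned head is part of what pose matching should capture.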
4. Experiments
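We evaluate with the same metric as [51]: a predicted keypoint counts as a failure if its Euclidean distance from the annotated keypoint is greater than 10% of the face (bounding box) size. We then compute the average failure rate as the percentage of test keypoints that fail.
We found that pre-training the warping network before joint training gives better performance. To train the warping and keypoint networks, we use K = 5 human images for each animal image. These matches are also used in the "GT Warp" network described in Section 4.4.
For the TPS warping network, we use a 5×5 grid of control points. We use the Adam [22] optimizer. The warping network is trained with a base learning rate of 0.001, and the pre-trained layers use a learning rate 10 times lower. It is trained for 50 epochs, with the learning rate lowered by a factor of 10 after 25 epochs. During full-system training, the warping network keeps the same learning rate, while the keypoint detection network uses a learning rate of 0.01. We train the full network for 150 epochs, lowering the learning rate after 50 and after 100 epochs. Finally, we augment the data with horizontal flips and with rotations from -10° to 10° in increments of 5°.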
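The failure-rate metric can be sketched in a few lines. The function name `average_failure_rate` and the sample layout are hypothetical, and the sketch assumes the face size is supplied as a single number in pixels, since the notes do not say how the bounding box is reduced to one size value.

```python
import math


def average_failure_rate(samples, threshold=0.10):
    """Percentage of test keypoints whose error exceeds the threshold.

    Each sample is (predicted, annotated, face_size), where predicted and
    annotated are equal-length lists of (x, y) points and face_size is the
    face bounding-box size in pixels (its exact definition is an assumption).
    """
    total = 0
    failed = 0
    for predicted, annotated, face_size in samples:
        if len(predicted) != len(annotated):
            raise ValueError("predicted and annotated keypoints must align")
        limit = threshold * face_size
        for p, g in zip(predicted, annotated):
            total += 1
            if math.dist(p, g) > limit:
                failed += 1
    if total == 0:
        raise ValueError("no keypoints to evaluate")
    return 100.0 * failed / total
```

Normalizing the threshold by face size makes the metric comparable across images in which the face occupies different numbers of pixels.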
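The learning-rate schedule and the augmentation settings can be written out as a small sketch. The helper names `step_lr` and `augmentation_settings` are hypothetical. Two points are assumptions rather than statements from the notes: that the drops during full-system training are also by a factor of 10, and that every flip setting is combined with every rotation angle.

```python
def step_lr(base_lr, epoch, milestones, factor=0.1):
    """Learning rate at a 0-indexed epoch under a step schedule.

    The rate is multiplied by `factor` once for every milestone reached.
    factor=0.1 is stated for warp pre-training and assumed elsewhere.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


def augmentation_settings(max_angle=10, step=5):
    """All (flip, angle) pairs: optional horizontal flip and a rotation.

    Combining every flip with every angle is an assumption.
    """
    angles = range(-max_angle, max_angle + 1, step)
    return [(flip, angle) for flip in (False, True) for angle in angles]


# Warp pre-training: 50 epochs, base rate 0.001, one drop after 25 epochs.
warp_schedule = [step_lr(0.001, e, [25]) for e in range(50)]

# Full system: 150 epochs, keypoint network at 0.01, drops after 50 and 100.
keypoint_schedule = [step_lr(0.01, e, [50, 100]) for e in range(150)]
```

Under these assumptions, each training image yields ten variants: five rotation angles (-10°, -5°, 0°, 5°, 10°), each with and without a horizontal flip. Note that a horizontal flip also requires swapping left and right keypoint labels, which this sketch does not cover.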
5. Conclusion
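We proposed a new method for localizing facial keypoints on animals. Conventional deep learning usually requires a large amount of annotated data, and building such datasets is time-consuming and laborious. Instead of building a large annotated animal dataset, we therefore warp the shape of the animal face to resemble that of a human face. In this way, we can make use of existing human facial keypoint datasets for the animal facial keypoint detection task. We compared our method experimentally with other baselines and demonstrated state-of-the-art results on horse and sheep facial keypoint detection. Finally, we created the Horse Facial Keypoint dataset, which we hope will benefit the field of animal facial keypoint detection.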