CVPR 2022 | Focal and Global Knowledge Distillation (FGD) for Object Detection

Author | Mesopotamia Plain @ Zhihu (reposted with permission)

Source丨https://zhuanlan.zhihu.com/p/477707304

Editor | Gokushi Platform

This article introduces our CVPR 2022 work on knowledge distillation for object detection: Focal and Global Knowledge Distillation for Detectors. With only about 30 lines of code, it delivers consistent gains on anchor-based and anchor-free, one-stage and two-stage detectors. The code is now open source, and you are welcome to try it~

Article link: https://arxiv.org/abs/2111.11837

Code link: https://github.com/yzd-v/FGD


1. Problems addressed

1. Foreground-background imbalance in object detection

The imbalance between foreground and background is an important problem in object detection, and it affects knowledge distillation as well.

Knowledge distillation aims to let the student learn the teacher's knowledge and produce similar outputs, thereby improving performance. To explore the feature-level differences between student and teacher, we first visualized their feature maps. There are large differences between teacher and student in both spatial and channel attention. In particular, for spatial attention, the gap between the two is large in the foreground and small in the background, which creates different learning difficulties for the student during distillation.

[Figure: spatial and channel attention visualizations of the teacher and the student]
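To make the attention comparison concrete, here is a minimal sketch of how such spatial and channel attention maps can be computed from a feature map, following the definitions in the paper (mean absolute activation over channels or over spatial positions, sharpened by a temperature softmax). The function name and the temperature value are our own choices; see the official repository for the exact implementation.

```python
import torch
import torch.nn.functional as F

def spatial_channel_attention(feat: torch.Tensor, temp: float = 0.5):
    """Sketch: spatial and channel attention of a (N, C, H, W) feature map.

    Spatial attention averages |F| over channels; channel attention averages
    |F| over spatial positions. Both are sharpened with a temperature softmax
    and rescaled so the weights sum to H*W (or C).
    """
    n, c, h, w = feat.shape
    # G^S: (N, H*W) mean absolute activation over channels, flattened
    g_s = feat.abs().mean(dim=1).view(n, -1)
    a_s = (h * w) * F.softmax(g_s / temp, dim=1)          # spatial attention
    # G^C: (N, C) mean absolute activation over spatial positions
    g_c = feat.abs().mean(dim=(2, 3))
    a_c = c * F.softmax(g_c / temp, dim=1)                # channel attention
    return a_s.view(n, h, w), a_c
```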

To further explore the influence of foreground and background on knowledge distillation, we separated the two and ran distillation experiments. Distilling the whole image at once degrades distillation performance, whereas the student performs better when the foreground and background are distilled separately.

[Table: distillation results with the foreground and background distilled together vs. separately]

To address the attention gap between student and teacher as well as the difference between foreground and background, we propose focal distillation: separate the foreground from the background, and use the teacher's spatial and channel attention as weights to guide the student when computing the focal distillation loss.
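As an illustration, below is a hedged sketch of such a focal distillation loss: a binary ground-truth-box mask splits foreground from background, and the teacher's attention (reusing spatial_channel_attention from the sketch above) weights the feature difference. The alpha/beta weights are hypothetical placeholders, and the paper's additional attention-matching term and scale normalization are omitted here.

```python
def focal_distill_loss(feat_s, feat_t, fg_mask, alpha=1.6e-3, beta=8e-4, temp=0.5):
    """Sketch of focal distillation between paired feature maps.

    feat_s / feat_t: (N, C, H, W) student / teacher features (if the channel
    counts differ, the student is first adapted by a 1x1 conv, omitted here).
    fg_mask: (N, H, W) binary mask, 1 inside ground-truth boxes.
    alpha / beta: foreground / background weights (hypothetical values).
    """
    # Teacher attention acts as per-pixel and per-channel weights.
    a_s_t, a_c_t = spatial_channel_attention(feat_t, temp)
    w = a_s_t.unsqueeze(1) * a_c_t.unsqueeze(-1).unsqueeze(-1)  # (N, C, H, W)
    diff = (feat_s - feat_t) ** 2 * w
    fg = fg_mask.unsqueeze(1).float()
    # Average the weighted squared error separately over fg and bg pixels.
    loss_fg = (diff * fg).sum() / fg.sum().clamp(min=1)
    loss_bg = (diff * (1 - fg)).sum() / (1 - fg).sum().clamp(min=1)
    return alpha * loss_fg + beta * loss_bg
```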

2. Missing global information

As mentioned above, focal distillation distills the foreground and background separately, which cuts the connection between them and misses the global information in the features. To compensate, we propose global distillation: use GcBlock to extract the global information of the student and the teacher, and compute a global distillation loss between them.

[Figure: global distillation with GcBlock]
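For illustration, a minimal GCNet-style global context block and the corresponding global loss might look like the sketch below; the bottleneck ratio and the gamma weight are assumptions, not the official hyperparameters.

```python
import torch
import torch.nn as nn

class GcBlock(nn.Module):
    """Minimal GCNet-style global context block (a sketch): attention pooling
    over all positions, then a bottleneck transform added back to the input."""
    def __init__(self, channels: int, ratio: float = 0.5):
        super().__init__()
        hidden = int(channels * ratio)
        self.context = nn.Conv2d(channels, 1, kernel_size=1)  # attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x):
        n, c, h, w = x.shape
        attn = self.context(x).view(n, 1, -1).softmax(dim=-1)        # (N, 1, HW)
        context = torch.bmm(x.view(n, c, -1), attn.transpose(1, 2))  # (N, C, 1)
        return x + self.transform(context.view(n, c, 1, 1))

def global_distill_loss(gc_block, feat_s, feat_t, gamma=8e-4):
    """Sketch: pass both features through a shared GcBlock and penalize the
    squared difference; gamma is a hypothetical loss weight."""
    return gamma * ((gc_block(feat_s) - gc_block(feat_t)) ** 2).mean()
```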

2. Overall framework

FGD only needs the feature maps of the student and teacher to compute the focal and global distillation losses, so it can easily be applied to various kinds of detectors.

[Figure: the overall FGD framework]
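Since FGD only consumes paired feature maps, wiring it into a detector reduces to summing the two losses over the feature pyramid levels. A usage sketch reusing the functions defined above (fg_masks would be rasterized from the ground-truth boxes at each level's resolution):

```python
def fgd_loss(feats_s, feats_t, fg_masks, gc_blocks):
    """Sum focal + global distillation over paired FPN levels.

    feats_s / feats_t: lists of (N, C, H, W) student / teacher FPN features.
    fg_masks: per-level (N, H, W) ground-truth-box masks.
    gc_blocks: one GcBlock per level, trained jointly with the student.
    """
    total = feats_s[0].new_zeros(())
    for f_s, f_t, mask, gc in zip(feats_s, feats_t, fg_masks, gc_blocks):
        total = total + focal_distill_loss(f_s, f_t, mask)
        total = total + global_distill_loss(gc, f_s, f_t)
    return total
```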

3. Experimental results

We conducted experiments on anchor-based and anchor-free, one-stage and two-stage detectors; all the student detectors achieved substantial AP and AR gains on COCO2017.

[Table: AP/AR gains of the distilled students on COCO2017]

We also distilled the students with stronger detectors and found that FGD brings a larger performance boost when a stronger model serves as the teacher. For example, RetinaNet-ResNet50 reaches 39.7 and 40.7 mAP when distilled by ResNet-101 and ResNeXt-101 teachers, respectively.

[Table: distillation results with stronger teachers]

We visualized the attention again for the student distilled with FGD. The distributions of spatial and channel attention of the FGD-trained student are very similar to the teacher's, which indicates that the student has learned the teacher's knowledge and obtained better features through distillation, yielding the performance gains.

[Figure: attention visualizations of the student after FGD distillation]

4. More Distillation Settings

We have open-sourced the code: https://github.com/yzd-v/FGD
The implementation is based on MMDetection and is easy to reproduce. More teacher-student distillation settings have been added, with the corresponding results reported in the repository. Everyone is welcome to use it.

[Table: additional teacher-student distillation settings and results]
