The Second-place Solution for CVPR VISION 23 Challenge Track 1 - Data Efficient Defect Detection

Paper: https://arxiv.org/pdf/2306.14116.pdf
Code: https://github.com/love6tao/Aoi-overfitting-team

1 Overview

This report describes the approach of the Aoi-overfitting-Team in the Data-Efficient Defect Detection Challenge. The challenge requires instance segmentation on 14 industrial inspection datasets with limited training samples. The team's approach focuses on improving the quality of defect masks when training samples are scarce. They adopted the Hybrid Task Cascade (HTC) instance segmentation algorithm with a transformer backbone (Swin-B) and improved the baseline results with composite connections inspired by CBNetV2. The team also proposed two model ensemble methods: one combines semantic segmentation with instance segmentation, and the other fuses the outputs of multiple instance segmentation models. Finally, with multi-scale training and test-time augmentation (TTA), the team achieved an average mAP@0.5:0.95 of over 48.49% and an average mAR@0.5:0.95 of 66.71% on the test set of the Data-Efficient Defect Detection Challenge.

2 Approach


2-1 Base instance segmentation model

A Swin-B network pre-trained on the ImageNet-22K dataset serves as the basic backbone. To further improve performance, two identical Swin-B networks are combined through composite connections, following the idea of CBNetV2.

2-2 Incorporating semantic segmentation into instance segmentation

Next, a strong semantic segmentation model is trained with Mask2Former, and its results are fused with the instance segmentation results. The fusion draws on the strengths of both segmentation methods and yields more accurate and more complete masks.
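The text above does not spell out the fusion rule, so the following is only an illustrative sketch under an assumed rule: inside the bounding box of each instance mask, the mask is replaced by the semantic pixels of the same class, and the original instance mask is kept when the semantic map has no such pixels there. The names `bounding_box` and `fuse_with_semantic` are hypothetical and are not taken from the team's code.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 2D grid: binary mask or per-pixel class ids


def bounding_box(mask: Grid) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), inclusive, of the non-zero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def fuse_with_semantic(instance_mask: Grid, label: int, semantic_map: Grid) -> Grid:
    """Refine one instance mask with same-class semantic pixels inside its box.

    Assumed rule (not confirmed by the report): the semantic prediction is
    trusted for the mask shape, the instance prediction for the location.
    """
    if not any(any(row) for row in instance_mask):
        return instance_mask  # empty mask: nothing to refine

    top, left, bottom, right = bounding_box(instance_mask)
    fused = [[0] * len(row) for row in instance_mask]
    found = False
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if semantic_map[r][c] == label:
                fused[r][c] = 1
                found = True
    # Fall back to the instance mask if the semantic map disagrees entirely.
    return fused if found else instance_mask
```

Restricting the semantic pixels to the instance's bounding box keeps separate defects of the same class from being merged into one mask, which a plain union with the semantic map would do.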

2-3 Fusion of multiple instance segmentations

Three different instance segmentation models are fused: HTC, Cascade Mask R-CNN with a ResNet-50 backbone, and Cascade Mask R-CNN with a ConvNeXt backbone. Each model is assigned a weight according to its mask mAP: the higher the mAP, the larger the weight, so that the fused result loses as little precision as possible. The weights are used to adjust the confidence of each model's detections. Finally, all confidence-adjusted detections are passed through non-maximum suppression (NMS) based on mask IoU to filter out redundant detections. This step removes duplicate detections and improves the robustness and accuracy of the final segmentation.
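The weighting and suppression steps can be sketched as follows. This is a minimal illustration, not the team's implementation: the weight formula (a model's mAP divided by the best mAP), the class-aware suppression, the IoU threshold of 0.5, and the names `Detection`, `model_weights`, and `fuse_detections` are all assumptions made for the example. Masks are represented as sets of pixel coordinates to keep the code dependency-free.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List, Tuple

Pixel = Tuple[int, int]


@dataclass(frozen=True)
class Detection:
    model: str              # name of the model that produced the detection
    label: int              # defect class id
    score: float            # confidence reported by the model
    mask: FrozenSet[Pixel]  # (row, col) pixels covered by the mask


def mask_iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def model_weights(mask_map: Dict[str, float]) -> Dict[str, float]:
    """Assumed scheme: weight = model mAP / best mAP, so the best model gets 1.0."""
    best = max(mask_map.values())
    return {name: value / best for name, value in mask_map.items()}


def fuse_detections(
    detections: List[Detection],
    mask_map: Dict[str, float],
    iou_thr: float = 0.5,  # assumed threshold
) -> List[Detection]:
    """Rescale confidences by model weight, then run greedy mask-IoU NMS."""
    weights = model_weights(mask_map)
    adjusted = [replace(d, score=d.score * weights[d.model]) for d in detections]
    adjusted.sort(key=lambda d: d.score, reverse=True)

    kept: List[Detection] = []
    for det in adjusted:
        # Suppress a detection only if a higher-scoring detection of the
        # same class overlaps it too much (class-aware NMS is an assumption).
        duplicate = any(
            det.label == k.label and mask_iou(det.mask, k.mask) > iou_thr
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept
```

Because the scores are rescaled before NMS, a detection from a weaker model survives only when no stronger model reports an overlapping mask of the same class with a higher adjusted confidence.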

3 Experiments
