The Second-place Solution for CVPR VISION 23 Challenge Track 1 - Data Efficient Defect Detection
Paper: https://arxiv.org/pdf/2306.14116.pdf
Code: https://github.com/love6tao/Aoi-overfitting-team
1 Overview
This report describes the approach of the Aoi-overfitting-Team in the Data-Efficient Defect Detection Challenge. The challenge requires instance segmentation on 14 industrial inspection datasets with limited training samples. The team's approach focuses on improving the segmentation quality of defect masks under this constraint. They built on the Hybrid Task Cascade (HTC) instance segmentation algorithm with a transformer backbone (Swin-B) and enhanced the baseline with composite connections inspired by CBNetV2. The team also proposed two model ensemble methods: one combines semantic segmentation with instance segmentation, and the other fuses the outputs of multiple instance segmentation models. Finally, with multi-scale training and test-time augmentation (TTA), the team achieved an average mAP@0.5:0.95 of over 48.49% and an average mAR@0.5:0.95 of 66.71% on the challenge test set.
2 Approach
2-1 Base instance segmentation model
The Swin-B network pre-trained on the ImageNet-22k dataset is used as the base backbone. To further improve performance, two identical Swin-B networks are combined through composite connections, inspired by CBNetV2.
2-2 Incorporating semantic segmentation into instance segmentation
Next, a strong semantic segmentation model is trained with Mask2Former, and its results are fused with the instance segmentation results. The fusion draws on the strengths of both segmentation methods to produce more accurate and complete masks.
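The text does not spell out the fusion rule, so the following is only an illustrative sketch under a stated assumption: each instance mask is completed with semantic-segmentation pixels of the same class that fall inside the instance's bounding box. The function names, the set-of-pixels mask representation, and the rule itself are hypothetical and are not taken from the team's code.

```python
def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) of a non-empty pixel set."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return min(rows), min(cols), max(rows), max(cols)


def refine_with_semantic(instance_mask, label, semantic_map):
    """Complete one instance mask using a semantic segmentation map.

    instance_mask: set of (row, col) pixels predicted for the instance.
    label:         class id of the instance.
    semantic_map:  2D list of class ids, one per pixel (0 = background).

    Assumed rule (hypothetical): inside the instance's bounding box, every
    pixel that the semantic model assigns to the same class is added to
    the instance mask. Pixels outside the box are left untouched.
    """
    if not instance_mask:
        return set()
    r0, c0, r1, c1 = bounding_box(instance_mask)
    refined = set(instance_mask)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            if semantic_map[r][c] == label:
                refined.add((r, c))
    return refined
```

Restricting the update to the bounding box keeps separate instances of the same class from being merged by the semantic map, which carries no instance identity of its own.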
2-3 Fusion of multiple instance segmentations
Three instance segmentation models are combined: HTC, Cascade Mask R-CNN with a ResNet-50 backbone, and Cascade Mask R-CNN with a ConvNeXt backbone. Each model is assigned a weight according to its mask mAP, with higher-mAP models receiving larger weights, so as to minimize the loss of precision and improve segmentation accuracy. The confidence-adjusted detections from all models are then filtered with mask-IoU-based non-maximum suppression (NMS), which removes duplicate detections and improves the robustness and accuracy of the result.
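A minimal sketch of this kind of ensemble is given below. It makes several assumptions not stated in the text: the weights are each model's mAP divided by the best mAP, the weights multiply the detection confidences, NMS is greedy and applied per class, and the IoU threshold is 0.5. Masks are represented as sets of (row, col) pixels to keep the example dependency-free. All names here (`Detection`, `weights_from_map`, `fuse_detections`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    mask: frozenset   # (row, col) pixels covered by the instance
    score: float      # confidence reported by the model
    label: int        # class id
    model: str        # name of the model that produced the detection


def weights_from_map(val_map):
    """Assumed weighting: each model's mask mAP divided by the best mAP."""
    best = max(val_map.values())
    return {name: ap / best for name, ap in val_map.items()}


def mask_iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse_detections(detections, model_weights, iou_thr=0.5):
    """Rescale confidences by model weight, then apply greedy mask NMS."""
    rescored = [
        Detection(d.mask, d.score * model_weights[d.model], d.label, d.model)
        for d in detections
    ]
    # Highest adjusted confidence first, so stronger detections win ties.
    rescored.sort(key=lambda d: d.score, reverse=True)
    kept = []
    for d in rescored:
        # Suppress d if it overlaps an already kept detection of its class.
        duplicate = any(
            k.label == d.label and mask_iou(k.mask, d.mask) > iou_thr
            for k in kept
        )
        if not duplicate:
            kept.append(d)
    return kept
```

In use, `weights_from_map` would be called once with the per-model mask mAP measured on a validation split, and `fuse_detections` once per image with the pooled detections of all models. Because the weights scale the scores before sorting, a detection from a weaker model survives only where no stronger model has already covered the same region.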