1. Introduction to the steel defect dataset
The NEU-DET steel surface defect dataset contains six defect categories: 'crazing', 'inclusion', 'patches', 'pitted_surface', 'rolled-in_scale', and 'scratches'.
The distribution of each category is:
The training results are as follows:
2. Training based on YOLOv5s
mAP: 0.742
2.1 Inception-MetaNeXtStage
Paper address: https://arxiv.org/pdf/2303.16900.pdf
Code: GitHub - sail-sg/inceptionnext: InceptionNeXt: When Inception Meets ConvNeXt
Affiliation: NUS, Sea AI Lab (Shuicheng Yan et al.)
Abstract: Inspired by ViT's long-range modeling ability, large-kernel convolutions have been widely used to enlarge the receptive field and improve model performance; ConvNeXt, for example, uses 7×7 depthwise convolution. Although this depthwise operator consumes only a few FLOPs, its high memory access cost largely harms model efficiency on powerful computing devices. To solve this problem, we propose to decompose the large-kernel depthwise convolution into four parallel branches along the channel dimension: a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, we build a family of networks, named InceptionNeXt, that not only achieve high throughput but also maintain competitive performance.
Figure 1: Trade-off between accuracy and training throughput. All models were trained under DeiT training hyperparameters [61, 37, 38, 69]. Training throughput is measured on an A100 GPU with a batch size of 128. ConvNeXt-T/kn represents a variant with depthwise convolutions with kernel size n × n. InceptionNeXt-T combines the speed of ResNet-50 with the accuracy of ConvNeXt-T.
Figure 2: Block diagrams of MetaFormer, MetaNeXt, ConvNeXt, and InceptionNeXt.
Combining the Inception idea with the ConvNeXt design yields an effective decomposition of large-kernel depthwise convolutions. This decomposition not only reduces parameter count and computation, but also retains the key advantage of large-kernel depthwise convolution: a large receptive field and hence better model performance.
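The four-branch decomposition described above can be sketched in PyTorch roughly as follows (a minimal sketch based on the paper's description; the default kernel sizes and the 1/8 branch ratio are taken from the paper, but this is an illustration, not the official implementation):

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    """Inception depthwise conv: split channels into four parallel branches:
    identity, small square kernel, 1xk band kernel, kx1 band kernel."""
    def __init__(self, in_channels, square_kernel_size=3, band_kernel_size=11,
                 branch_ratio=0.125):
        super().__init__()
        gc = int(in_channels * branch_ratio)  # channels per conv branch
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel_size,
                                   padding=square_kernel_size // 2, groups=gc)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel_size),
                                  padding=(0, band_kernel_size // 2), groups=gc)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel_size, 1),
                                  padding=(band_kernel_size // 2, 0), groups=gc)
        # remaining channels pass through untouched (identity branch)
        self.split_indexes = (in_channels - 3 * gc, gc, gc, gc)

    def forward(self, x):
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_indexes, dim=1)
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1)
```

Because three of the four branches operate on only 1/8 of the channels each and the band kernels are one-dimensional, the decomposition is much cheaper than a full 7×7 (or larger) depthwise convolution while covering a comparable receptive field.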
2.2 DCNV3
Paper: https://arxiv.org/abs/2211.05778
For the theoretical background, see the Zhihu write-up: CVPR2023 Highlight | the InternImage ("Scholar") model tops COCO object detection, with the research team's own interpretation made public - Zhihu
Unlike recent CNN designs that focus on large kernels, InternImage uses deformable convolution (DCNv3) as its core operator: it not only provides the large effective receptive field required for downstream tasks, but also offers input- and task-adaptive spatial aggregation. The proposed scheme relaxes the strict inductive bias of traditional CNNs and learns stronger, more robust representations. Experiments on ImageNet, COCO, and ADE20K verify its effectiveness; notably, InternImage-H set a new record of 65.4 mAP on COCO test-dev.
Corresponding blog:
mAP: 0.757
2.3 DCNV3+MetaNeXtStage
mAP: 0.776
3. Summary
By introducing the ideas of DCNv3 (CVPR 2023) and MetaNeXtStage, the mAP on the steel defect dataset improves from the baseline 0.742 to 0.776. Compared with the original model and some published papers, this offers a good degree of innovation and novelty. If you need it, you can run experiments on your own dataset and have a good chance of successfully publishing a paper!
For source code details, see: