MobileSAM: Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Faster Segment Anything (MobileSAM): segment everything faster, with a model that is 60 times smaller and 50 times faster (Zhihu). Original title: Faster Segment Anything: Towards Lightweight SAM for Mobile Applications. https://zhuanlan.zhihu.com/p/639621335

[Paper interpretation] TinyViT: a rapidly distilled ViT that can replace SAM's ViT (MobileSAM) (Zhihu). https://zhuanlan.zhihu.com/p/642469607

The three officially released .pth checkpoints are relatively large: ViT-H is 2.38 GB, ViT-L is 1.28 GB, and ViT-B is 357 MB. The ViT-H-based image encoder has more than 600M parameters, while the prompt-guided mask decoder has only about 4M. We can therefore replace the heavyweight image encoder with a lightweight one and retrain the whole of SAM. This process is a form of knowledge distillation. The difficulty with this direct replace-and-retrain approach lies in the coupled optimization of the image encoder and the mask decoder. Following a divide-and-conquer idea, you can fix either the encoder or the decoder and optimize the other. However, the prompts fed to the mask decoder are chosen at random, which makes the mask decoder's behavior variable and the optimization harder. The core solution of MobileSAM is decoupled distillation: keep the prompt-guided mask decoder fixed and distill the ViT-H image encoder directly into a small image encoder.

The method itself is simple: keep the prompt encoder and mask decoder unchanged, and train a lightweight version of SAM with TinyViT as the image encoder. The results are still good, and better than FastSAM.
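To make the decoupled distillation idea concrete, here is a toy sketch in plain Python (no deep-learning framework, so it runs anywhere). It is not the MobileSAM training code. Everything in it is an assumption made for illustration: the "teacher" is two stacked linear layers standing in for the ViT-H image encoder, the "student" is a single small linear layer standing in for TinyViT, `frozen_decoder` is a hypothetical stand-in for the prompt-guided mask decoder, and the loss is a mean squared error between the two image embeddings. The point it shows is that only the student encoder is updated, and the decoder never takes part in training.

```python
import copy
import random

random.seed(0)
DIM_IN, DIM_HIDDEN, DIM_EMB = 4, 8, 3


def make_matrix(rows, cols, scale):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a weight matrix to a vector: returns W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Frozen teacher: two stacked linear layers, standing in for the heavy ViT-H encoder.
teacher_w1 = make_matrix(DIM_HIDDEN, DIM_IN, 1.0)
teacher_w2 = make_matrix(DIM_EMB, DIM_HIDDEN, 1.0)
# Trainable student: one small layer, standing in for the lightweight TinyViT encoder.
student_w = make_matrix(DIM_EMB, DIM_IN, 0.1)


def teacher_encode(x):
    return linear(teacher_w2, linear(teacher_w1, x))


def student_encode(x):
    return linear(student_w, x)


def frozen_decoder(embedding, prompt):
    """Hypothetical stand-in for the prompt-guided mask decoder; never trained here."""
    return sum(e * p for e, p in zip(embedding, prompt))


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill_step(x, lr=0.1):
    """One SGD step pulling the student embedding towards the teacher embedding."""
    target = teacher_encode(x)  # the teacher is only read, never updated
    pred = student_encode(x)
    for i in range(DIM_EMB):
        grad_out = 2.0 * (pred[i] - target[i]) / DIM_EMB  # d(MSE)/d(pred[i])
        for j in range(DIM_IN):
            student_w[i][j] -= lr * grad_out * x[j]
    return mse(pred, target)


def mean_embedding_loss(images):
    return sum(mse(student_encode(x), teacher_encode(x)) for x in images) / len(images)


# Random vectors play the role of training images.
images = [[random.uniform(-1, 1) for _ in range(DIM_IN)] for _ in range(32)]
teacher_snapshot = copy.deepcopy((teacher_w1, teacher_w2))

loss_before = mean_embedding_loss(images)
for _ in range(200):
    for x in images:
        distill_step(x)
loss_after = mean_embedding_loss(images)

# Feed both embeddings to the untouched decoder and compare its outputs.
prompt = [0.5, -1.0, 2.0]
decoder_gap = abs(frozen_decoder(student_encode(images[0]), prompt)
                  - frozen_decoder(teacher_encode(images[0]), prompt))
print(loss_before, loss_after, decoder_gap)
```

Because the student is trained to reproduce the teacher's embeddings, the original decoder can be reused as-is on top of it, which is why no prompts are needed during distillation in this sketch.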

Origin blog.csdn.net/u012193416/article/details/132056137