Why Is Tesla Betting Everything on the Transformer?

When it comes to vision-only autonomous driving, the first name that comes to mind is Tesla. Indeed, Tesla had already deployed a vision-only BEV (bird's-eye-view) detection solution as early as 2021, and it works very well.


Attentive readers may have noticed that the core component of this BEV solution, the one that transforms image features from camera space into BEV space, is the Transformer.
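
To make the idea of a camera-to-BEV transformation concrete, here is a minimal sketch of cross-attention in plain Python. It is not Tesla's implementation: the sizes, the random features, and the names `bev_queries` and `image_features` are all hypothetical, and a real model would add learned projections, multiple heads, and camera-aware positional encodings. It only shows the basic mechanism, in which each BEV cell issues a query and gathers a weighted average of image features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Each query takes a softmax-weighted average of the values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


random.seed(0)
dim = 4
num_image_tokens = 6  # hypothetical: flattened camera feature-map locations
num_bev_cells = 3     # hypothetical: cells of a tiny BEV grid

# Random stand-ins for backbone features and learned BEV queries.
image_features = [[random.gauss(0, 1) for _ in range(dim)]
                  for _ in range(num_image_tokens)]
bev_queries = [[random.gauss(0, 1) for _ in range(dim)]
               for _ in range(num_bev_cells)]

# Keys and values both come from the image; queries come from the BEV grid.
bev_features = cross_attention(bev_queries, image_features, image_features)
print(len(bev_features), len(bev_features[0]))  # prints "3 4"
```

The output has one feature vector per BEV cell, regardless of how many image locations were attended to, which is what lets the BEV grid have a different layout from the camera images.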

The Transformer originated in natural language processing, where it was first applied to machine translation. It was later found to be highly effective in computer vision as well, outperforming CNNs on major benchmark leaderboards.


In object detection, vision Transformers can handle not only 2D and 3D detection but also multi-modal detection, and they perform very well on detection from the BEV perspective too.


As a result, Transformer knowledge and hands-on engineering experience have become a skill requirement at companies hiring algorithm engineers, and they are a strong plus on a resume.

However, there are three difficulties in mastering Transformer-based object detection algorithms:

  • Understanding the theory behind the Transformer, such as the self-attention mechanism, positional embedding, and object queries. The material available online is scattered and unsystematic, which makes it hard to reach a deep, integrated understanding through self-study.

  • Grasping the ideas and innovations of Transformer-based object detection algorithms. Some Transformer papers introduce many new concepts and are not written in easily accessible language, so even after reading a paper you may still not understand the details of the algorithm.

  • Reading Transformer code. Because the mechanism works quite differently from a CNN, it takes considerable effort to fully understand the code and apply it in practice.
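
As a small illustration of two of the concepts named in the first point, the following plain-Python sketch combines a sinusoidal positional embedding with scaled dot-product self-attention. It is a simplified example under stated assumptions: the toy tokens are made up, and the learned query/key/value projections of a real Transformer are replaced by the identity (Q = K = V), so it shows only the core computation.

```python
import math


def positional_embedding(num_positions, dim):
    """Sinusoidal embedding: sin on even indices, cos on odd indices."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections."""
    scale = math.sqrt(len(tokens[0]))
    outputs, all_weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, tokens))
            for j in range(len(q))
        ])
    return outputs, all_weights


# Toy input: the first and third tokens are identical on purpose.
tokens = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0]]

pos = positional_embedding(len(tokens), 4)
# Adding the embedding makes identical tokens distinguishable by position.
tokens_with_pos = [[t + p for t, p in zip(tok, row)]
                   for tok, row in zip(tokens, pos)]

outputs, weights = self_attention(tokens_with_pos)
for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Without the positional embedding, the first and third tokens would produce identical outputs, because attention by itself has no notion of order. That is the reason Transformers add position information to their inputs.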


So how should you learn Transformer-based object detection algorithms?

"Yu Yan", a co-lecturer at the 3D Vision Workshop, has carefully prepared the course "Visual Transformer in Target Detection", designed mainly to help students overcome the difficulties above.

The course explains the fundamentals of the vision Transformer and a range of classic Transformer-based object detection algorithms in detail. It also includes code walkthroughs and hands-on sessions, so that students can genuinely understand the theory and put it into practice.

Practical part


Class start time

The course starts at 8 pm on Friday, July 28, 2023, and one chapter will be released every week.

Course Q&A

Questions about this course are answered mainly in the course's Goose Circle. Students who have questions while studying can ask them there at any time.

The first 50 buyers get the early-bird price, an immediate discount of 30 yuan.

For more information, add the assistant on WeChat: cv3d007.


Origin blog.csdn.net/Yong_Qi2015/article/details/132929008