CCF Intelligent Unmanned Vehicle Competition (domestic Oasis Scientific Experiment Cloud Platform): my journey, plus AWS DeepRacer intelligent unmanned vehicle competition experience (with reward function code for the re:Invent 2018 track)

PS: In 2022 I formed a team at my school to take part in the CCF National Intelligent Unmanned Vehicle Competition (re:Invent 2018 track). At first we trained our models on the Oasis Scientific Experiment Cloud Platform used by the school, but the school initially provided only 6 hours of free training time. The team members were starting from scratch, and the time ran out before we knew it. The trained model was far from ideal, with results of just under 1 minute (everything was unknown to us, and every attempt cost limited training time, which was the only way to gain the experience needed to go further). In this situation I started looking for experience online, but found that relevant material on domestic websites is very limited and hard to find. After several nights of searching and after talking with the organizer, I learned that intelligent unmanned vehicle training originated on the AWS platform abroad. Each new AWS account gets 10 hours of free training time, and creating an account only requires a credit card (one card can be bound to multiple accounts!) and an email address. In other words, with one credit card and several email addresses you can get several times the 10 hours of training time, and models can be inherited and transferred between accounts, which is undoubtedly a better choice!

 

  1. Unmanned Vehicles: Oasis Cloud Platform VS Amazon AWS Platform

The Oasis cloud platform is a domestic unmanned vehicle training platform. Its learning material and workflow are almost the same as on the AWS platform, so it can be thought of as the Chinese version of the AWS DeepRacer platform.

Differences:

1. (Training process) AWS lets you monitor the training curve in real time and stop it at any time, while Oasis Cloud only shows the training curve after training ends.

2. (Charges) AWS charges mainly for (a) training and analysis, at 3.5 $/h, and (b) model storage, at 0.023 $/GB (a very small fee, about 10 cents per month for personal use), and the two are billed separately. The Oasis cloud platform bundles all costs and charges only for training, at 35/h, with no additional charge for analysis or storage.

3. (Model transfer) Between AWS accounts, models can be transferred through an S3 bucket. The Oasis cloud platform does not support model transfer.
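To make the AWS charges above concrete, here is a minimal sketch of the arithmetic using the two rates quoted in this article. The function name and the example figures (10 hours of training, 1 GB stored for one month) are assumptions for illustration only, and actual AWS pricing may differ.

```python
# Rates as quoted in this article; check current AWS pricing before relying on them.
TRAINING_RATE_USD_PER_HOUR = 3.5
STORAGE_RATE_USD_PER_GB_MONTH = 0.023


def estimate_aws_cost(training_hours, stored_gb=0.0, months=1.0):
    """Estimate the AWS charge in USD; training and storage are billed separately."""
    training_cost = training_hours * TRAINING_RATE_USD_PER_HOUR
    storage_cost = stored_gb * STORAGE_RATE_USD_PER_GB_MONTH * months
    return training_cost + storage_cost


# Hypothetical example: 10 hours of training and 1 GB of models kept for a month.
print(round(estimate_aws_cost(10, stored_gb=1.0), 3))
```

Storage is a rounding error next to training time, which is why the free training hours matter so much.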

        2. The basics of unmanned vehicle training (experience)

A model is created from three parts: 1. reward function, 2. action space, 3. hyperparameters. The basic concepts are explained in detail on the official website, so I won't repeat them here. I will mainly share the points I use most often when training models.

① Reward function: this is a major factor in how fast the car goes (but remember to pair it with a reasonable action space and training time). It basically comes down to the logic and strategy in the Python code, and people without a programming background can also adapt a suitable reward function from a template. You can add strategy adjustments tailored to a single track, or write a more general model that adapts to many tracks, and of course you can also import mathematical functions to make the model converge faster (reaching its limit in speed and completion).
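As an illustration of what importing mathematical functions can look like, here is a minimal sketch that uses a smooth Gaussian curve around the center line instead of stepped bands. It is not the reward function of any model in this article; the 0.25 width factor and the 0.5 speed weighting are assumed values. The keys `track_width`, `distance_from_center`, `all_wheels_on_track` and `speed` are the same input parameters used by the reward function shared later in this article.

```python
import math


def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']

    # Minimal reward when the car leaves the track
    if not all_wheels_on_track:
        return 1e-3

    # Smooth center-line reward: 1.0 on the center line, falling off gradually.
    # sigma (assumed value) controls how quickly the reward decays.
    sigma = 0.25 * track_width
    center_reward = math.exp(-0.5 * (distance_from_center / sigma) ** 2)

    # Assumed weighting: a small bonus proportional to speed
    speed_bonus = 0.5 * speed

    return float(center_reward + speed_bonus)
```

A smooth curve gives a slightly different reward for every small change in position, whereas stepped bands give the same reward across a whole band.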

 

② Action space: the default is discrete, and each track does have its own optimal discrete values, but a discrete space is relatively simple and limited (those chasing extreme speed can use log visualization to analyze the limiting speed or angle at different points of each track, as described in an expert's article). I usually use a continuous space. My method: with the reward function fixed, the initial model starts with a low maximum and minimum speed, and each subsequent iteration gradually raises both until the model reaches its limit.
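The staged approach above can be written down as a simple schedule of speed ranges, one per training iteration (clone). This is only a sketch: the starting speeds, the step size and the number of stages are hypothetical, since the article does not give the values actually used.

```python
def speed_schedule(min_speed, max_speed, step, stages):
    """Return one (min_speed, max_speed) pair per training stage.

    Both bounds are raised by `step` at every stage, so the model first learns
    to complete laps slowly and is then pushed to go faster.
    """
    schedule = []
    for stage in range(stages):
        schedule.append((round(min_speed + stage * step, 2),
                         round(max_speed + stage * step, 2)))
    return schedule


# Hypothetical values: start at 0.5 to 1.5 m/s and add 0.5 m/s per stage.
for stage, (low, high) in enumerate(speed_schedule(0.5, 1.5, 0.5, 4), start=1):
    print(f"stage {stage}: continuous speed range {low} to {high} m/s")
```

Each stage would be a clone of the previous model with only the speed range changed.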

 

③ Hyperparameters: I have not tried changing every hyperparameter, but the one I use most and that has the most visible effect is the batch size, whose default is 32. The larger this value, the more iterations the model trains in the same amount of time. I recommend a larger value (such as 512) during initial training to reach the limit faster, and then a smaller value later to reduce the amount of change during fine-tuning.

3. Analyzing the model

Of the three curves shown after model training, the most important basis for deciding whether to keep training is the red line (the analysis, or evaluation, line), whose maximum value is 100. If the red line ends at 100, then evaluating the model on the track at that point gives 100% completion on every lap, so the red line is directly linked to final completion. In general, a model above 80 will also have 2-3 or more laps at 100% completion in an analysis (when analyzing 5 laps). Such a model can basically reach 0 off-track events in a submission; if there are 1-2 off-track events, repeated submissions can bring that down to 0.
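The rule of thumb above can be sketched as a small helper that summarizes a 5-lap analysis. The function name, the example completion values and the threshold of 2 full laps are assumptions based on the description above, not part of either platform.

```python
def summarize_evaluation(completions, min_full_laps=2):
    """Summarize per-lap completion percentages from one analysis run.

    `completions` holds one completion percentage (0 to 100) per lap.
    `min_full_laps` is an assumed threshold taken from the rule of thumb above.
    """
    full_laps = sum(1 for c in completions if c >= 100)
    return {
        "laps": len(completions),
        "full_laps": full_laps,
        "incomplete_laps": len(completions) - full_laps,
        "worth_submitting": full_laps >= min_full_laps,
    }


# Hypothetical 5-lap analysis: three complete laps and two incomplete ones.
print(summarize_evaluation([100, 100, 78, 100, 64]))
```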

4. Tips

       ① When the same model is submitted to the same race track, the result of each submission fluctuates, and models with higher completion (fewer or no off-track events) fluctuate less. During a competition you can submit repeatedly to eliminate off-track events as far as possible and improve the score. For example, if the analysis shows only one lap at 100%, repeated submissions can still produce a run with three laps at 100%, that is, fast and without going off track.

 

5. Case studies (re:Invent 2018 track)

       Model 1: the final score is 29 s (the competition format is a three-lap time trial, with a 3 s time penalty for each off-track event).

Total training time is 2 hours, and the fastest single lap (100% completion) is about 9.8 s.

     The first training run is shown below; the duration is 1 h

ae7a5af17e8740d0a28e00f212178728.png

865b73b20c414db391ae45b657df346e.png

Of course, the model from the first training run cannot even complete a full lap, but this is only the initial model. The big jump at the end of the red evaluation line is a good sign, so training can continue.

 

The following is the second training run (the first clone, with parameters unchanged); the duration is 1 h

bccf989a487c4b3da4eeb8b2e617e40b.png

c328099e7e584dc3809dad18fb7fefac.png

At this point there is a lap with 100% completion, at 9.865 s. Using the repeated-submission method mentioned earlier, you can get a submission with no off-track events in all three laps. The estimate is 9.8 × 3, which comes to roughly 29 s.

The submission with this model is shown below
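The estimate follows directly from the competition format described above (a three-lap time trial with a 3 s penalty per off-track event). A minimal sketch of that arithmetic, with a hypothetical function name and the 9.8 s lap time used in the estimate:

```python
OFF_TRACK_PENALTY_S = 3.0  # penalty per off-track event, as stated in the competition rules above


def race_time(lap_times, off_track_count=0):
    """Total time for a time trial: sum of lap times plus off-track penalties."""
    return sum(lap_times) + OFF_TRACK_PENALTY_S * off_track_count


clean_run = race_time([9.8, 9.8, 9.8])                       # three clean laps
one_mistake = race_time([9.8, 9.8, 9.8], off_track_count=1)  # same laps, one off-track event

print(f"clean run: {clean_run:.1f} s, with one off-track event: {one_mistake:.1f} s")
```

A single off-track event costs about a third of a lap, which is why repeated submissions to get a clean run are worth the effort.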

144718d98fe644ba9c0746dbf3beeb01.png

The following is the result of submitting the same model to the AWS April Online Open. It ranked 66th out of 1728 participants, within the top 5%. Because it was in the top 10%, it advanced to the professional competition in May and earned a driver suit award

a716e73abd834f7aac985644369ffec5.png

540b46cfe3ac44b9a2826745b0dafa7b.jpeg

a095da461a8b4582845d3e0b9aaedbea.jpeg

 

Note that the model I submitted is the one above, and it was trained on the 2018 track

41f6efd90489426e96983d79dd5dece6.png

 

The AWS April race, however, uses a different track, as shown in the figure below

9fd8eff0bd9647099665dd528f64a065.png

 

This difference shows that the model trained above is fairly versatile and can achieve good results even on a different track. The reward function code of the model is shared below.

def reward_function(params):
    # Read the input parameters provided by the simulator
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']
    closest_waypoints = params['closest_waypoints']
    SPEED_THRESHOLD = 2.0

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 2.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    if not all_wheels_on_track:
        # Penalize if the car goes off track
        reward = 1e-3
    elif speed < SPEED_THRESHOLD:
        # Only a small bonus if the car stays on track but goes too slow
        reward = reward + 0.1
    else:
        # High reward if the car stays on track and goes fast
        reward = reward + 1.5

    # Track-specific sections, selected by waypoint index.
    # Python evaluates `and` before `or`; the parentheses make that explicit.
    if ((closest_waypoints[0] >= 0 and closest_waypoints[1] <= 16)
            or (closest_waypoints[0] >= 111 and closest_waypoints[1] <= 117)):
        # Note: reassigning the local variable `speed` does not change the car's
        # speed or the reward; it is kept here as in the original code.
        speed = 3.0
        # Fixed reward when the car is at least 0.05 inside the track edge
        if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
            reward = 2.0
    elif closest_waypoints[0] >= 87 and closest_waypoints[1] <= 103:
        speed = 2.6  # no effect on the reward, see the note above
        if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
            reward = 2.0
    return float(reward)
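The waypoint checks at the end of the reward function above mix `and` and `or` in one condition. Python evaluates `and` before `or`, so the first condition is read as two separate waypoint ranges. The sketch below makes that grouping explicit with a small helper. The helper names and the labels "fast" and "medium" are my own, chosen from the 3.0 and 2.6 values in the code; the waypoint ranges are copied from the function above.

```python
# Waypoint ranges copied from the reward function above; the labels are assumed.
FAST_SEGMENTS = [(0, 16), (111, 117)]
MEDIUM_SEGMENTS = [(87, 103)]


def in_segment(closest_waypoints, start, end):
    """True if the previous waypoint is >= start and the next waypoint is <= end."""
    prev_wp, next_wp = closest_waypoints
    return prev_wp >= start and next_wp <= end


def segment_kind(closest_waypoints):
    """Classify the car's position the same way the if/elif chain above does."""
    if any(in_segment(closest_waypoints, s, e) for s, e in FAST_SEGMENTS):
        return "fast"
    if any(in_segment(closest_waypoints, s, e) for s, e in MEDIUM_SEGMENTS):
        return "medium"
    return "other"


print(segment_kind([5, 6]), segment_kind([90, 91]), segment_kind([50, 51]))
```

Keeping the ranges in lists makes it easier to adjust them for another track without touching the condition logic.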

 

 

 

Model 2: the final score is 27.162 s, with the same competition format as above. This model uses a different training strategy and reward function (from the guidance and sharing of the expert @Rambo.Fan). Because most of this model comes from that senior's guidance and was not made independently, the reward function code will not be disclosed here, but the stability of the model has reached a new level!

              The training time is about 6 h, over several model iterations

 

First, the results of the initial model are shown below

809d8cd7c7ea426ead329026ed43eb3b.png

d7580130f5b54098b6cfdc728b5d5874.png

4cb14d181e574b3cb0f134d65a00a345.png

Here, to reach 100% completion quickly, the speed range is set small, which keeps the red line at 100 and achieves fast convergence.

The next few iterations only slowly raise the maximum and minimum speeds while keeping the red line at 100, until the last iteration, which is roughly the model's limit in terms of speed and completion.

The average lap time is 9.5 s

da6ad037af6d4af1a0956273426b369d.png

c558c186fab249d98b21fca080030687.png

68d09f3b7a2442c7a60da0d17df5de73.png

The model is very stable because a mathematical function is imported in the reward function, which helps the model converge quickly

The following are the results submitted in the competition

3d2bb6fcb010414caed7fcb321cae57b.png

Like the previous model, it also took part in the AWS April Open, advanced to the professional competition with a top 5% result, and received the corresponding rewards. The following is its current ranking in the professional competition, again achieved with the above model trained on the 2018 track, which once more shows that both models are fairly versatile

02d4a1de42be4c2c9a5440b7326137d5.png

 

 

Model 3: the final score is 26.797 s on the same competition track as above, with an average of 9.2 s per lap. The training process also comes from the senior @Rambo.Fan, so it will not be shown for the time being

09d62552086745759fc9bc3d8b767527.png

 

 

Going forward, I am still actively trying new models. If I can get a single lap under 8 s, a score of 24 s becomes possible. I have even seen scores of 22 s on other tracks, which implies single laps of roughly 7 s. There is still a long way to go and a lot to learn, and I still have many shortcomings. I hope everyone will not give up easily or settle for less, and will strive for better results!

PS: I am currently competing in the online round of the Southern Division of the 2022 CCF National Intelligent Unmanned Vehicle Competition. The promotion results will be announced on 5.31. If you have any related questions, you can message me privately, and I will be happy to discuss and learn together.

 

 

 


Origin blog.csdn.net/weixin_56875062/article/details/124886717