Six strategies for adjusting the learning rate with pytorch torch.optim.lr_scheduler


1. Why do you need to adjust the learning rate

In deep learning training, the most important hyperparameter is the learning rate. Generally speaking, the learning rate does not stay constant throughout training. To let the model converge quickly in the early stage of training, the learning rate is usually relatively large; toward the end of training, to let the model settle more precisely into a local optimum, the learning rate is usually relatively small.
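For intuition, here is a minimal sketch (not from the original post) of doing this by hand: shrinking a PyTorch optimizer's learning rate at chosen epochs by editing optimizer.param_groups. The model, epoch count and decay points are made-up placeholders.

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(90):
    # ... forward pass, loss.backward() and optimizer.step() would go here ...
    if epoch in (30, 60):
        for group in optimizer.param_groups:
            group['lr'] *= 0.1   # shrink the learning rate 10x at epochs 30 and 60

The schedulers in torch.optim.lr_scheduler, introduced below, automate exactly this kind of adjustment.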

2. Setting the initial value of the learning rate

In practice, the initial value of the learning rate for a given task has to be found through a few experiments: different optimizers and different mini-batch sizes (batch_size) call for different initial values.
In my experience, for the same task, an initial learning rate of 0.001 works well with the Adam optimizer; Adam is not very sensitive to the initial value anyway and can usually converge quickly. With the SGD optimizer, the learning rate needs to be 10 or 100 times larger than with Adam, i.e. 0.1 or 0.01 works better.
One thing to note is that when the batch_size is enlarged by a factor of n, the learning rate should usually be scaled up by roughly √n accordingly. Even so, in my own experiments I did not see much effect, probably because I was using Adam. See the sketch below.
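As a concrete illustration, here is a small sketch of the starting values discussed above and of the √n scaling rule. The model and the base batch_size of 64 are placeholders, not from the original post.

import math
import torch

model = torch.nn.Linear(10, 1)

# Typical starting points discussed above
adam_opt = torch.optim.Adam(model.parameters(), lr=0.001)   # Adam: 0.001 usually works well
sgd_opt  = torch.optim.SGD(model.parameters(), lr=0.01)     # SGD: 10x-100x larger than Adam

# Rule of thumb: if batch_size grows by a factor of n, scale the lr by roughly sqrt(n)
base_lr, base_batch = 0.001, 64
new_batch = 256
n = new_batch / base_batch                 # n = 4
scaled_lr = base_lr * math.sqrt(n)         # 0.001 * 2 = 0.002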

3. PyTorch learning rate adjustment strategies

pytorch provides some basic learning rate adjustment strategies in the torch.optim.lr_scheduler module. The code below implements five learning rate schedules that you are likely to use (in practice I mostly use the basic step-wise decay). The figure visualizes how the learning rate shrinks under each schedule; it is all very simple, so take a closer look at the figure and the code to understand the meaning of each scheduler's parameters:

import torch
import matplotlib.pyplot as plt

lr = 0.001

# 20 is the number of epochs for the lr to go from its maximum to its minimum (T_max); 0 is the minimum lr (eta_min)
scheduler_cos = torch.optim.lr_scheduler.CosineAnnealingLR(torch.optim.SGD([torch.ones(1)], lr), 20, 0)
# every 20 epochs the lr is multiplied by 0.5 (staircase decay)
scheduler_step = torch.optim.lr_scheduler.StepLR(torch.optim.SGD([torch.ones(1)], lr), 20, 0.5)
# every epoch the lr is multiplied by 0.95
scheduler_exp = torch.optim.lr_scheduler.ExponentialLR(torch.optim.SGD([torch.ones(1)], lr), 0.95)
# triangular schedule: 0.0001 is the minimum lr (base_lr), 0.001 is the maximum lr (max_lr), 20 is the number of epochs in the rising half of each cycle (step_size_up)
scheduler_cyc = torch.optim.lr_scheduler.CyclicLR(torch.optim.SGD([torch.ones(1)], lr), 0.0001, 0.001, 20)
# staircase decay at the milestone epochs in the list [20, 30, 60, 80]; 0.8 is the decay factor (gamma)
scheduler_mul = torch.optim.lr_scheduler.MultiStepLR(torch.optim.SGD([torch.ones(1)], lr), [20, 30, 60, 80], 0.8)

lr_cos  = []
lr_step = []
lr_exp  = []
lr_cyc  = []
lr_mul  = []
for i in range(100):
    lr_cos  += scheduler_cos.get_last_lr()
    lr_step += scheduler_step.get_last_lr()
    lr_exp  += scheduler_exp.get_last_lr()
    lr_cyc  += scheduler_cyc.get_last_lr()
    lr_mul  += scheduler_mul.get_last_lr()
    scheduler_cos.step()
    scheduler_step.step()
    scheduler_exp.step()
    scheduler_cyc.step()
    scheduler_mul.step()

plt.figure(figsize=(12,7))
plt.plot(list(range(len(lr_cos))), lr_cos,
         list(range(len(lr_step))), lr_step,
         list(range(len(lr_exp))), lr_exp, 
         list(range(len(lr_cyc))), lr_cyc, 
         list(range(len(lr_mul))), lr_mul,)
plt.legend(['cos','step','exp','cyc', 'mul'], fontsize=20)
plt.xlabel('epoch', size=15)
plt.ylabel('lr', size=15)
plt.show()

The effect of running the above code:
[Figure: learning rate curves over 100 epochs for the five schedulers (cos, step, exp, cyc, mul)]
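In the demo above the schedulers are stepped on their own just to plot the curves. In a real training loop, a scheduler wraps the optimizer you actually use; below is a minimal sketch (the model, data and loss are made-up placeholders, not from the original post) of the usual pattern of calling optimizer.step() every batch and scheduler.step() once per epoch:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
loss_fn = torch.nn.MSELoss()

for epoch in range(100):
    for _ in range(10):                        # placeholder: loop over mini-batches
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()                       # update the weights every batch
    scheduler.step()                           # decay the learning rate once per epoch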

Origin blog.csdn.net/baoxin1100/article/details/107446538