When layers of a pretrained model are frozen during training, do those layers still change?

0. Summary

In deep learning, we often fine-tune a pretrained model to improve generalization, speed up convergence, and save training time.

Generally speaking, there are several ways to use a pretrained model. The common ones are setting a smaller learning rate for the pretrained layers, or freezing the pretrained layers so their weights are not updated at all. The latter is discussed here.
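For reference, here is a minimal sketch of the first option: giving the pretrained layers a smaller learning rate through optimizer parameter groups. It assumes the Net model defined in the next section (a pretrained layer1 plus new heads fc1 and fc2), and the learning-rate values are only illustrative.

import torch

# Parameter groups: the pretrained backbone gets a small learning rate,
# while the new classifier heads use the optimizer's default lr.
optimizer = torch.optim.SGD([
    {"params": model.layer1.parameters(), "lr": 0.001},  # pretrained layers: small lr
    {"params": model.fc1.parameters()},                  # new head: default lr (0.1)
    {"params": model.fc2.parameters()},
], lr=0.1)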

1. Fixed layer training

Now suppose you have the model:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = conv_base          # conv_base: pretrained convolutional base (conv, BN, etc.)
        self.fc1 = nn.Linear(512, 100)   # classifier
        self.fc2 = nn.Linear(100, 10)    # classifier

    def forward(self, x):
        feat = self.layer1(x)            # frozen feature extractor
        out = self.fc1(feat)
        out = self.fc2(out)
        return out

Here, layer1 is the pretrained feature extractor, while fc1 and fc2 are what we need to train. We therefore want to freeze the parameters of layer1 and only train the parameters of fc1 and fc2. There are two ways to achieve this (essentially equivalent):

  • Set requires_grad=False for the fixed parameters
# First set requires_grad to False for the parameters of the fixed layer layer1
for name, p in model.layer1.named_parameters():   # freeze the feature-extractor weights
    p.requires_grad = False

# Then pass only the parameters that still require gradients to the optimizer
para = filter(lambda p: p.requires_grad, model.parameters())
optimizer = torch.optim.SGD(para, lr=0.1)
  • Pass only the parameters to be updated directly to the optimizer
para = [
    {"params": model.fc1.parameters()},
    {"params": model.fc2.parameters()},
]
optimizer = torch.optim.SGD(para, lr=0.1)

In both approaches, the optimizer only receives the parameters we want to update, so only those specific parameters are trained.
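As a quick sanity check (not from the original post), you can list which parameters still require gradients and which ones the optimizer will actually step on; with either setup, only fc1 and fc2 should be updated.

# Collect the parameters the optimizer will update
trainable_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}

for name, p in model.named_parameters():
    print(f"{name:30s} requires_grad={p.requires_grad} in_optimizer={id(p) in trainable_ids}")
# Expected: layer1.* is never in the optimizer; fc1.* and fc2.* always are.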

2. Problem

After training in the above way, the parameters of layer1 remain unchanged and only the parameters of fc1 and fc2 change. However, when I used layer1 to evaluate on the dataset again, I found that the metrics had changed. If the parameters of layer1 were truly unchanged, the metrics should be the same as before training.

In fact, the problem lies in the train() and eval() modes. BatchNorm normalizes its input using the mean and variance of the samples. During training, BatchNorm computes the mean and variance of each batch and at the same time updates the running mean and variance it stores. During validation and testing, the amount of data per batch can be very small (in the extreme case, a single sample), so eval() mode normalizes with the running mean and variance saved during training instead.

So although we froze the parameters of layer1, in model.train() mode the BatchNorm layers contained in layer1 still update their running statistics. When we later use layer1 to evaluate on the dataset, it normalizes with the updated mean and std, which is why the metrics change.

To sum up, the parameters of layer1's non-BatchNorm layers (conv, etc.) did not change, but the running statistics of its BatchNorm layers did.
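The effect is easy to reproduce with a standalone BatchNorm layer (a hypothetical example, not the model above): even with requires_grad set to False, the running statistics are buffers, not parameters, and are still updated by every forward pass in train() mode.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
for p in bn.parameters():            # freezes only weight (gamma) and bias (beta)
    p.requires_grad = False

print(bn.running_mean)               # all zeros right after construction

bn.train()
_ = bn(torch.randn(16, 8, 4, 4))     # one forward pass in train mode
print(bn.running_mean)               # the running mean has already changed

bn.eval()
_ = bn(torch.randn(16, 8, 4, 4))     # in eval mode the stored statistics stay fixed
print(bn.running_mean)               # unchanged compared to the previous print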

The solution is also very simple:

# During training: first set the whole model to train mode, then set the frozen
# layer1 to eval mode; this ensures its weights are not updated and its BatchNorm
# statistics are not updated either
model.train()
model.layer1.eval()

# During testing:
model.eval()
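Note that model.train() switches every submodule back to train mode, so layer1.eval() has to be re-applied each time it is called. A sketch of where these calls go in a typical loop (num_epochs, train_loader and val_loader are assumed placeholders):

for epoch in range(num_epochs):
    model.train()
    model.layer1.eval()              # keep the frozen backbone's BatchNorm statistics fixed
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()                     # all BatchNorm layers now use the stored statistics
    with torch.no_grad():
        for x, y in val_loader:
            pred = model(x)          # compute evaluation metrics here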


Origin blog.csdn.net/qq_40243750/article/details/129503158