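While reading papers recently, I came across approaches that first train a model, then move part of that trained model's structure into a new model and train the new model on new data, while keeping the parameters of the transplanted part. At first this looked a lot like transfer learning to me, but since I have not studied transfer learning properly I was not sure, so I started by learning how to implement the scheme from the paper. I am recording it here for future reference. As for transfer learning, I will study it later; I will probably need it!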

Saving part of a model's parameters in PyTorch and loading them into a new model

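state_dict
- Introduction to state_dict
- Saving and loading a state_dict
Save the model to the current directory under the name test_state_dict.pth
- Saving and loading the entire model
OrderedDict
Saving part of a model's parameters and loading them into a new model
Differences between state_dict(), named_parameters(), model.parameters(), and named_modules()
Freezing some layers / training only some layers
- requires_grad=False
- Choosing which parameters the optimizer updates
References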

state_dict

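Introduction to state_dict

state_dict is a Python dictionary object that can hold a model's parameters, as well as the hyperparameters and state of an optimizer (torch.optim). Note that only layers with learnable parameters (convolutional layers, linear layers, and so on) and registered buffers (such as the running_mean of a batch-norm layer) have entries in a model's state_dict.

Here is an example of how state_dict is used: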

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
 
# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
 
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
 
# Initialize the model
model = TheModelClass()
 
# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Print the model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

Output:

Model's state_dict:
conv1.weight 	 torch.Size([6, 3, 5, 5])
conv1.bias 	 torch.Size([6])
conv2.weight 	 torch.Size([16, 6, 5, 5])
conv2.bias 	 torch.Size([16])
fc1.weight 	 torch.Size([120, 400])
fc1.bias 	 torch.Size([120])
fc2.weight 	 torch.Size([84, 120])
fc2.bias 	 torch.Size([84])
fc3.weight 	 torch.Size([10, 84])
fc3.bias 	 torch.Size([10])

# Print the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

Output:

Optimizer's state_dict:
state 	 {}
param_groups 	 [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}]

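Saving and loading a state_dict

You can save a model's state_dict with torch.save(), which stores only the learned parameters, and restore them with load_state_dict(). The most common file extensions for saved PyTorch models are '.pt' and '.pth'.

Save the model to the current directory under the name test_state_dict.pth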

PATH = './test_state_dict.pth'
torch.save(model.state_dict(), PATH)
 
model = TheModelClass()    # First build the model structure in code
model.load_state_dict(torch.load(PATH))   # Then load the model's state_dict
model.eval()

Note: load_state_dict() accepts only a dictionary object, not a file path, so the saved state_dict must first be deserialized with torch.load().
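Since the optimizer also has a state_dict, the same save/load pattern can cover both objects at once. The following is a minimal sketch, not the only way to do it: the tiny nn.Linear model, the temporary file path, and the dictionary key names "model_state_dict" and "optimizer_state_dict" are all assumptions chosen for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical tiny model, used only for this illustration
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Bundle both state_dicts into one ordinary dict; the key names are arbitrary
checkpoint = {
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}

# Write to a temporary directory so the example leaves no files behind in the project
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(checkpoint, ckpt_path)

# Restore into freshly constructed objects
new_model = nn.Linear(4, 2)
new_optimizer = optim.SGD(new_model.parameters(), lr=0.1)  # lr is replaced on load
loaded = torch.load(ckpt_path, map_location="cpu")
new_model.load_state_dict(loaded["model_state_dict"])
new_optimizer.load_state_dict(loaded["optimizer_state_dict"])

weights_match = torch.equal(new_model.weight, model.weight)
restored_lr = new_optimizer.param_groups[0]["lr"]
print(weights_match, restored_lr)
```

Loading the optimizer's state_dict restores its hyperparameters (here the learning rate) along with any accumulated state such as momentum buffers, which matters when resuming training rather than only running inference.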

Saving and loading the entire model

# Save the entire model
torch.save(model, PATH)
 
# Load the entire model
model = torch.load(PATH)
model.eval()

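Although this looks more concise than the state_dict approach, it is less flexible. torch.save() serializes with Python's pickle module, and pickle does not store the model class itself; it stores a path to the file that contains the class, which is then used when the model is loaded. As a result, the code can fail in unexpected ways when it is used in another project or after the code has been refactored. Also note that in recent PyTorch releases (2.6 and later) torch.load() defaults to weights_only=True, so loading a whole pickled model this way requires passing weights_only=False explicitly, which should only be done for files you trust.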

OrderedDict

If we print the type of the state_dict, we get the following:

print(type(model.state_dict()))

Output:

<class 'collections.OrderedDict'>

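The collections module implements specialized container datatypes that provide alternatives to Python's general-purpose built-in containers dict, list, set, and tuple.

class collections.OrderedDict([items])

OrderedDict is a subclass of dict. An ordered dictionary behaves like a regular dictionary but has some extra capabilities related to ordering operations.

It is worth noting that since Python 3.7 the built-in dict is guaranteed to remember insertion order, so this container has become less important.

Some differences from dict:

A regular dict is designed to be very good at mapping operations; tracking insertion order is secondary.
OrderedDict is designed to be good at reordering operations; space efficiency, iteration speed, and the performance of update operations are secondary.
Algorithmically, OrderedDict handles frequent reordering operations better than dict, which makes it suitable for tracking recent accesses (for example, in an LRU cache).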
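The differences listed above can be seen in a few lines. This is a small sketch using only the standard library; the keys are made up to resemble state_dict entries and carry no special meaning.

```python
from collections import OrderedDict

od = OrderedDict([("conv1.weight", 1), ("conv1.bias", 2), ("fc1.weight", 3)])

# Reordering helpers that a plain dict does not offer
od.move_to_end("conv1.weight")                 # move an existing key to the end
first_key, first_val = od.popitem(last=False)  # pop from the front (FIFO order)

remaining_keys = list(od.keys())

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
ordered_equal = (a == b)
plain_equal = (dict(a) == dict(b))

print(first_key, remaining_keys, ordered_equal, plain_equal)
```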

Saving part of a model's parameters and loading them into a new model

For the model above, the state_dict is:

Model's state_dict:
conv1.weight 	 torch.Size([6, 3, 5, 5])
conv1.bias 	 torch.Size([6])
conv2.weight 	 torch.Size([16, 6, 5, 5])
conv2.bias 	 torch.Size([16])
fc1.weight 	 torch.Size([120, 400])
fc1.bias 	 torch.Size([120])
fc2.weight 	 torch.Size([84, 120])
fc2.bias 	 torch.Size([84])
fc3.weight 	 torch.Size([10, 84])
fc3.bias 	 torch.Size([10])

If we only want to save the trained parameters of conv1, we can do the following:

save_state = {}
print("Model's state_dict:")
for param_tensor in model.state_dict():
    if 'conv1' in param_tensor:
        save_state.update({param_tensor: torch.ones(model.state_dict()[param_tensor].size())})
        print(param_tensor, "\t", model.state_dict()[param_tensor].size())

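To make the later demonstration easier to follow, the key statement here is written as:

save_state.update({param_tensor: torch.ones(model.state_dict()[param_tensor].size())})

When actually saving trained parameters, however, it should be written as:

save_state.update({param_tensor: model.state_dict()[param_tensor]})

Then save the save_state dictionary: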

PATH = './test_state_dict.pth'
torch.save(save_state, PATH)

Then create a new model and assign the saved parameters to it:

model = TheModelClass()    # First build the model structure in code
model.load_state_dict(torch.load(PATH), strict=False)   # Then load the model's state_dict

Output:

_IncompatibleKeys(missing_keys=['conv2.weight', 'conv2.bias', 'fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias', 'fc3.weight', 'fc3.bias'], unexpected_keys=[])

This is the warm-start pattern: setting the strict argument of load_state_dict() to False tells it to ignore keys that do not match.

Let's look at the new model's parameters:

model.state_dict()['conv1.bias']

Output:

tensor([1., 1., 1., 1., 1., 1.])

The previously saved parameters have been loaded into the new model.

Now look at the model's other parameters:

model.state_dict()['conv2.bias']

Output:

tensor([ 0.0468,  0.0024, -0.0510,  0.0791,  0.0244, -0.0379, -0.0708,  0.0317,
        -0.0410, -0.0238,  0.0071,  0.0193, -0.0562, -0.0336,  0.0109, -0.0323])

The other parameters are normal: they keep the values from their own initialization!
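The same partial save-and-load can be written as a dictionary comprehension, and the value returned by load_state_dict() can be inspected to confirm what was and was not loaded. The sketch below makes a few assumptions: SmallNet is a hypothetical stand-in for TheModelClass with the forward pass omitted, and the filter uses startswith("conv1.") instead of a substring test so that a layer named, say, conv10 would not be picked up by accident.

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    """Hypothetical stand-in for TheModelClass; forward() is omitted on purpose."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.fc = nn.Linear(6, 10)


trained = SmallNet()

# Keep only the entries under the "conv1." prefix.
# clone() makes the saved tensors independent of the live model.
partial_state = {
    k: v.clone() for k, v in trained.state_dict().items() if k.startswith("conv1.")
}

fresh = SmallNet()
result = fresh.load_state_dict(partial_state, strict=False)

# result is a named tuple listing the keys that did not match
missing = sorted(result.missing_keys)
unexpected = list(result.unexpected_keys)
conv1_copied = torch.equal(fresh.conv1.weight, trained.conv1.weight)
print(missing, unexpected, conv1_copied)
```

Checking missing_keys and unexpected_keys after a non-strict load is a cheap way to catch typos in key names, which strict=False would otherwise silently ignore.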

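Differences between state_dict(), named_parameters(), model.parameters(), and named_modules()

model.state_dict()

state_dict() stores layer_name and layer_param as key-value pairs in a dict. It contains the names and parameters of all layers. The tensors it returns are detached by default, so their requires_grad attribute is always False and the printed values do not show requires_grad. For that reason, when freezing a layer you cannot use model.state_dict() to obtain the parameters and set their requires_grad attribute.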

import torch
import torch.nn as nn
import torch.optim as optim
 
# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.bn = nn.BatchNorm2d(num_features=2)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(8, 4)
        self.softmax = nn.Softmax(dim=-1)
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.pool(x)
        x = x.view(-1, 8)
        x = self.fc1(x)
        x = self.softmax(x)
        return x
 
# Initialize the model
model = TheModelClass()
 
# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for param_tensor in model.state_dict():
    print(param_tensor, "\n", model.state_dict()[param_tensor])

Output:

conv1.weight 
 tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]])    # requires_grad is not shown
conv1.bias 
 tensor([-0.3287, -0.0686])
bn.weight 
 tensor([1., 1.])
bn.bias 
 tensor([0., 0.])
bn.running_mean 
 tensor([0., 0.])
bn.running_var 
 tensor([1., 1.])
bn.num_batches_tracked 
 tensor(0)
fc1.weight 
 tensor([[ 0.2246, -0.1272,  0.0163, -0.3089,  0.3511, -0.0189,  0.3025,  0.0770],
        [ 0.2964,  0.2050,  0.2879,  0.0237, -0.3424,  0.0346, -0.0659, -0.0115],
        [ 0.1960, -0.2104, -0.2839,  0.0977, -0.2857, -0.0610, -0.3029,  0.1230],
        [-0.2176,  0.2868, -0.2258,  0.2992, -0.2619,  0.3286,  0.0410,  0.0152]])
fc1.bias 
 tensor([-0.0623,  0.1708, -0.1836, -0.1411])
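The behaviour described above (tensors from state_dict() report requires_grad as False) can be checked directly. The sketch below uses a single nn.Linear layer as a stand-in for the model. It also shows two related details that are easy to miss: state_dict() accepts a keep_vars argument that returns the Parameter objects themselves, and the default detached tensors still share memory with the live parameters, so an independent snapshot needs a copy (copy.deepcopy is used here as one option).

```python
import copy

import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # stand-in for a full model

sd = layer.state_dict()                     # default: detached tensors
sd_vars = layer.state_dict(keep_vars=True)  # the Parameter objects themselves

detached_flag = sd["weight"].requires_grad
kept_flag = sd_vars["weight"].requires_grad

# The detached tensor still shares storage with the live parameter
shares_memory = sd["weight"].data_ptr() == layer.weight.data_ptr()

# A deep copy gives an independent snapshot
snapshot = copy.deepcopy(sd)
with torch.no_grad():
    layer.weight.add_(1.0)  # modify the live parameter in place

view_follows_model = torch.equal(sd["weight"], layer.weight)
snapshot_follows_model = torch.equal(snapshot["weight"], layer.weight)
print(detached_flag, kept_flag, shares_memory, view_follows_model, snapshot_follows_model)
```

In practice this means that a state_dict kept in memory while training continues will track the model's current weights unless it is copied or written to disk first.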

model.named_parameters()

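named_parameters() yields each layer_name and layer_param packed together as a tuple; it returns an iterator rather than a list.
It covers only the learnable parameters that can be updated. By default, the tensors returned by model.named_parameters() all have requires_grad set to True. It is commonly used to control whether a layer's parameters are trained: the usual way to freeze a layer is to obtain its parameters through model.named_parameters() and set their requires_grad attribute.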

import torch
import torch.nn as nn
import torch.optim as optim
 
# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.bn = nn.BatchNorm2d(num_features=2)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(8, 4)
        self.softmax = nn.Softmax(dim=-1)
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.pool(x)
        x = x.view(-1, 8)
        x = self.fc1(x)
        x = self.softmax(x)
        return x
 
# Initialize the model
model = TheModelClass()
 
# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for layer_name, layer_param in model.named_parameters():
    print(layer_name, "\n", layer_param)

Output:

conv1.weight 
 Parameter containing:
tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]], requires_grad=True)    # requires_grad is True
conv1.bias 
 Parameter containing:
tensor([-0.3287, -0.0686], requires_grad=True)
bn.weight 
 Parameter containing:
tensor([1., 1.], requires_grad=True)
bn.bias 
 Parameter containing:
tensor([0., 0.], requires_grad=True)
fc1.weight 
 Parameter containing:
tensor([[ 0.2246, -0.1272,  0.0163, -0.3089,  0.3511, -0.0189,  0.3025,  0.0770],
        [ 0.2964,  0.2050,  0.2879,  0.0237, -0.3424,  0.0346, -0.0659, -0.0115],
        [ 0.1960, -0.2104, -0.2839,  0.0977, -0.2857, -0.0610, -0.3029,  0.1230],
        [-0.2176,  0.2868, -0.2258,  0.2992, -0.2619,  0.3286,  0.0410,  0.0152]],
       requires_grad=True)
fc1.bias 
 Parameter containing:
tensor([-0.0623,  0.1708, -0.1836, -0.1411], requires_grad=True)

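model.parameters()

parameters() returns only the parameters, without layer_name. The returned values include requires_grad, which is True for all of them, because when a network is created all of its parameters are learnable by default.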

import torch
import torch.nn as nn
import torch.optim as optim
 
# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.bn = nn.BatchNorm2d(num_features=2)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(8, 4)
        self.softmax = nn.Softmax(dim=-1)
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.pool(x)
        x = x.view(-1, 8)
        x = self.fc1(x)
        x = self.softmax(x)
        return x
 
# Initialize the model
model = TheModelClass()
 
# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for layer_param in model.parameters():
    print(layer_param)

Output:

Parameter containing:
tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]], requires_grad=True)
Parameter containing:
tensor([-0.3287, -0.0686], requires_grad=True)
Parameter containing:
tensor([1., 1.], requires_grad=True)
Parameter containing:
tensor([0., 0.], requires_grad=True)
Parameter containing:
tensor([[ 0.2246, -0.1272,  0.0163, -0.3089,  0.3511, -0.0189,  0.3025,  0.0770],
        [ 0.2964,  0.2050,  0.2879,  0.0237, -0.3424,  0.0346, -0.0659, -0.0115],
        [ 0.1960, -0.2104, -0.2839,  0.0977, -0.2857, -0.0610, -0.3029,  0.1230],
        [-0.2176,  0.2868, -0.2258,  0.2992, -0.2619,  0.3286,  0.0410,  0.0152]],
       requires_grad=True)
Parameter containing:
tensor([-0.0623,  0.1708, -0.1836, -0.1411], requires_grad=True)

model.named_modules()

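Returns the name and structure of every module in the model, starting with the model itself (whose name is the empty string).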

import torch
import torch.nn as nn
import torch.optim as optim
 
# Define the model
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.bn = nn.BatchNorm2d(num_features=2)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(8, 4)
        self.softmax = nn.Softmax(dim=-1)
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.pool(x)
        x = x.view(-1, 8)
        x = self.fc1(x)
        x = self.softmax(x)
        return x
 
# Initialize the model
model = TheModelClass()
 
# Initialize the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for name, module in model.named_modules():
    print(name,'\n', module)

Output:

 TheModelClass(
  (conv1): Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1))
  (bn): BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act): ReLU()
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=8, out_features=4, bias=True)
  (softmax): Softmax(dim=-1)
)
conv1 
 Conv2d(1, 2, kernel_size=(3, 3), stride=(1, 1))
bn 
 BatchNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
act 
 ReLU()
pool 
 MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
fc1 
 Linear(in_features=8, out_features=4, bias=True)
softmax 
 Softmax(dim=-1)
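To summarize the four accessors side by side, the sketch below collects the names each one reports for a reduced model. TinyNet is a hypothetical cut-down version of the model above (parameter-free layers and forward() are left out), and named_buffers() is brought in only to show where the extra state_dict entries come from.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical reduced model reusing the layer names from the examples above."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.bn = nn.BatchNorm2d(num_features=2)
        self.fc1 = nn.Linear(8, 4)


net = TinyNet()

state_keys = list(net.state_dict().keys())
param_names = [name for name, _ in net.named_parameters()]
buffer_names = [name for name, _ in net.named_buffers()]
module_names = [name for name, _ in net.named_modules()]

# Entries present in state_dict but absent from named_parameters are buffers
only_in_state = [k for k in state_keys if k not in param_names]

print(len(state_keys), len(param_names), len(list(net.parameters())))
print(only_in_state)
print(module_names)
```

The BatchNorm running statistics appear in state_dict() but not in named_parameters() or parameters(), which is why the state_dict output earlier has more entries than the parameter listings.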

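Freezing some layers / training only some layers

Here is the requirement: I have trained a network on a large amount of data and now need to use it to measure accuracy on a new subject. With the network architecture unchanged, I want to follow the idea of transfer learning: reuse the trained parameters and train only the model's classification head on a very small amount of data from the new subject, leaving the other layers untrained. To do this I need to freeze some layers of the model, that is, train only certain layers.

Following the analysis above, I read the model's parameters with state_dict() and keep them: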

model_dict = model.state_dict()

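The parameters obtained from state_dict() do not show a requires_grad attribute, and as noted earlier, the tensors stored in state_dict() all have requires_grad set to False.

Next, we delete the parameters of the layers that need to be trained from the saved parameters. Only then, after creating a new model object and loading the old model's parameters, will the parameters of the layers to be trained not be overwritten by the old model's values.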

model_dict.pop('fc1.weight', None)

Output:

tensor([[ 0.2246, -0.1272,  0.0163, -0.3089,  0.3511, -0.0189,  0.3025,  0.0770],
        [ 0.2964,  0.2050,  0.2879,  0.0237, -0.3424,  0.0346, -0.0659, -0.0115],
        [ 0.1960, -0.2104, -0.2839,  0.0977, -0.2857, -0.0610, -0.3029,  0.1230],
        [-0.2176,  0.2868, -0.2258,  0.2992, -0.2619,  0.3286,  0.0410,  0.0152]])

model_dict.pop('fc1.bias', None)

Output:

tensor([-0.0623,  0.1708, -0.1836, -0.1411])

Now let's print the saved parameters again:

for param_tensor in model_dict:
    print(param_tensor, "\n", model_dict[param_tensor])

Output:

conv1.weight 
 tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]])
conv1.bias 
 tensor([-0.3287, -0.0686])
bn.weight 
 tensor([1., 1.])
bn.bias 
 tensor([0., 0.])
bn.running_mean 
 tensor([0., 0.])
bn.running_var 
 tensor([1., 1.])
bn.num_batches_tracked 
 tensor(0)

The parameters of the layers we deleted are gone.

Next we create a new model object and load the saved parameters into it:

model_ = TheModelClass()
model_.load_state_dict(model_dict, strict=False)

Output:

_IncompatibleKeys(missing_keys=['fc1.weight', 'fc1.bias'], unexpected_keys=[])

Then let's check the requires_grad attribute of the new model object's parameters:

model_dict_ = model_.named_parameters()
for layer_name, layer_param in model_dict_:
    print(layer_name, "\n", layer_param)

Output:

conv1.weight 
 Parameter containing:
tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]], requires_grad=True)
conv1.bias 
 Parameter containing:
tensor([-0.3287, -0.0686], requires_grad=True)
bn.weight 
 Parameter containing:
tensor([1., 1.], requires_grad=True)
bn.bias 
 Parameter containing:
tensor([0., 0.], requires_grad=True)
fc1.weight 
 Parameter containing:
tensor([[-0.2306, -0.3159, -0.3105, -0.3051,  0.2721, -0.0691,  0.2208, -0.1724],
        [-0.0238, -0.1555,  0.2341, -0.2668,  0.3143,  0.1433,  0.3140, -0.2014],
        [ 0.0696, -0.0250,  0.0316, -0.1065,  0.2260, -0.1009, -0.1990, -0.1758],
        [-0.1782, -0.2045, -0.3030,  0.2643,  0.1951, -0.2213, -0.0040,  0.1542]],
       requires_grad=True)
fc1.bias 
 Parameter containing:
tensor([-0.0472, -0.0569, -0.1912, -0.2139], requires_grad=True)

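We can see that the old model's parameters have been loaded into the new model object, but the requires_grad attribute of the new model's parameters is True for all of them, which is not what we want.

From this we can conclude that reading the parameters with state_dict(), saving them, and loading them into a new model object is not enough to achieve the desired effect. Some additional steps are needed.

There are two ways to solve this:

requires_grad=False

We can set the requires_grad attribute of the parameters that should not be trained to False: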

model_dict_ = model_.named_parameters()
for layer_name, layer_param in model_dict_:
    if 'fc1' in layer_name:
        continue
    else:
        layer_param.requires_grad = False

Then look at the model's parameters:

for layer_param in model_.parameters():
    print(layer_param)

Output:

Parameter containing:
tensor([[[[ 0.2438, -0.0467,  0.0486],
          [-0.1932, -0.2083,  0.3239],
          [ 0.1712,  0.0379, -0.2381]]],


        [[[ 0.2853,  0.0961,  0.0809],
          [ 0.2526,  0.3138, -0.2243],
          [-0.1627, -0.2958, -0.1995]]]])
Parameter containing:
tensor([-0.3287, -0.0686])
Parameter containing:
tensor([1., 1.])
Parameter containing:
tensor([0., 0.])
Parameter containing:
tensor([[ 0.0182,  0.1294,  0.0250, -0.1819, -0.2250, -0.2540, -0.2728,  0.2732],
        [ 0.0167, -0.0969,  0.1498, -0.1844,  0.1387,  0.2436,  0.1278, -0.1875],
        [-0.0408,  0.0786,  0.2352,  0.0277,  0.2571,  0.2782,  0.2505, -0.2454],
        [ 0.3369, -0.0804,  0.2677,  0.0927,  0.0433,  0.1716, -0.1870, -0.1738]],
       requires_grad=True)
Parameter containing:
tensor([0.1084, 0.3018, 0.1211, 0.1081], requires_grad=True)

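We can see that the parameters of the layers that should not be trained no longer print requires_grad=True, which means their requires_grad attribute is now False.

Then simply pass these parameters to the optimizer: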

optimizer = optim.SGD(model_.parameters(), lr=0.001, momentum=0.9)

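Choosing which parameters the optimizer updates

If you do not want a particular layer to be updated, a simpler approach is to leave that layer's parameters out of the optimizer: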

optimizer = optim.SGD(model_.fc1.parameters(), lr=0.001, momentum=0.9)

Note: with this approach the frozen parameters still have their gradients computed during backpropagation; they are just not updated.
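The note above can be verified with a small experiment. In this sketch, backbone and head are hypothetical two-layer stand-ins for the frozen part and the classification head; only head is given to the optimizer, and after one step we check that the backbone received a gradient but did not change.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
backbone = nn.Linear(4, 3)  # hypothetical part that should stay fixed
head = nn.Linear(3, 2)      # hypothetical part being trained

# Only the head's parameters are handed to the optimizer
optimizer = optim.SGD(head.parameters(), lr=0.1)

backbone_before = backbone.weight.detach().clone()
head_before = head.weight.detach().clone()

# One training step on random data with a dummy loss
loss = head(backbone(torch.randn(5, 4))).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

grad_was_computed = backbone.weight.grad is not None
backbone_unchanged = torch.equal(backbone.weight, backbone_before)
head_changed = not torch.equal(head.weight, head_before)
print(grad_was_computed, backbone_unchanged, head_changed)
```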

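Using this method can reduce memory use, and setting requires_grad=False beforehand additionally lets the model skip gradient computation for parameters that do not need it, which speeds up training, so the two methods can be combined.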
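One possible way to combine the two methods is sketched below. The nn.Sequential model and the "2." name prefix identifying its last layer are assumptions made for this example; with the model_ object from the text, the equivalent test would be on the 'fc1' name.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
head_prefix = "2."  # in this Sequential, the last Linear layer is named "2"

# 1) Turn off gradients for everything except the head
for name, param in net.named_parameters():
    param.requires_grad = name.startswith(head_prefix)

# 2) Give the optimizer only the parameters that still require gradients
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.1, momentum=0.9)

frozen_before = net[0].weight.detach().clone()

# One training step on random data with a dummy loss
loss = net(torch.randn(5, 4)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_grad_is_none = net[0].weight.grad is None
frozen_unchanged = torch.equal(net[0].weight, frozen_before)
num_trainable = len(trainable)
print(frozen_grad_is_none, frozen_unchanged, num_trainable)
```

Unlike the optimizer-only approach, the frozen layer here never receives a gradient at all, so nothing is computed or stored for it during the backward pass.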

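References

PyTorch study notes: saving and loading models with state_dict

Python advanced containers: collections – OrderedDict

PyTorch: loading pretrained models, modifying the network structure, training with some layers frozen, and using different learning rates for different layers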
