【deep_thoughts】48. Quickly Reproducing PyTorch's Weight Normalization


Video link: 48、快速复现PyTorch的Weight Normalization (Bilibili)

Official API: torch.nn.utils.weight_norm

Original paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Theory

  1. Weight normalization is used in reinforcement learning and in generative adversarial networks to make training more stable.

  2. torch.nn.utils.weight_norm(module, name='weight', dim=0)

    In the official PyTorch API, weight_norm is a function rather than a class; it takes a module as its argument.

  3. The computation is:
    $\mathbf{w} = g \, \dfrac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$
    where $\mathbf{w}$ is the module's weight parameter, $g$ is the magnitude of $\mathbf{w}$ (its norm), and $\mathbf{v}$ is in fact just $\mathbf{w}$, so $\dfrac{\mathbf{v}}{\left\| \mathbf{v} \right\|}$ is the unit-length direction vector. The operation is best thought of as a decomposition of the weight matrix.

  4. Without weight norm, the module has a single weight parameter to optimize; with weight norm, two parameters are optimized at the same time, with separate gradients computed for $g$ and $\mathbf{v}$ (see the sketch after this list).

  5. Weight norm adds no extra parameters in any substantive sense, and the module's output stays unchanged.
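
A quick way to see point 4 in action: after one backward pass through a weight-normalized layer, both weight_g and weight_v carry gradients. Below is a minimal sketch (the layer sizes, the random input, and the .sum() loss are arbitrary choices for illustration, not from the original):

import torch
import torch.nn as nn

layer = torch.nn.utils.weight_norm(nn.Linear(3, 4, bias=False))
out = layer(torch.randn(2, 3))
out.sum().backward()  # an arbitrary scalar loss, used only to trigger backprop

print(layer.weight_g.grad.shape)  # torch.Size([4, 1]) -- gradient w.r.t. g
print(layer.weight_v.grad.shape)  # torch.Size([4, 3]) -- gradient w.r.t. v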

Code

Fully Connected Layer

First, use a fully connected layer as the module.

import torch
import torch.nn as nn

batch_size = 2
feat_dim = 3
hid_dim = 4
inputx = torch.randn(batch_size, feat_dim)  # 2D input tensor, shape [2, 3]
linear = nn.Linear(feat_dim, hid_dim, bias=False)  # weight shape [4, 3]
wn_linear = torch.nn.utils.weight_norm(linear)  # official API

Compute the magnitude and unit direction vector of the linear layer's weight matrix. The magnitude of each row vector is its L2 norm.

weight_magnitude = torch.tensor([linear.weight[i, :].norm() for i in torch.arange(linear.weight.shape[0])], dtype=torch.float32).unsqueeze(-1)  # per-row L2 norm, shape [4, 1]

weight_direction = linear.weight / weight_magnitude
# v in the formula is just w, so v / ||v|| = w / ||w||
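
The per-row norm can equivalently be computed in a single vectorized call instead of the list comprehension; a minimal sketch (weight_magnitude_vec is a name introduced here only for illustration):

weight_magnitude_vec = linear.weight.norm(dim=1, keepdim=True)  # per-row L2 norm, shape [4, 1]
print(torch.allclose(weight_magnitude_vec, weight_magnitude))   # expected: True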

Print some of the relevant values:

print("weight_magnitude:")  # 相当于公式中的g
print(weight_magnitude)

print("weight_direction:")  # 相当于公式中的 v / ||v||
print(weight_direction)

print("magnitude of weight_direction:")
print((weight_direction ** 2).sum(dim=-1))  # 每一行元素的平方和为1

The output is as follows:

weight_magnitude:
tensor([[0.3865],
        [0.6001],
        [0.4221],
        [0.7440]])  # [4,1]
weight_direction:
tensor([[ 0.7945,  0.1528,  0.5877],
        [-0.9337,  0.3558, -0.0405],
        [ 0.8495,  0.0206, -0.5273],
        [-0.7468,  0.6474,  0.1521]], grad_fn=<DivBackward0>)  # [4,3]
magnitude of weight_direction:
tensor([1.0000, 1.0000, 1.0000, 1.0000], grad_fn=<SumBackward1>)  # [4]

1. Verify the formula, i.e., linear.weight = weight_direction * weight_magnitude:

print("linear.weight:")
print(linear.weight)
print("weight_direction * weight_magnitude:")
print(weight_direction * weight_magnitude)  

The two outputs are identical, which confirms the formula:

linear.weight:
tensor([[ 0.3071,  0.0591,  0.2272],
        [-0.5603,  0.2135, -0.0243],
        [ 0.3585,  0.0087, -0.2225],
        [-0.5556,  0.4817,  0.1132]], grad_fn=<MulBackward0>)  # [4,3]
weight_direction * weight_magnitude:
tensor([[ 0.3071,  0.0591,  0.2272],
        [-0.5603,  0.2135, -0.0243],
        [ 0.3585,  0.0087, -0.2225],
        [-0.5556,  0.4817,  0.1132]], grad_fn=<MulBackward0>)  # [4,1]*[4,3]->[4,3]

2. Verify another claim: applying weight normalization to a module does not change its output.

print("linear(inputx):")  # linear 和 wn_linear 的输出值相同 
print(linear(inputx))

print("wn_linear(inputx):")
print(wn_linear(inputx))

The outputs of linear and wn_linear are identical, so the claim holds.

linear(inputx):
tensor([[ 0.2138,  0.3498, -0.6853,  0.6026],
        [ 0.2718,  0.2176, -0.5267,  0.4888]], grad_fn=<MmBackward0>)  # [2,4]
wn_linear(inputx):
tensor([[ 0.2138,  0.3498, -0.6853,  0.6026],
        [ 0.2718,  0.2176, -0.5267,  0.4888]], grad_fn=<MmBackward0>)

3. Print the parameters of the weight-normalized fully connected layer:

print("parameters of wn_linear:")
for n, p in wn_linear.named_parameters():
    print(n, p)

The output is as follows:

parameters of wn_linear:
weight_g Parameter containing:
tensor([[0.3865],
        [0.6001],
        [0.4221],
        [0.7440]], requires_grad=True)  # [4,1]
weight_v Parameter containing:
tensor([[ 0.3071,  0.0591,  0.2272],
        [-0.5603,  0.2135, -0.0243],
        [ 0.3585,  0.0087, -0.2225],
        [-0.5556,  0.4817,  0.1132]], requires_grad=True)  # [4,3]

We can see that wn_linear contains two parameters, weight_g and weight_v. weight_g equals the weight_magnitude computed above, and weight_v equals linear.weight, i.e., the v in the formula is just w.
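
These two relations can also be verified with torch.allclose rather than by comparing printouts; a minimal sketch:

print(torch.allclose(wn_linear.weight_g, weight_magnitude))  # weight_g is the per-row magnitude; expected: True
print(torch.allclose(wn_linear.weight_v, linear.weight))     # weight_v is the original weight; expected: True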

4. Using the parameters of the weight-normalized layer wn_linear, reconstruct the original weight linear.weight from the formula:

print("construct weight of linear:")
print(wn_linear.weight_g * (wn_linear.weight_v / torch.tensor([wn_linear.weight_v[i, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])], dtype=torch.float32).unsqueeze(-1)))

The output is as follows:

construct weight of linear:
tensor([[ 0.3071,  0.0591,  0.2272],
        [-0.5603,  0.2135, -0.0243],
        [ 0.3585,  0.0087, -0.2225],
        [-0.5556,  0.4817,  0.1132]], grad_fn=<MulBackward0>)

The result matches the original weight linear.weight.
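
For completeness, PyTorch also provides torch.nn.utils.remove_weight_norm, which performs this reconstruction internally and folds weight_g and weight_v back into a single weight parameter. A minimal sketch on a fresh throwaway layer (so that wn_linear above is left untouched):

tmp = torch.nn.utils.weight_norm(nn.Linear(feat_dim, hid_dim, bias=False))
torch.nn.utils.remove_weight_norm(tmp)  # modifies the module in place
print([n for n, _ in tmp.named_parameters()])  # ['weight']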

Convolutional Layer

Now use a 1D convolutional layer as the module.

# instantiate a 1x1 convolution, which is equivalent to an MLP applied position-wise
conv1d = nn.Conv1d(feat_dim, hid_dim, kernel_size=1, bias=False)  # 1x1 convolutional layer
wn_conv1d = torch.nn.utils.weight_norm(conv1d)

Compute the magnitude and unit direction vector of the conv1d layer's weight. The magnitude of each output channel's weight is its L2 norm.

conv1d_weight_magnitude = torch.tensor([conv1d.weight[i, :, :].norm() for i in torch.arange(conv1d.weight.shape[0])],
                                       dtype=torch.float32).reshape(conv1d.weight.shape[0], 1, 1)  # [4, 1, 1]
conv1d_weight_direction = conv1d.weight / conv1d_weight_magnitude
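
As with the linear layer, the per-output-channel norm can be computed without the Python loop by flattening the non-channel dimensions first; a minimal sketch (the _vec name is introduced here only for illustration):

conv1d_weight_magnitude_vec = conv1d.weight.flatten(1).norm(dim=1).reshape(-1, 1, 1)  # [4, 1, 1]
print(torch.allclose(conv1d_weight_magnitude_vec, conv1d_weight_magnitude))  # expected: True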

Print these values:

print("conv1d_weight_magnitude:")
print(conv1d_weight_magnitude)

print("conv1d_weight_direction:")
print(conv1d_weight_direction)

The output is as follows:

conv1d_weight_magnitude:
tensor([[[0.8938]],

        [[0.5470]],

        [[0.5421]],

        [[0.5670]]]) 
conv1d_weight_direction:
tensor([[[ 0.6186],
         [-0.4646],
         [ 0.6336]],

        [[ 0.0478],
         [ 0.2542],
         [-0.9660]],

        [[-0.9090],
         [ 0.1691],
         [ 0.3808]],

        [[-0.0041],
         [-0.7952],
         [-0.6063]]], grad_fn=<DivBackward0>)    

1. Verify the formula, i.e., conv1d.weight = conv1d_weight_direction * conv1d_weight_magnitude:

print("conv1d.weight:")
print(conv1d.weight)
print("conv1d_weight_magnitude * conv1d_weight_direction:")
print(conv1d_weight_magnitude * conv1d_weight_direction)

The two outputs are identical, which confirms the formula.

conv1d.weight:
tensor([[[ 0.5529],
         [-0.4153],
         [ 0.5663]],

        [[ 0.0262],
         [ 0.1390],
         [-0.5284]],

        [[-0.4928],
         [ 0.0917],
         [ 0.2064]],

        [[-0.0023],
         [-0.4509],
         [-0.3438]]], grad_fn=<MulBackward0>)
conv1d_weight_magnitude * conv1d_weight_direction:
tensor([[[ 0.5529],
         [-0.4153],
         [ 0.5663]],

        [[ 0.0262],
         [ 0.1390],
         [-0.5284]],

        [[-0.4928],
         [ 0.0917],
         [ 0.2064]],

        [[-0.0023],
         [-0.4509],
         [-0.3438]]], grad_fn=<MulBackward0>)

2. Print the parameters of the weight-normalized 1D convolutional layer wn_conv1d:

print("parameter of wn_conv1d:")
for n, p in wn_conv1d.named_parameters():
    print(n, p, p.shape)

The output is:

parameter of wn_conv1d:
weight_g Parameter containing:
tensor([[[0.8938]],

        [[0.5470]],

        [[0.5421]],

        [[0.5670]]], requires_grad=True) torch.Size([4, 1, 1])
weight_v Parameter containing:
tensor([[[ 0.5529],
         [-0.4153],
         [ 0.5663]],

        [[ 0.0262],
         [ 0.1390],
         [-0.5284]],

        [[-0.4928],
         [ 0.0917],
         [ 0.2064]],

        [[-0.0023],
         [-0.4509],
         [-0.3438]]], requires_grad=True) torch.Size([4, 3, 1])

wn_conv1d likewise contains two parameters, weight_g and weight_v. weight_g equals the conv1d_weight_magnitude computed above, and weight_v equals conv1d.weight, i.e., the v in the formula is just w.

3. Using the parameters of the weight-normalized convolutional layer wn_conv1d, reconstruct the original weight conv1d.weight from the formula:

print("construct weight of conv1d:")
print(wn_conv1d.weight_g * (wn_conv1d.weight_v / torch.tensor([wn_conv1d.weight_v[i, :, :].norm() for i in torch.arange(wn_linear.weight_v.shape[0])]).reshape(wn_linear.weight_v.shape[0], 1, 1)))

The output matches conv1d.weight:

construct weight of conv1d:
tensor([[[ 0.5529],
         [-0.4153],
         [ 0.5663]],

        [[ 0.0262],
         [ 0.1390],
         [-0.5284]],

        [[-0.4928],
         [ 0.0917],
         [ 0.2064]],

        [[-0.0023],
         [-0.4509],
         [-0.3438]]], grad_fn=<MulBackward0>)
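
Finally, the conv layer can also be run on an actual input to confirm that weight normalization leaves its output unchanged; a minimal sketch (the sequence length of 5 is an arbitrary choice, not from the original):

conv_input = torch.randn(batch_size, feat_dim, 5)  # dummy 3D input: [batch, channels, seq_len]
print(torch.allclose(conv1d(conv_input), wn_conv1d(conv_input)))  # expected: True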

Reprinted from blog.csdn.net/qq_45670134/article/details/131289695