paddlepaddle 22 基于hook实现既插既用的dropout

1、paddle中支持的hook

通过对paddle中api的查询,发现paddle的layer对象只在前向传播中支持两种hook,pre_hookpost_hookpre_hook可以对层的输入变量进行处理,用函数的返回值作为新的变量参与层的计算。post_hook则可以对层的输出变量进行处理,将层的输出进行进一步处理后,用函数的返回值作为层计算的输出。paddle的tensor对象只支持反向传播hook,可以用于获取梯度值或返回新的梯度值。

1.1 layer中forward_pre_hook的使用

作用在layer的前向传播前,可以用于修改或抓取输入数据,使用hook1=layer.register_forward_pre_hook(func)注册hook对象,使用hook1.remove卸载hook

def forward_pre_hook(layer, input):
    print(input)
    return input

x = paddle.ones([10, 1], 'float32')
model = Model()
forward_pre_hook_handle = model.flatten.register_forward_pre_hook(forward_pre_hook)
out = model(x)

1.2 layer中forwar_post_hook的使用

作用在layer的前向传播后,可以用于修改或抓取输出数据,使用hook1=layer.register_forward_post_hook(func)注册hook对象,使用hook1.remove卸载hook

def forward_post_hook(layer, input, output):
    return 2*output

x = paddle.ones([10, 1], 'float32')
model = Model()
forward_post_hook_handle = model.flatten.register_forward_post_hook(forward_post_hook)
out = model(x)
print(out)
forward_post_hook_handle.remove()

输出:Tensor(shape=[10, 1], dtype=float32, place=CPUPlace, stop_gradient=True,
       [[2.],
        [2.],
        ...

1.3 tensor中register_hook的使用

tensor.register_hook(func)为当前 Tensor 注册一个反向的 hook 函数。该被注册的 hook 函数将会在每次当前 Tensor 的梯度 Tensor 计算完成时被调用。被注册的 hook 函数不会修改输入的梯度 Tensor ,但是 hook 可以返回一个新的临时梯度 Tensor 代替当前 Tensor 的梯度继续进行反向传播。使用示例如下所示

import paddle

# hook function return None
def print_hook_fn(grad):
    print(grad)

# hook function return Tensor
def double_hook_fn(grad):
    grad = grad * 2
    return grad

x = paddle.to_tensor([0., 1., 2., 3.], stop_gradient=False)
y = paddle.to_tensor([4., 5., 6., 7.], stop_gradient=False)
z = paddle.to_tensor([1., 2., 3., 4.])

# one Tensor can register multiple hooks
h = x.register_hook(print_hook_fn)
x.register_hook(double_hook_fn)

w = x + y
# register hook by lambda function
w.register_hook(lambda grad: grad * 2)

o = z.matmul(w)
o.backward()
# print_hook_fn print content in backward
# Tensor(shape=[4], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
#        [2., 4., 6., 8.])

print("w.grad:", w.grad) # w.grad: [1. 2. 3. 4.]
print("x.grad:", x.grad) # x.grad: [ 4.  8. 12. 16.]
print("y.grad:", y.grad) # y.grad: [2. 4. 6. 8.]

# remove hook
h.remove()

2、实现即插即用的dropout

2.1 DropHookModel的实现

通过DropHookModel的封装,可以实现对任意网络模型的任意位置添加dropout、dropblock等模块,并保证在train时生效,在eval时不起作用。

import paddle
import paddle.nn as nn
class DropHookModel(nn.Layer):
    def __init__(self,model,hook_layer,hook_func,hook_type):
        super(DropHookModel, self).__init__()
        self.model = model
        self.hook_layer = hook_layer
        self.hook_func = hook_func
        self.hook_type = hook_type
        self.hooks=[]
        self.train()
    #在train里进行hook绑定
    def train(self):
        self.model.train()
        for layer,func,type_ in zip(self.hook_layer,self.hook_func,self.hook_type):
            if type_=="forward_pre_hook":
                hook=layer.register_forward_pre_hook(func)
            elif type_=="forwar_post_hook":
                hook=layer.register_forward_post_hook(func)
            else:
                raise("type_ must is one of ['forward_pre_hook','forwar_post_hook']")
            self.hooks.append(hook)
    #在eval里取消绑定
    def eval(self):
        self.model.eval()
        for hook in self.hooks:
            hook.remove()
            
    #用于前向传播        
    def forward(self,x):
        return self.model.forward(x)
    
    #返回用于训练的参数
    def parameters(self):
        return self.model.parameters()

    #返回用于训练的参数
    def named_parameters(self):
        return self.model.parameters()

    #用于设置模型参数
    def set_state_dict(self,model_dict):
        self.model.set_state_dict(model_dict)

    #用于返回模型参数
    def state_dict(self):
        return self.model.state_dict()

        
#创建hook函数1
dropout2d_1=nn.Dropout2D(p=0.3)
dropout2d_1.train()
def farward_hook_1(module, input, output):
    output=dropout2d_1(output)
    return output
 
#创建hook函数2
dropout2d_2=nn.Dropout2D(p=0.3)
dropout2d_2.train()
def farward_hook_2(module, input, output):
    output=dropout2d_2(output)
    return output

model=paddle.vision.resnet18()
#设置要绑定的layer对象
hook_layer=[model.layer4,model.layer3]
#设置要绑定的hook函数
hook_func=[farward_hook_1,farward_hook_2]
#设置要绑定的hook类型
hook_type=["forwar_post_hook","forwar_post_hook"]
#继续hook函数的绑定
drop_model=DropHookModel(model,hook_layer,hook_func,hook_type)

2.2 DropHookModel的使用

通过观察代码及执行输出,可以看到train模式下,每次的输出都不一样(dropout2d生效),而eval模式下,每次的输出都一样(dropout2d失效)

test_data=paddle.rand(shape=(1,3,224,224))
print("drop_model in train mode:")
drop_model.train()
for i in range(5):
    out=drop_model(test_data)
    print(out.numpy().argmax(axis=1),out.numpy().max(axis=1))

print("\ndrop_model in eval mode:")
drop_model.eval()
for i in range(5):
    out=drop_model(test_data)
    print(out.numpy().argmax(axis=1),out.numpy().max(axis=1))
代码执行输出如下所示
drop_model in train mode:
[873] [2.8585076]
[909] [2.964711]
[369] [3.7595296]
[145] [3.1055984]
[391] [3.1836662]

drop_model in eval mode:
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]
[919] [4.50915]

2.3 DropHookModel的训练

通过博主对train、eval、forward、parameters4个函数的封装,使得DropHookModel对象可以跟正常的模型一样使用与训练。

# 设置优化器
optim = paddle.optimizer.Adam(parameters=drop_model.parameters())
# 设置损失函数  其中内置了softmax和onehot
loss_fn = paddle.nn.CrossEntropyLoss()    
x_data = paddle.rand(shape=(10,3,224,224))
y_data = paddle.randint(low=0, high=100, shape=[10,1])
for i in range(10):
    predicts=drop_model(x_data)
    loss = loss_fn(predicts, y_data)
    # 计算准确率 等价于 prepare 中metrics的设置
    acc = paddle.metric.accuracy(predicts, y_data)
    # 反向传播
    loss.backward()
    # 更新参数
    optim.step()
    # 梯度清零
    optim.clear_grad()
    print(i,loss.item())
代码执行输出如下所示:
0 12.406988143920898
1 6.672011852264404
2 6.359235763549805
3 6.101187229156494
4 5.717696666717529
5 5.145468711853027
6 4.492755889892578
7 4.362451076507568
8 3.6308979988098145
9 3.2801218032836914

猜你喜欢

转载自blog.csdn.net/a486259/article/details/123965321