Building a CNN Text Classification Model (PyTorch Version)

Parameter declarations

V: vocabulary size (number of word vectors in the embedding table)

D: word vector dimension

C: number of classes

Ci: number of input channels (1 in the code, since a sentence is treated as a single-channel input)

Co: number of convolution kernels per kernel size

Ks: list of the different kernel sizes; [3, 4, 5] in the code
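
For concreteness, these symbols might be instantiated as follows; the values below are assumptions for illustration (the post itself only fixes Co = 200 and Ks = [3, 4, 5]):

V = 20000          # vocabulary size, e.g. the number of distinct words in the training corpus
D = 300            # word vector dimension
C = 2              # number of classes, e.g. positive / negative
Ci = 1             # input channels: a sentence is treated as a single-channel "image"
Co = 200           # kernels per kernel size
Ks = [3, 4, 5]     # kernel heights, i.e. how many words each kernel spans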

Function definition

Define a function that computes the number of neurons in layer i (fan-in) and in layer i+1 (fan-out) of the CNN from a weight tensor: def calculate_fan_in_and_fan_out(tensor)

def calculate_fan_in_and_fan_out(tensor):
    dimensions = tensor.ndimension()
    if dimensions < 2:
        raise ValueError("Fan in and fan out can not be computed for tensor with less than 2 dimensions")

    if dimensions == 2:  # Linear: weight is (out_features, in_features)
        fan_in = tensor.size(1)
        fan_out = tensor.size(0)
    else:  # Conv: weight is (out_channels, in_channels, *kernel_size)
        num_input_fmaps = tensor.size(1)
        num_output_fmaps = tensor.size(0)
        receptive_field_size = 1
        if tensor.dim() > 2:
            receptive_field_size = tensor[0][0].numel()  # number of elements in one kernel
        fan_in = num_input_fmaps * receptive_field_size
        fan_out = num_output_fmaps * receptive_field_size

    return fan_in, fan_out

Define a CNN_Text class that inherits from nn.Module. The class also needs to override nn.Module's forward function (the forward-pass function); forward is overridden last, after all variables and operations have been declared, so we first write the code that builds the model's parameters in the constructor.

Word embedding

self.embed = nn.Embedding(V, D, max_norm=2, scale_grad_by_freq=True, padding_idx=args.paddingId)

Here max_norm caps the norm of each embedding vector: whenever a vector's norm exceeds max_norm, it is renormalized at lookup time so that its norm equals max_norm. Once the parameters are given, the embedding's size and the contents of embed.weight.data can be inspected directly (the original post showed them in a screenshot that is not reproduced here).
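
For example, a minimal inspection sketch (padding_idx=0 stands in for args.paddingId, which is an assumption here):

import torch.nn as nn

embed = nn.Embedding(V, D, max_norm=2, scale_grad_by_freq=True, padding_idx=0)
print(embed.weight.data.size())  # torch.Size([V, D]): one D-dimensional row per vocabulary entry
print(embed.weight.data[:2])     # first two word vectors (row 0 is the padding entry)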

If a pretrained word model is available, with its word vectors stored in a tensor pretrained_weight, use it to replace embed.weight.data:

self.embed.weight.data.copy_(pretrained_weight)
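
A minimal sketch of how pretrained_weight might be prepared before the copy, assuming the pretrained vectors are available as a (V, D) NumPy array aligned with the vocabulary indices (the loading step is not shown in the original post):

import numpy as np
import torch

word_vectors = np.random.randn(V, D).astype(np.float32)  # placeholder for the real pretrained vectors
pretrained_weight = torch.from_numpy(word_vectors)
self.embed.weight.data.copy_(pretrained_weight)           # overwrite the randomly initialised rows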

Defining the wide-convolution CNNs

After padding the input 2-D matrix, it is convolved separately by three CNNs with different kernel sizes. The convolutions are defined as follows (parameters as declared at the top): 200 kernels per size, each covering 3, 4, or 5 words at a time; the number of input channels is 1 and the number of output channels is 200:

self.convs1 = nn.ModuleList(  # nn.ModuleList registers each kernel's parameters with the module
    [nn.Conv2d(in_channels=Ci, out_channels=Co, kernel_size=(K, D), stride=(1, 1),
               padding=(K // 2, 0), dilation=1, bias=False) for K in Ks])
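
With padding K//2 along the word axis, a kernel of height K over a W-word sentence produces W + 2*(K//2) - K + 1 output positions (W for K = 3 and K = 5, W + 1 for K = 4), so the output is never shorter than the input, which is what makes this a wide convolution. A quick shape check (a sketch; W = 36 is an arbitrary sentence length, the other values are as assumed above):

import torch
import torch.nn as nn

W = 36
x = torch.randn(1, Ci, W, D)   # one sentence as an (N, Ci, W, D) tensor
for K in Ks:
    conv = nn.Conv2d(Ci, Co, kernel_size=(K, D), stride=(1, 1), padding=(K // 2, 0), bias=False)
    print(K, conv(x).size())   # K=3 -> (1, 200, 36, 1), K=4 -> (1, 200, 37, 1), K=5 -> (1, 200, 36, 1)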

Next, initialize the weights of the three CNNs with different kernel sizes, and compute each weight's fan-in and fan-out:

# requires: from torch.nn import init; import numpy as np
for conv in self.convs1:
    init.xavier_normal(conv.weight.data, gain=np.sqrt(args.init_weight_value))  # init.xavier_normal_ in newer PyTorch
    fan_in, fan_out = CNN_Text.calculate_fan_in_and_fan_out(conv.weight.data)
    print(" in {} out {} ".format(fan_in, fan_out))
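
As a check, for a weight of shape (Co, Ci, K, D) the function returns fan_in = Ci * K * D and fan_out = Co * K * D; under the values assumed above, the K = 3 kernel gives:

import torch

weight = torch.randn(Co, Ci, 3, D)                      # same shape as the K = 3 kernel's weight
fan_in, fan_out = calculate_fan_in_and_fan_out(weight)
print(fan_in, fan_out)                                  # 1 * 3 * 300 = 900, 200 * 3 * 300 = 180000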

Dropout

To prevent overfitting, add dropout:

self.dropout = nn.Dropout(args.dropout)
self.dropout_embed = nn.Dropout(args.dropout_embed)

Fully connected layer

In the CNN's final fully connected layer, the total number of input features is therefore 3 * 200 (number of kernel sizes times number of kernels per size), and the output size is the number of classes, so the layer is defined as follows:

in_fea = len(Ks) * Co  # 3 * 200 = 600 concatenated features
self.fc = nn.Linear(in_features=in_fea, out_features=C, bias=True)
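
A quick sanity check that the features reaching this layer indeed have length len(Ks) * Co = 600 (a sketch; N = 16 is an arbitrary batch size):

import torch

pooled = [torch.randn(16, Co) for _ in Ks]   # one (N, Co) tensor per kernel size after max-pooling
features = torch.cat(pooled, 1)
print(features.size())                       # torch.Size([16, 600]) == (N, len(Ks) * Co)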

Batch Normalizations

BN can be applied after each network layer; for the benefits of doing so, see http://blog.csdn.net/hjimce/article/details/50866313

The BN layers are defined as follows:

self.convs1_bn = nn.BatchNorm2d(num_features=Co, momentum=args.bath_norm_momentum,  # momentum defaults to 0.1
                                affine=args.batch_norm_affine)                       # affine defaults to True (learnable scale and shift)
self.fc1_bn = nn.BatchNorm1d(num_features=in_fea // 2, momentum=args.bath_norm_momentum, affine=args.batch_norm_affine)
self.fc2_bn = nn.BatchNorm1d(num_features=C, momentum=args.bath_norm_momentum, affine=args.batch_norm_affine)
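
Note that the forward pass below calls self.fc1 and self.fc2 on the batch-normalization path, but the excerpt never shows their definitions. A plausible sketch, sized to match fc1_bn (in_fea // 2 features) and fc2_bn (C features) above:

# Assumed, not shown in the original excerpt: the two linear layers used when batch normalization is enabled
self.fc1 = nn.Linear(in_features=in_fea, out_features=in_fea // 2, bias=True)
self.fc2 = nn.Linear(in_features=in_fea // 2, out_features=C, bias=True)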

This completes the constructor. Finally, override the parent class nn.Module's forward function, i.e. the forward pass, in the CNN_Text subclass:

def forward(self, x):  # F is torch.nn.functional
    x = self.embed(x)  # (N, W, D): N sentences, W words each (padded with 0s at the end), D-dimensional vectors
    x = self.dropout_embed(x)  # randomly zero entries of x
    x = x.unsqueeze(1)  # (N, Ci, W, D): add a channel dimension, single input channel
    print(x)  # debug print of the embedded batch
    if self.args.batch_normalizations is True:
        x = [self.convs1_bn(F.tanh(conv(x))).squeeze(3) for conv in self.convs1]  # [(N, Co, W), ...] * len(Ks)
        x = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in x]  # [(N, Co), ...] * len(Ks)
    else:
        x = [F.relu(conv(x)).squeeze(3) for conv in self.convs1]  # [(N, Co, W), ...] * len(Ks)
        x = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in x]  # [(N, Co), ...] * len(Ks)
    x = torch.cat(x, 1)  # concatenate along dim 1 (dim 0 would stack rows, dim 1 joins columns)
    x = self.dropout(x)  # (N, len(Ks) * Co)
    if self.args.batch_normalizations is True:
        x = self.fc1_bn(self.fc1(x))
        logit = self.fc2_bn(self.fc2(F.tanh(x)))
    else:
        logit = self.fc(x)
    return logit
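
Finally, a minimal usage sketch under the values assumed at the top of the post; the args attributes and the CNN_Text constructor signature are assumptions, since the original post does not show this part:

import torch

class Args:                          # placeholder hyper-parameter container (assumed)
    pass

args = Args()
args.paddingId = 0
args.dropout = 0.5
args.dropout_embed = 0.5
args.init_weight_value = 2.0
args.batch_normalizations = False
args.bath_norm_momentum = 0.1
args.batch_norm_affine = True

model = CNN_Text(args)               # assumes the constructor takes only args, as in the excerpts
x = torch.randint(0, V, (16, 36))    # a batch of 16 sentences, each a sequence of 36 word ids
logit = model(x)                     # forward pass
print(logit.size())                  # torch.Size([16, C])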

This completes the construction of the text-classification CNN. (Note: the code comes from a classic, highly starred GitHub repository.)


Reposted from www.cnblogs.com/hytnchen/p/10246614.html