Computing the crop sizes in FCN feature fusion
Here is part of a PyTorch implementation of FCN-16s:
self.conv1_1 = nn.Conv2d(3, 64, 3, padding=100)
self.relu1_1 = nn.ReLU(inplace=True)
self.conv1_2 = nn.Conv2d(64, 64, 3, padding=1)
self.relu1_2 = nn.ReLU(inplace=True)
self.pool1 = nn.MaxPool2d(2, stride=2, ceil_mode=True) # 1/2
# conv2
self.conv2_1 = nn.Conv2d(64, 128, 3, padding=1)
self.relu2_1 = nn.ReLU(inplace=True)
self.conv2_2 = nn.Conv2d(128, 128, 3, padding=1)
self.relu2_2 = nn.ReLU(inplace=True)
self.pool2 = nn.MaxPool2d(2, stride=2, ceil_mode=True) # 1/4
# conv3
self.conv3_1 = nn.Conv2d(128, 256, 3, padding=1)
self.relu3_1 = nn.ReLU(inplace=True)
self.conv3_2 = nn.Conv2d(256, 256, 3, padding=1)
self.relu3_2 = nn.ReLU(inplace=True)
self.conv3_3 = nn.Conv2d(256, 256, 3, padding=1)
self.relu3_3 = nn.ReLU(inplace=True)
self.pool3 = nn.MaxPool2d(2, stride=2, ceil_mode=True) # 1/8
# conv4
self.conv4_1 = nn.Conv2d(256, 512, 3, padding=1)
self.relu4_1 = nn.ReLU(inplace=True)
self.conv4_2 = nn.Conv2d(512, 512, 3, padding=1)
self.relu4_2 = nn.ReLU(inplace=True)
self.conv4_3 = nn.Conv2d(512, 512, 3, padding=1)
self.relu4_3 = nn.ReLU(inplace=True)
self.pool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True) # 1/16
# conv5
self.conv5_1 = nn.Conv2d(512, 512, 3, padding=1)
self.relu5_1 = nn.ReLU(inplace=True)
self.conv5_2 = nn.Conv2d(512, 512, 3, padding=1)
self.relu5_2 = nn.ReLU(inplace=True)
self.conv5_3 = nn.Conv2d(512, 512, 3, padding=1)
self.relu5_3 = nn.ReLU(inplace=True)
self.pool5 = nn.MaxPool2d(2, stride=2, ceil_mode=True) # 1/32
# fc6
self.fc6 = nn.Conv2d(512, 4096, 7)
self.relu6 = nn.ReLU(inplace=True)
self.drop6 = nn.Dropout2d()
# fc7
self.fc7 = nn.Conv2d(4096, 4096, 1)
self.relu7 = nn.ReLU(inplace=True)
self.drop7 = nn.Dropout2d()
self.score_fr = nn.Conv2d(4096, n_class, 1)
self.score_pool4 = nn.Conv2d(512, n_class, 1)
self.upscore2 = nn.ConvTranspose2d(
n_class, n_class, 4, stride=2, bias=False)
self.upscore16 = nn.ConvTranspose2d(
n_class, n_class, 32, stride=16, bias=False)
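The spatial sizes produced by the layers above can be traced with the standard output-size formulas, without running PyTorch at all. Below is a minimal pure-Python sketch; `conv_out` and `pool_out` are our own helper names, and `h = 218` is chosen so that h + 198 is divisible by 32 and the pooling chain divides exactly:

```python
import math

def conv_out(size, kernel, stride=1, padding=0):
    # Conv2d output size: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # MaxPool2d with ceil_mode=True: ceil((size - kernel) / stride) + 1
    return math.ceil((size - kernel) / stride) + 1

h = 218                          # chosen so h + 198 is divisible by 32
h = conv_out(h, 3, padding=100)  # conv1_1: 218 + 198 = 416
for _ in range(5):               # the 3x3, padding=1 convs keep the size;
    h = pool_out(h)              # pool1..pool5 each halve it: 416 -> 13
h = conv_out(h, 7)               # fc6, 7x7, no padding: 13 - 6 = 7
print(h)  # 7
```

For sizes where the halving is not exact, `ceil_mode=True` rounds up, which is why the pooling layers above are declared with it.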
FCN differs from VGG in that the fully connected layers are replaced by convolutions:
# fc6
self.fc6 = nn.Conv2d(512, 4096, 7)
self.relu6 = nn.ReLU(inplace=True)
self.drop6 = nn.Dropout2d()
# fc7
self.fc7 = nn.Conv2d(4096, 4096, 1)
self.relu7 = nn.ReLU(inplace=True)
self.drop7 = nn.Dropout2d()
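Note that the 7×7 convolution fc6 is the exact convolutional counterpart of VGG's first fully connected layer: applied to a 7×7×512 feature map it yields a 1×1×4096 output and carries the same number of weights, but unlike an FC layer it can slide over larger inputs. A quick parameter-count check (plain arithmetic, our own sketch):

```python
# VGG's fc6 as a fully connected layer: flatten 7x7x512, map to 4096 units
fc_weights = (512 * 7 * 7) * 4096

# FCN's fc6 as a convolution: 4096 filters, each of shape 512 x 7 x 7
conv_weights = 4096 * 512 * 7 * 7

print(fc_weights == conv_weights)  # True: identical weight count,
                                   # but the conv slides over any input size
```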
The subsequent layers then produce the class heatmap via upsampling and feature fusion:
self.score_fr = nn.Conv2d(4096, n_class, 1)
self.score_pool4 = nn.Conv2d(512, n_class, 1)
self.upscore2 = nn.ConvTranspose2d(
n_class, n_class, 4, stride=2, bias=False)
self.upscore16 = nn.ConvTranspose2d(
n_class, n_class, 32, stride=16, bias=False)
h = self.score_fr(h)
h = self.upscore2(h)
upscore2 = h  # 1/16

h = self.score_pool4(pool4)
h = h[:, :, 5:5 + upscore2.size()[2], 5:5 + upscore2.size()[3]]
score_pool4c = h  # 1/16

h = upscore2 + score_pool4c

h = self.upscore16(h)
h = h[:, :, 27:27 + x.size()[2], 27:27 + x.size()[3]].contiguous()
Taking FCN-16s as the example, let us work out the crop sizes.
As the forward pass shows, we need to fuse upscore2 (the 2× upsampled score map) with the score map computed from pool4.

Let the input height be h (the width behaves identically). After conv1_1, which uses a 3×3 kernel with padding=100, the size becomes h + 2×100 − 3 + 1 = h + 198.

Each pooling layer then halves the size (the 3×3, padding=1 convolutions preserve it), so after pool4 the size is (h + 198)/16, assuming h + 198 is divisible by 16; ceil_mode=True handles the remaining cases.

After pool5 the size is (h + 198)/32, and after the 7×7 convolution fc6 (no padding) it becomes (h + 198)/32 − 6.

Now we upsample by a factor of 2. To keep things simple, consider a transposed convolution with no padding, whose output size is

output = (input − 1) × S + K, where S is the stride and K is the kernel_size; here S = 2, K = 4.

Substituting, the output of upscore2 has size 2 × ((h + 198)/32 − 6 − 1) + 4 = (h + 198)/16 − 10.
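The no-padding transposed-convolution size formula with S = 2, K = 4 always maps a size s to 2s + 2, which is what turns the "−6" from fc6 into a "−10". A throwaway numeric check (`deconv_out` is our own helper name):

```python
def deconv_out(size, kernel, stride):
    # ConvTranspose2d with no padding: output = (size - 1) * stride + kernel
    return (size - 1) * stride + kernel

# kernel 4, stride 2 maps s -> 2s + 2, which is how
# (h + 198)/32 - 6 becomes (h + 198)/16 - 10 after upscore2
for s in range(1, 20):
    assert deconv_out(s, kernel=4, stride=2) == 2 * s + 2
print("ok")
```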
To fuse the features, we perform an element-wise addition, implemented as:
h = upscore2 + score_pool4c
For this addition to be valid, the two tensors must have the same spatial size. However, the score map from pool4, of size (h + 198)/16, is larger than upscore2, of size (h + 198)/16 − 10, by exactly 10, i.e., 5 on each side (top, bottom, left, right). We therefore crop the pool4 score map with an offset of 5:
h = h[:, :, 5:5 + upscore2.size()[2], 5:5 + upscore2.size()[3]]
Since Conv2d in PyTorch outputs tensors of shape (N, C, H, W), the crop is applied to the last two dimensions, H and W.
With the sizes matched, the feature fusion can proceed.
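The same bookkeeping explains the crop offset of 27 applied after upscore16 in the forward pass: with S = 16, K = 32, the output size is ((h + 198)/16 − 10 − 1) × 16 + 32 = h + 198 − 176 + 32 = h + 54, i.e., 54 larger than the input, so 27 is trimmed from each side. An end-to-end check of both crop offsets (pure Python; `deconv_out` is our own helper name):

```python
def deconv_out(size, kernel, stride):
    # ConvTranspose2d with no padding: output = (size - 1) * stride + kernel
    return (size - 1) * stride + kernel

h = 218                                    # any h with h + 198 divisible by 32
pool4 = (h + 198) // 16                    # 26
upscore2 = deconv_out((h + 198) // 32 - 6, kernel=4, stride=2)  # 16
assert pool4 - upscore2 == 10              # hence the crop of 5 per side

upscore16 = deconv_out(upscore2, kernel=32, stride=16)  # (16 - 1) * 16 + 32 = 272
assert upscore16 - h == 54                 # hence the crop of 27 per side
print(upscore16 - 2 * 27 == h)             # True: cropping restores the input size
```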