Paper Reading Notes: SPPNet

1. SPPNet

He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.

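Existing convolutional neural networks require input images of a fixed size, such as the commonly used $224 \times 224$. Images that do not match this size must first be preprocessed, for example by cropping or warping (stretching). This manual preprocessing can hurt prediction accuracy: a crop may cut off part of the object, while warping distorts its geometry. To remove this constraint and let a network accept input images of arbitrary size, the paper proposes spatial pyramid pooling (SPP).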

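Why does a CNN need a fixed-size input? The constraint comes from the final linear classifier, not from the convolutional layers. The classifier flattens the feature map produced by the convolutions, so its input dimension, which is determined by the channel count and spatial size of that feature map, must be known in advance. One workaround is global pooling, which always produces a $C \times 1 \times 1$ feature map. However, global pooling discards spatial information and can cost accuracy, because it is equivalent to max pooling or average pooling with a kernel as large as the entire feature map.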
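To make the constraint concrete, here is a minimal PyTorch sketch. The two heads and the tensor sizes are hypothetical, chosen only for illustration: a head that flattens the feature map straight into `nn.Linear` only works for the spatial size it was built for, while a global-average-pooling head accepts any size but keeps just one value per channel.

```python
import torch
import torch.nn as nn

# Hypothetical head that flattens a 512 x 4 x 4 map directly into a Linear layer;
# in_features (512 * 4 * 4 = 8192) is fixed when the layer is built.
flatten_head = nn.Sequential(nn.Flatten(), nn.Linear(512 * 4 * 4, 10))

# Hypothetical global-average-pooling head: any H x W collapses to 512 x 1 x 1 first.
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 10))

small = torch.randn(2, 512, 4, 4)
large = torch.randn(2, 512, 7, 5)

print(flatten_head(small).shape)  # torch.Size([2, 10])
try:
    flatten_head(large)  # flattens to 512 * 7 * 5 = 17920 features, not 8192
    flatten_failed = False
except RuntimeError:
    flatten_failed = True
print("flatten head rejects 7 x 5 input:", flatten_failed)

# The pooled head works for both sizes, but only one number per channel survives.
print(gap_head(small).shape, gap_head(large).shape)
```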

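To keep more spatial detail when pooling, SPPNet inserts multi-scale pooling before the classifier: the feature map is pooled at several scales, and the pooled results are flattened and concatenated into a fixed-length feature vector.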
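A compact sketch of the idea, assuming PyTorch. `spatial_pyramid_pool` is a hypothetical helper, and it uses `F.adaptive_max_pool2d` to obtain exactly n × n bins per level rather than the explicit window/stride computation described in the paper. With a three-level pyramid of 4 × 4, 2 × 2 and 1 × 1 grids and 256 channels, inputs of different spatial size map to vectors of the same length.

```python
import torch
import torch.nn.functional as F


def spatial_pyramid_pool(x, levels=(4, 2, 1)):
    """Sketch of SPP: max-pool x [B, C, H, W] onto n x n grids and concatenate.

    Adaptive max pooling gives exactly n x n bins per level for any H and W,
    so the output length is C * sum(n * n for n in levels).
    """
    batch = x.shape[0]
    # Each level: [B, C, n, n] -> [B, C * n * n]
    pooled = [F.adaptive_max_pool2d(x, n).reshape(batch, -1) for n in levels]
    return torch.cat(pooled, dim=1)


a = spatial_pyramid_pool(torch.randn(1, 256, 13, 13))
b = spatial_pyramid_pool(torch.randn(1, 256, 10, 17))
print(a.shape, b.shape)  # torch.Size([1, 5376]) torch.Size([1, 5376])
```

The 1 × 1 level is exactly global max pooling, so SPP can be read as global pooling plus finer grids that retain a coarse spatial layout.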
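Suppose the last convolutional layer outputs a feature map of size $C \times H \times W$. If we max-pool it onto $4 \times 4$, $2 \times 2$, and $1 \times 1$ grids, giving 16, 4, and 1 bins per channel respectively, we obtain a feature vector of length $(16 + 4 + 1) \times C = 21C$, regardless of $H$ and $W$.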
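In general, for a pyramid with levels $n_1, \dots, n_L$ (an $n_i \times n_i$ grid at level $i$), the output length $D$ depends only on $C$ and the chosen levels:

$$
D = C \sum_{i=1}^{L} n_i^2, \qquad \text{e.g. } \{4, 2, 1\}:\; D = (16 + 4 + 1)\,C = 21C
$$

For example, with $C = 256$ the three-level pyramid gives $21 \times 256 = 5376$. The default levels `[6, 3, 2, 1]` used in the code below give $36 + 9 + 4 + 1 = 50$ bins per channel, i.e. $50 \times 512 = 25600$ features for a 512-channel map, which is the `classifier_in * 512` input size of the classifier.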

2. Code Implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SPPNet(nn.Module):
    """Spatial pyramid pooling: [B, C, H, W] -> [B, C * sum(n * n for n in levels)]."""

    def __init__(self, in_channels, levels=None):
        super(SPPNet, self).__init__()
        # in_channels is not needed for pooling; it is kept for API compatibility.
        self.in_channels = in_channels
        if levels is None:
            self.levels = [6, 3, 2, 1]
        else:
            self.levels = levels

    def forward(self, x):
        # x [batch_size, C, H, W]
        ret = []
        for level in self.levels:
            # Adaptive max pooling always yields exactly level x level bins for any H, W.
            # An explicit kernel/stride/padding computation can need padding larger than
            # kernel_size / 2 (e.g. a 4 x 4 map at level 6), which nn.MaxPool2d rejects.
            pooled = F.adaptive_max_pool2d(x, output_size=level)  # [B, C, level, level]
            ret.append(torch.flatten(pooled, start_dim=2))          # [B, C, level * level]
        # Concatenate all bins per channel, then flatten to [B, C * sum(level^2)].
        return torch.flatten(torch.cat(ret, dim=-1), start_dim=1)
```
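Why not compute the pooling window by hand? A quick pure-Python check, assuming the per-level rule kernel = stride = ⌈H/n⌉ and padding = ⌈(kernel·n − H)/2⌉ (the helper names below are hypothetical). The PyTorch documentation for `nn.MaxPool2d` states that padding must be at most `kernel_size / 2`, and this rule breaks that limit whenever the feature map is small relative to the level.

```python
import math


def manual_pool_params(size, level):
    """Hypothetical helper: kernel (= stride) and padding from the hand-computed rule."""
    kernel = int(math.ceil(size / level))
    pad = int(math.ceil((kernel * level - size) / 2))
    return kernel, pad


def padding_allowed(kernel, pad):
    # nn.MaxPool2d (dilation=1) rejects padding larger than half the kernel size.
    return pad <= kernel // 2


for size in (4, 7, 13, 24):
    for level in (6, 3, 2, 1):
        kernel, pad = manual_pool_params(size, level)
        if not padding_allowed(kernel, pad):
            print(f"size={size}, level={level}: kernel={kernel}, pad={pad} -> rejected")
```

With these sizes the loop flags level 6 for feature maps of side 4, 7 and 13, which is why adaptive pooling is the safer building block here.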

Next, we embed this module in a custom convolutional network:

```python
class ConvNet(nn.Module):

    def __init__(self, num_classes=10, levels=None):
        super(ConvNet, self).__init__()
        if levels is None:
            levels = [6, 3, 2, 1]
        # Pyramid bins per channel as a plain int: sum(n^2) = 50 for [6, 3, 2, 1].
        classifier_in = sum(n * n for n in levels)
        # conv1 expects single-channel (grayscale) input.
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3,
                               stride=2, padding=1)
        self.bn1 = nn.BatchNorm2d(num_features=64)
        self.relu1 = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3,
                               stride=2, padding=1)
        self.bn2 = nn.BatchNorm2d(num_features=128)
        self.relu2 = nn.ReLU(inplace=True)

        self.conv3 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3,
                               stride=2, padding=1)
        self.bn3 = nn.BatchNorm2d(num_features=256)
        self.relu3 = nn.ReLU(inplace=True)

        self.conv4 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3,
                               stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(num_features=512)
        self.relu4 = nn.ReLU(inplace=True)

        self.spp = SPPNet(in_channels=512, levels=levels)
        # Has no effect: max pooling non-negative ReLU outputs is already >= 0.
        self.relu5 = nn.ReLU(inplace=True)

        # The SPP output length is 512 * classifier_in for any input H and W.
        self.classifier = nn.Linear(in_features=classifier_in * 512, out_features=num_classes)
        self._init_params()

    def _init_params(self):
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                nn.init.kaiming_normal_(module.weight)

    def forward(self, x):
        x = self.relu1(self.bn1(self.conv1(x)))
        x = self.relu2(self.bn2(self.conv2(x)))
        x = self.relu3(self.bn3(self.conv3(x)))
        x = self.relu4(self.bn4(self.conv4(x)))
        x = self.relu5(self.spp(x))  # [B, 512 * classifier_in]
        x = self.classifier(x)
        return x
```
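To see how the input size propagates through this network, here is a small pure-Python shape trace. The helper names are hypothetical; it uses the standard convolution output formula $\lfloor (s + 2p - k)/\text{stride} \rfloor + 1$ with the kernel size, stride and padding of `conv1` to `conv4`. The spatial size of the final map changes with the input, while the SPP output length, and therefore `in_features` of the classifier, stays at $512 \times 50 = 25600$.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a Conv2d along one spatial dimension (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def convnet_feature_size(size):
    # conv1-conv3 use stride 2, conv4 uses stride 1; all use kernel 3, padding 1.
    for stride in (2, 2, 2, 1):
        size = conv_out(size, stride=stride)
    return size


levels = [6, 3, 2, 1]
spp_length = 512 * sum(n * n for n in levels)  # does not depend on the input size

for side in (28, 64, 100):
    f = convnet_feature_size(side)
    print(side, "->", f, "x", f, "->", spp_length)
```

A 28 × 28 input (for example MNIST-sized, consistent with `in_channels=1` and `num_classes=10`) ends up as a 4 × 4 map, smaller than the finest 6 × 6 level, so the pooling layer must handle levels larger than the feature map itself.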


Reposted from blog.csdn.net/loki2018/article/details/125296334