【ICLR2018】《SparsityWinogradCNN》

I. Introduction

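1. The paper aims to combine two methods: Winograd convolution and pruning.

2. Number of multiplications in CNNs: 1.1x10^9 for AlexNet, growing to 1.6x10^10.

3. The paper's two ideas: move the ReLU operation to after the Winograd transform, and prune after the transform (in the Winograd domain).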

II. Related Work

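Linearity of convolution: Cong & Xiao (2014) recast convolution as matrix multiplication and used linear algebra to cut the number of multiplications by about 47%. Lavin (2015) was the first to apply the Winograd method, reducing multiplications by 2.25x to 4x. cuDNN also uses the Winograd algorithm.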
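The 2.25x to 4x range can be reproduced with a small calculation. The sketch below assumes the usual tile notation F(m x m, r x r) (m x m outputs per tile, r x r kernel); the function name is illustrative and not from the paper.

```python
def winograd_multiply_reduction(m: int, r: int = 3) -> float:
    """Ratio of direct to Winograd multiplications for one F(m x m, r x r) tile."""
    direct = (m * m) * (r * r)   # each of the m*m outputs needs r*r multiplications
    winograd = (m + r - 1) ** 2  # one multiplication per element of the transformed tile
    return direct / winograd


print(winograd_multiply_reduction(2))  # F(2x2, 3x3): 36 / 16 = 2.25
print(winograd_multiply_reduction(4))  # F(4x4, 3x3): 144 / 36 = 4.0
```

Larger tiles save more multiplications, at the cost of larger transforms.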

Model compression: Han et al. (2015; 2016b). Liu et al. (2017) first proposed pruning and re-training the weights in the Winograd domain for conventional Winograd convolution, reaching 90% sparsity in the Winograd parameters of AlexNet with less than 0.1% accuracy loss.

Dynamic activation sparsity: Han (2016) sets a small positive threshold at test time, which sparsifies the activations further without losing test accuracy.

III. Applying the Winograd algorithm

The standard 2D Winograd formula:
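For reference, the per-tile form that is commonly written for Winograd convolution is given below. The notation is an assumption and may differ from the paper's: g is the r x r kernel, d is the (m+r-1) x (m+r-1) input tile, Y is the m x m output tile, G, B and A are the fixed transform matrices, and \odot is element-wise multiplication.

```latex
Y = A^{T}\left[\left(G\,g\,G^{T}\right)\odot\left(B^{T}\,d\,B\right)\right]A
```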

Spatial Baseline Network: corresponds to panel (a) of the figure above.

Winograd Native Pruned Network:

The Winograd-domain pruned network was introduced by Liu et al. (2017) and Li et al. (2017), so the network in panel (b) was also proposed by others.
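To make the baselines concrete, here is a minimal NumPy sketch of conventional Winograd convolution for a single F(2x2, 3x3) tile, checked against direct computation. The transform matrices are the standard ones for F(2x2, 3x3); the function names are illustrative and do not come from the paper.

```python
import numpy as np

# Standard transform matrices for F(2x2, 3x3).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=np.float64)


def winograd_tile(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = G @ g @ G.T        # kernel in the Winograd domain (4x4)
    V = B_T @ d @ B_T.T    # input tile in the Winograd domain (4x4)
    return A_T @ (U * V) @ A_T.T  # 16 element-wise multiplications


def direct_tile(d, g):
    """Same 2x2 output computed directly (CNN-style correlation)."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
print(np.allclose(winograd_tile(d, g), direct_tile(d, g)))
```

In the network of panel (b), pruning is applied to the 4x4 Winograd-domain kernel U rather than to the 3x3 spatial kernel g.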

The improvement proposed in this paper, the Winograd-ReLU CNN:
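A minimal single-tile sketch of the idea, assuming F(2x2, 3x3) and a kernel W stored directly in the Winograd domain: ReLU is applied after the input transform, so zeros in both W and the transformed activations can be skipped. The function name and the multiplication counter are illustrative only.

```python
import numpy as np

# Standard input and output transform matrices for F(2x2, 3x3).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=np.float64)
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=np.float64)


def winograd_relu_tile(d, W):
    """d: 4x4 input tile (no ReLU applied yet); W: 4x4 Winograd-domain kernel.

    Returns the 2x2 output tile and the number of element-wise
    multiplications whose operands are both nonzero.
    """
    V = np.maximum(B_T @ d @ B_T.T, 0.0)             # ReLU after the input transform
    needed = np.count_nonzero((W != 0) & (V != 0))   # products that cannot be skipped
    Y = A_T @ (W * V) @ A_T.T
    return Y, needed
```

With a pruned W and ReLU-sparsified V, `needed` falls below the 16 multiplications of a dense tile; this count is the kind of workload the experiments measure.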

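Three key implementation steps: dense training, pruning, re-training.

Dense training: train the transformed (Winograd-domain) kernels directly, by back-propagating through the inverse transform.
Pruning: set different pruning rates for different layers.
Re-training: use a "sparsity mask" to force the weights that were pruned to remain zero.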
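The pruning and re-training steps can be sketched as follows. This is a minimal illustration assuming magnitude-based pruning to a target density and a plain SGD update; the helper names, the learning rate and the update rule are assumptions, not the paper's code.

```python
import numpy as np


def magnitude_mask(weights, density):
    """Boolean mask keeping roughly the `density` fraction of largest-magnitude weights."""
    k = int(round(density * weights.size))
    if k == 0:
        return np.zeros(weights.shape, dtype=bool)
    threshold = np.sort(np.abs(weights).ravel())[-k]  # k-th largest magnitude
    return np.abs(weights) >= threshold


def masked_sgd_step(weights, grad, mask, lr=0.01):
    """One SGD update that keeps pruned weights at zero via the sparsity mask."""
    return (weights - lr * grad) * mask


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))          # a Winograd-domain kernel (hypothetical values)
mask = magnitude_mask(W, density=0.25)   # keep 25% of the weights
W = W * mask                             # pruning
W = masked_sgd_step(W, rng.standard_normal((4, 4)), mask)  # re-training step
```

Calling `magnitude_mask` with a different `density` per layer gives per-layer pruning rates.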

The back-propagation formulas for training through the inverse transform are therefore:
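A sketch of these gradients, derived with the chain rule under assumed notation that may differ from the paper's: W is the Winograd-domain kernel that is trained directly, d is the input tile, S is the output tile, L is the loss, and \mathbb{1}[\cdot] is the element-wise indicator (the derivative of ReLU).

```latex
\begin{aligned}
S &= A^{T}\left[\,W \odot \mathrm{ReLU}\!\left(B^{T} d\, B\right)\right] A \\
\frac{\partial L}{\partial W} &= \mathrm{ReLU}\!\left(B^{T} d\, B\right) \odot \left(A\,\frac{\partial L}{\partial S}\,A^{T}\right) \\
\frac{\partial L}{\partial d} &= B\left[\,W \odot \left(A\,\frac{\partial L}{\partial S}\,A^{T}\right) \odot \mathbb{1}\!\left[B^{T} d\, B > 0\right]\right] B^{T}
\end{aligned}
```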

IV. Experimental results

Datasets:

CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009) and ImageNet2012 (Russakovsky et al., 2015)

Network architectures:

VGG-nagadomi (Nagadomi, 2014), ConvPool-CNN-C model (Springenberg et al., 2014) and a variation of ResNet-18 (He et al., 2016a)

Using the TensorFlow framework, the three full (dense) network variants are trained from scratch, then pruned and re-trained.

CIFAR10

VGG-nagadomi, a lightweight version of VGG, is used; its best accuracy on CIFAR-10 is 93.31%. With the Winograd algorithm added, the accuracies after training are:

Winograd CNN -> 93.30%

Winograd-ReLU CNN -> 93.43%

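The first layer is pruned to a fixed 80% density; the other layers are pruned step by step from 80% down to 20%.

An accuracy drop of more than 0.1% counts as significant. By this criterion the first two networks hold up down to 60% density, while the paper's model holds up down to 40%.

The first two networks reduce the conv-layer workload by 5.1x and 3.7x; the paper's network reduces it by 13.3x.

Over the whole CNN, the workload reduction is 2.2x and 3.0x (relative to the other two methods).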

CIFAR100

ConvPool-CNN-C (Springenberg et al., 2014) model

The accuracies of the three models are 69.34%, 69.32% and 69.75% respectively.

All three models maintain their accuracy at 60% density.

ImageNet

used a variation of the full pre-activation version (He et al., 2016b) of ResNet-18 (He et al., 2016a)

Top-1/Top-5 accuracy:

66.67%/87.42%

66.84%/87.47%

66.78%/87.43%

Workload reduction:

5.1x

4.5x

13.2x

V. Kernel visualization
