[Model Pruning] Network Slimming: channel-level pruning of convolution kernels

Paper: Learning Efficient Convolutional Networks through Network Slimming

This is a pruning paper from ICCV 2017.

The highlight is that it uses the BN layer's gamma parameters as the main criterion for pruning. Pruning is done at the convolution channel level, which makes the method very easy to use.

Problems:

The paper gives no criterion for choosing lambda or the pruning threshold; presumably they were found by experimental tuning.

Also, lambda cannot be obtained through learning by the network itself, which would actually be more practical.


Idea

First, look at Batch Normalization (BN):

$$\hat{z} = \frac{z_{\text{in}} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}, \qquad z_{\text{out}} = \gamma \hat{z} + \beta$$

Here $\mu_{\mathcal{B}}$ and $\sigma_{\mathcal{B}}$ are the mini-batch mean and standard deviation, and $\gamma$, $\beta$ are trainable per-channel scale and shift parameters.
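In PyTorch, for instance, gamma and beta correspond to the `weight` and `bias` of `nn.BatchNorm2d`, one value per channel. A minimal snippet (my own illustration, not from the paper):

```python
import torch.nn as nn

# Each of the 64 channels gets its own trainable (gamma, beta) pair.
bn = nn.BatchNorm2d(64)
print(bn.weight.shape)  # torch.Size([64]) -> gamma, initialized to 1.0
print(bn.bias.shape)    # torch.Size([64]) -> beta, initialized to 0.0
```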

[Figure: schematic of channel-level pruning]

The figure above is a schematic of the pruning process; it shows that pruning is carried out at the convolution channel level.

In the general case, network pruning works by adding some extra layers to the original network and using them to compute the importance of each convolution layer.

Network Slimming takes a cleverer approach: it evaluates importance directly with the BN layers' gamma parameters. (Each channel has its own gamma value, and the authors argue that this value reflects the importance of the channel.)

The pruning process simply removes the convolution channels whose corresponding gamma values are small.

As shown above, the orange parts are the ones that get pruned away.

Therefore, how the gamma parameters are driven during training is very important!
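To make the idea concrete, here is a toy snippet (my own sketch, not the paper's code) that ranks the output channels of one conv+BN pair by |gamma|. The gammas are randomized only so the ranking is visible; in practice they come from training:

```python
import torch
import torch.nn as nn

# The 8 output channels of conv are each scaled by one of bn's 8 gammas.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(8)
nn.init.uniform_(bn.weight)  # stand-in for trained gammas (defaults are all 1.0)

importance = bn.weight.detach().abs()  # one importance score per channel
order = torch.argsort(importance)      # ascending: least important first
print("prune candidates:", order[:4].tolist())
```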


Gamma parameters

[Figure: gamma distributions, unconstrained vs. L1-regularized with two different lambda values]

The figure above shows the distributions of the gamma parameters.

In the first panel, with no constraint, the gamma values are scattered across a wide range, so pruning directly would hurt network performance.

The second and third panels show the distributions when an L1 penalty is applied with two different lambda values; the stronger the constraint, the more gammas are pushed toward zero.

$$L = \sum_{(x, y)} l\big(f(x, W), y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma), \qquad g(s) = |s|$$

That is, an L1 regularization term on the BN layers' gamma parameters is added to the loss function; the lambda in front of the second term controls the weight of this sparsity penalty.
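A minimal sketch of this penalty in PyTorch (my own illustration; the paper optimizes the non-smooth L1 term with subgradient descent, which is what autograd's gradient of `abs()` gives you here, and the lambda value below is only a plausible placeholder):

```python
import torch.nn as nn

def l1_gamma_penalty(model, lam):
    """Sparsity term lambda * sum(|gamma|) over all BN layers, i.e. g(s) = |s|."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

# Inside the training loop (sketch):
#   loss = criterion(model(x), y) + l1_gamma_penalty(model, lam=1e-4)
#   loss.backward()
#   optimizer.step()
```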

After pruning is completed, fine-tuning is performed, and the whole procedure can be iterated. A schematic of the entire train → prune → fine-tune loop:

[Figure: iterative pipeline — train with sparsity regularization, prune channels with small gamma, fine-tune]

The specific steps are as follows:

1. Start from the initial network.

2. Train with the modified loss function; after reaching a certain accuracy, compute the threshold thresh from the gamma values and the desired pruning ratio (see the sketch after this list).

3. Prune the convolution channels whose gamma falls below thresh.

4. Fine-tune for a certain number of epochs so that mAP and the other metrics recover.

Repeat steps 2-4 so that the network reaches a better trade-off between size and performance.
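A minimal sketch of steps 2-3 (helper names like `compute_thresh` are mine; actually rebuilding the physically smaller network from the masks is architecture-specific):

```python
import torch
import torch.nn as nn

def compute_thresh(model, prune_ratio):
    """Global threshold under which a prune_ratio fraction of all BN gammas fall."""
    gammas = torch.cat([m.weight.detach().abs().view(-1)
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = int(gammas.numel() * prune_ratio)
    return torch.sort(gammas).values[k]

def channel_masks(model, thresh):
    """One boolean mask per BN layer: True = keep that channel."""
    return [m.weight.detach().abs() > thresh
            for m in model.modules()
            if isinstance(m, nn.BatchNorm2d)]

# Example: mark half of all channels for removal, then fine-tune the
# compact network built from the surviving channels.
# masks = channel_masks(model, compute_thresh(model, prune_ratio=0.5))
```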

 

Experiment

Classification results on CIFAR-10 and CIFAR-100:

[Table: classification results on CIFAR-10 and CIFAR-100]

The compression and acceleration effects are quite significant:

[Table: compression and speedup results]

Existing problems

Besides the lambda-learning issue mentioned earlier, I feel there is one more small problem:

Pruning convolution channels can indeed reduce the amount of computation very effectively, which should also help when we want to implement the network on a hardware chip. However, the pruned network no longer consists of regular, block-shaped modules, so how to design for and deploy it on hardware is currently an open problem.

 


Source: blog.csdn.net/DL_wly/article/details/100142380