Paper notes: Leveraging Filter Correlations for Deep Model Compression

Paper address: http://arxiv.org/abs/1811.10559
GitHub address: none

This paper, published in 2018, proposes a model compression method based on filter correlation. Its distinguishing feature is that, after correlated filter pairs are identified, the correlation of strongly correlated pairs is further strengthened through optimization, which reduces the information loss when one filter of each pair is pruned.

Motivation

Previous pruning methods based on importance criteria considered only the importance of individual filters and did not fully account for the redundancy between filters. As a result, filters that are important yet redundant (highly correlated with other important filters) are never pruned, so these methods cannot remove redundancy optimally. This paper therefore proposes a pruning method based on filter correlation to improve the network compression rate.

Methods

This paper proposes an iterative pruning procedure based on the filter correlation coefficient. Its framework is shown in the figure below:
[Framework diagram from the paper]
Specific process: In the Episode Selection stage, the Pearson correlation coefficient of every pair of filters is computed layer by layer and used as the importance measure for each pair. The closer the coefficient is to 0, the lower the linear correlation between the two filters and the more important the pair; the closer it is to ±1, the higher the linear correlation and the less important the pair. From each layer, the N least important (i.e., most correlated) filter pairs are extracted to form the episode $S_t$ of the t-th pruning iteration; a minimal sketch of this step follows below.
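A minimal PyTorch sketch of the pairing step, assuming filters are flattened to vectors and compared with the Pearson coefficient; the greedy non-overlapping pairing and the `num_pairs` parameter are illustrative choices, not details from the paper:

```python
import torch

def select_episode(conv_weight: torch.Tensor, num_pairs: int):
    """Pick the num_pairs most correlated (least important) filter pairs
    of one conv layer; conv_weight has shape (out_ch, in_ch, kH, kW)."""
    flat = conv_weight.detach().flatten(start_dim=1)   # one row per filter
    flat = flat - flat.mean(dim=1, keepdim=True)       # center each filter
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    corr = flat @ flat.t()                             # Pearson coefficients rho_XY
    n = corr.size(0)
    scores = corr.abs().triu(diagonal=1)               # |rho|, each pair counted once
    pairs, used = [], set()
    # Greedily take the most correlated pairs, using each filter at most once.
    for idx in scores.flatten().argsort(descending=True):
        i, j = divmod(idx.item(), n)
        if i >= j or i in used or j in used:
            continue
        pairs.append((i, j))
        used.update((i, j))
        if len(pairs) == num_pairs:
            break
    return pairs
```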
In the Optimization stage, a regularization term is added to the loss function:

$$\min_{\theta}\; C(\theta) + \lambda\, C_{S_t}$$

where $C(\theta)$ is the original loss function and $C_{S_t}$ is a regularization term that further increases the correlation of the filter pairs in $S_t$, so that removing one filter from each pair loses as little information as possible. $C_{S_t}$ is defined as follows:
$$C_{S_t} = \sum_{(X,Y)\,\in\, S_t} \bigl(1 - \lvert\rho_{XY}\rvert\bigr)$$

where $\rho_{XY}$ is the Pearson coefficient of a filter pair $(X, Y)$ in $S_t$. From this it follows that as $C_{S_t}$ decreases, $\lvert\rho_{XY}\rvert$ increases; that is, minimizing the regularizer strengthens the correlation of the selected pairs.
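Under this reconstruction, the regularizer can be computed directly from the current weights. A minimal PyTorch sketch (the function name and the plain sum over pairs mirror the formula above; treat it as an illustration, not the authors' code):

```python
import torch

def correlation_regularizer(conv_weight: torch.Tensor, pairs):
    """C_{S_t} = sum over selected pairs of (1 - |rho_XY|).
    Minimizing it drives |rho_XY| toward 1, i.e. strengthens the correlation."""
    flat = conv_weight.flatten(start_dim=1)
    flat = flat - flat.mean(dim=1, keepdim=True)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    rho = torch.stack([(flat[i] * flat[j]).sum() for i, j in pairs])
    return (1.0 - rho.abs()).sum()

# Optimization stage: minimize C(theta) + lambda * C_{S_t}, e.g.
# loss = criterion(model(x), y) + lam * correlation_regularizer(conv.weight, pairs)
```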
Discard N Filters: In this stage, the N least important filter pairs are taken from $S_t$, one filter of each pair is cut off (either one can be removed, since the two are now highly correlated), and the pruned network is obtained; see the sketch below.
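A sketch of the discard step for a plain `nn.Conv2d`, assuming (arbitrarily) that the second filter of each pair is dropped; adjusting the input channels of the following layer and any BatchNorm is omitted here:

```python
import torch
import torch.nn as nn

def discard_filters(conv: nn.Conv2d, pairs) -> nn.Conv2d:
    """Return a copy of `conv` with one filter of each selected pair removed."""
    drop = {j for _, j in pairs}                       # drop the 2nd of each pair
    keep = [i for i in range(conv.out_channels) if i not in drop]
    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned
```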

Afterwards, the network is fine-tuned, and the above process is repeated as long as the accuracy loss stays within the maximum allowable error, so as to achieve the maximum compression. A rough driver loop is sketched below.
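Putting the stages together, one plausible driver loop looks like the sketch below; `train_fn` and `eval_fn` stand in for the training/evaluation code, and `num_pairs` and `max_drop` are hypothetical knobs (the stopping rule is the maximum allowable error mentioned above):

```python
import torch.nn as nn

def iterative_prune(model, train_fn, eval_fn, num_pairs=2, max_drop=1.0):
    """Episode Selection -> Optimization -> Discard -> fine-tune, repeated
    until accuracy drops by more than max_drop (in percent) from baseline."""
    baseline = eval_fn(model)
    while True:
        # Episode Selection: most-correlated filter pairs, layer by layer.
        episodes = {name: select_episode(mod.weight, num_pairs)
                    for name, mod in model.named_modules()
                    if isinstance(mod, nn.Conv2d)}
        train_fn(model, episodes)               # Optimization: C(theta) + lambda * C_{S_t}
        for name, pairs in episodes.items():    # Discard one filter per pair
            *path, leaf = name.split(".")
            parent = model
            for p in path:
                parent = getattr(parent, p)
            setattr(parent, leaf, discard_filters(getattr(parent, leaf), pairs))
        train_fn(model, {})                     # fine-tune without the regularizer
        if baseline - eval_fn(model) > max_drop:
            return model
```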

Experiment

The paper tests the effectiveness of the method on two tasks: classification and detection. For classification, LeNet-5, VGG-16, ResNet-50, and ResNet-56 are used; for detection, Faster R-CNN and SSD are used.
GPU: GTX 1080 Ti
$\lambda$ (weight of the regularization term): 1

Results

[Result tables and figures from the paper]

Thoughts

I think this paper complements pruning methods based on importance criteria. Most pruning algorithms compute an importance score, whether the simplest L1/L2 norm or the more recent HRank, but they do not consider that important filters may be highly correlated with one another, which is a different kind of redundancy. This method could therefore be combined with other pruning methods for a second round of pruning, possibly achieving higher compression ratios.

Source: blog.csdn.net/qq_43812519/article/details/105033181