Paper Reading | Learning to Measure Changes: Fully Convolutional Siamese Metric Networks for Scene Change Detection

Original title: Learning to Measure Changes: Fully Convolutional Siamese Metric Networks for Scene Change Detection

Paper link: http://arxiv.org/abs/1810.09111

abstract

Scene change detection is difficult because illumination, shadows, different camera viewpoints, and other nuisances produce noisy changes that are hard to separate from semantic changes: noisy change and semantic change are intertwined. The most intuitive approach is to compare the two feature maps directly and measure their difference. The paper uses a contrastive loss to shrink the feature distance of unchanged pairs and enlarge that of changed pairs, and further proposes a thresholded contrastive loss to handle the problems caused by large viewpoint changes. Code: https://github.com/gmayday1997/ChangeDet

introduction

State-of-the-art methods are mostly FCN-based: they detect changes by learning an optimal decision boundary. To distinguish noise from semantic change, one possible approach is to provide a measurable change index, with larger values for semantic change and smaller values for noise. The core idea comes from deep metric learning: reduce intra-class distances and enlarge inter-class distances. The method therefore has two parts: feature extraction, and evaluating the feature distance with a predefined distance function.

Main contributions: (1) a siamese metric framework to address the problems above; (2) a Thresholded Contrastive Loss (TCL) to overcome large viewpoint changes; (3) state-of-the-art results; (4) the FCN-based distance metric is integrated into the baselines.

The most conventional approach is to threshold a pixel-wise difference image: cheap to compute, but poor at discrimination. There are also hand-crafted-feature methods such as image rationing, change vector analysis, Markov random fields, and dictionary learning. State-of-the-art methods are FCN-based and learn a decision boundary for change detection. The authors' idea builds on another paper that also measures change as a distance, but whose discriminative power was insufficient. "Change detection based on deep siamese convolutional network for optical aerial images" is very similar to this paper, but the authors propose an end-to-end approach that addresses several problems at once.

proposed approach

A siamese FCN is the basic framework: it extracts features from both inputs and then measures their distance with the Euclidean distance or cosine similarity. Feature extraction and distance measurement are unified into one process, which the authors call the implicit metric of the fully convolutional siamese network. A contrastive loss is used for optimization so that changed pairs have larger distance values and unchanged pairs have smaller ones; a thresholded contrastive loss handles large viewpoint changes.
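The per-pixel comparison between the two branches can be sketched as follows. This is a minimal numpy illustration of the idea, not the paper's code; the function name and shapes are my own assumptions (features as `(C, H, W)` arrays from the shared-weight branches):

```python
import numpy as np

def distance_map(feat_a, feat_b, metric="euclidean"):
    """Pixel-wise comparison of two (C, H, W) feature maps.

    Because the siamese branches share weights, feat_a and feat_b live
    in the same embedding space and can be compared channel-wise.
    Returns an (H, W) map: a distance for "euclidean", a similarity
    in [-1, 1] for "cosine".
    """
    if metric == "euclidean":
        return np.sqrt(((feat_a - feat_b) ** 2).sum(axis=0))
    if metric == "cosine":
        num = (feat_a * feat_b).sum(axis=0)
        den = (np.linalg.norm(feat_a, axis=0)
               * np.linalg.norm(feat_b, axis=0) + 1e-8)
        return num / den
    raise ValueError(f"unknown metric: {metric}")
```

Identical features give a zero Euclidean distance and a cosine similarity of (approximately) 1, which is the "unchanged" extreme of the metric.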

No-change is treated as similarity and change as dissimilarity, measured by a dissimilarity function with two parts: a feature descriptor and a distance metric. The feature descriptor is the feature map produced by the siamese network; the backbone can be GoogLeNet or DeepLab. For the distance metric, the authors design the thresholded contrastive loss and compare the Euclidean distance and cosine similarity metrics experimentally.

The figure shows the contrastive loss using the Euclidean distance: \(y_{i,j} = 1\) indicates no change at this position, \(D(f_i, f_j)\) is the Euclidean distance between the feature vectors \(f_i\) and \(f_j\), and \(m\) is the margin (maximum distance).
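A minimal numpy sketch of the standard contrastive loss in this labelling convention (\(y = 1\) for unchanged pairs, margin \(m\)); the function name and the 0.5 scaling are my choices, not taken from the paper:

```python
import numpy as np

def contrastive_loss(d, y, m=2.0):
    """Standard contrastive loss over per-pixel distances.

    d : array of feature distances D(f_i, f_j)
    y : array of labels, 1 = unchanged pair, 0 = changed pair
    m : margin; changed pairs are only penalised while d < m
    """
    unchanged = y * d ** 2                        # pull toward 0
    changed = (1 - y) * np.maximum(m - d, 0) ** 2  # push beyond m
    return 0.5 * (unchanged + changed).mean()
```

An unchanged pair at distance 0 and a changed pair at distance \(\ge m\) both contribute zero loss, which is exactly the geometry the training objective asks for.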

The figure shows CosLoss, which uses the cosine similarity: \(D_k\) is the cosine similarity, and \(W_k\) and \(b_k\) are learned scale and shift parameters.

The loss functions above suffer from ineffectiveness and slow convergence. The authors see a contradiction here: on the one hand, large viewpoint changes activate irrelevant information, so regions without any semantic change are treated as changed, and change and no-change information become intertwined; on the other hand, the difference in such an unchanged region is produced only by the viewpoint change, so training pushes the corresponding distance toward 0, and this decreasing trend is exactly what we want. The key problem is that this distance cannot be driven all the way to 0 without also shrinking the semantic feature distances, so the authors propose the TCL loss.

This definition implies that the distance need not be minimized to zero: the metric tolerates small residual distances. To demonstrate the effectiveness of this loss, the authors ran comparative experiments on CD2014.
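The idea can be sketched by adding a tolerance \(\tau\) to the unchanged term of the contrastive loss, so distances below \(\tau\) are not penalised. This exact formulation is my reading of the paper's description, not its verbatim equation; names and defaults are assumptions:

```python
import numpy as np

def thresholded_contrastive_loss(d, y, m=2.0, tau=0.1):
    """Sketch of a Thresholded Contrastive Loss (TCL).

    Unchanged pairs (y = 1) are only penalised for the part of the
    distance that exceeds the tolerance tau, so viewpoint-induced
    differences need not be squeezed all the way to zero.
    With tau = 0 this reduces to the plain contrastive loss.
    """
    unchanged = y * np.maximum(d - tau, 0.0) ** 2
    changed = (1 - y) * np.maximum(m - d, 0.0) ** 2
    return 0.5 * (unchanged + changed).mean()
```

Setting `tau=0` recovers the ordinary contrastive loss, matching the paper's ablation where the threshold 0 case is the baseline and 0.1 works best.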

The training strategy uses the MultiLayer Side-Output (MLSO) method, based on two observations: (1) the gradient back-propagated to intermediate layers may vanish, leaving those layers without discriminative features; (2) the discriminative ability of upper-layer features depends on that of the intermediate-layer features.

As shown in the figure, a feature distance is computed at each side-output layer and compared with the ground truth to obtain \(loss_h\); the final loss \(Loss\) is then a weighted sum of these terms, with \(\beta_h\) the corresponding weights. At prediction time, different confidence thresholds are used for the different layers, and the final prediction is obtained by averaging the individual layers' outputs.
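The training-time combination and test-time fusion described above can be sketched as follows. The fusion step (threshold each side output, then average the binary maps) is my reading of "averaging the individual layers"; both function names are mine:

```python
import numpy as np

def mlso_loss(side_losses, betas):
    """Final loss = weighted sum of per-side-output losses loss_h,
    with weights beta_h (one pair per side-output layer)."""
    assert len(side_losses) == len(betas)
    return sum(b * l for b, l in zip(betas, side_losses))

def fuse_predictions(side_maps, thresholds):
    """Binarise each side-output distance map with its own confidence
    threshold, then average the binary maps for the final prediction."""
    binary = [(m >= t).astype(float) for m, t in zip(side_maps, thresholds)]
    return np.mean(binary, axis=0)
```

With weights \(\beta = (0.5, 0.25)\) and side losses \((1, 2)\), the combined loss is \(0.5 \cdot 1 + 0.25 \cdot 2 = 1\); pixels that every layer marks as changed fuse to 1, unanimous no-change pixels fuse to 0.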

experiments and discussion

Datasets: the VL-CMU-CD dataset, the PCD2015 dataset, and the CDnet dataset; evaluation is on CDnet.

1. The MLSO training strategy really does improve performance; 2. the Euclidean distance performs better than the cosine similarity.

On the third dataset the authors' method is competitive, but falls short on several metrics. One reason is that the state-of-the-art methods apply, to some degree, semantic segmentation to the change-detection task. My understanding: semantic segmentation is good at separating foreground from background, and the foreground objects it segments are exactly the targets we need; the network may not actually know whether a region has changed, but it can be trained to segment moving targets, and is therefore unaffected by viewpoint changes. The authors' method, on the other hand, is essentially an image-differencing method, so some gap in accuracy with semantic-segmentation methods is inevitable, since semantic segmentation is itself a pixel-level classification problem.

discussion

Three questions are discussed: (1) Is the proposed model robust to large viewpoint changes? (2) Is the model's performance sensitive to the threshold? (3) Does the contrastive-loss-based metric learning really learn more discriminative features?

For the first question, viewpoint changes are split into small and large. With the TCL loss function, \(threshold = 0\) reduces to the plain contrastive loss; the best value is 0.1.

For the second question, we already know from the above that the model is sensitive to the threshold, so the contrast between changed foreground and unchanged background should be maximized. The different distance functions are compared using the RMS contrast, i.e., the root-mean-square value of the feature-distance image.
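A minimal sketch of the comparison metric: RMS contrast is commonly defined as the standard deviation of the (normalised) intensities, here applied to a distance map. The normalisation to [0, 1] is my assumption about how the maps are made comparable:

```python
import numpy as np

def rms_contrast(dist_map):
    """RMS contrast of a feature-distance image: the standard
    deviation of its values after normalisation to [0, 1]."""
    d = np.asarray(dist_map, dtype=np.float64)
    rng = d.max() - d.min()
    if rng == 0:
        return 0.0          # flat map: no contrast at all
    d = (d - d.min()) / rng
    return float(d.std())
```

A flat distance map scores 0, while a map split between low-distance background and high-distance foreground scores high, which is why a higher RMS contrast indicates a distance function that separates change from no-change more cleanly.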


The results show that the distance image produced by the Euclidean distance has higher contrast, so the Euclidean distance separates change from background better; deep features carry rich semantic information and are more discriminative, so the robust features derived from deeper layers perform better.

For the third question, the contrastive-loss metric learning is applied to an FCN, trained with both the cross-entropy loss \(Loss_{class}\) and the contrastive loss \(Loss_{feat}\); the results show a small improvement.



Origin www.cnblogs.com/QuintinLiu/p/11752790.html