Self-Attention Mechanism in Convolutional Neural Networks

In convolutional neural networks, the self-attention mechanism is realized as a non-local filtering operation, and its implementation closely resembles the self-attention mechanism of the Seq2Seq model.

A standard convolutional layer is a local filtering operation: each position of the output feature map is computed from a neighborhood of the corresponding input feature, so it can only capture relationships between local features. The self-attention mechanism instead captures long-range dependencies directly by computing the relationship between any two positions, without being limited to adjacent points. This is equivalent to using a convolution kernel as large as the feature map itself, so much more information can be captured.

In the self-attention mechanism of a convolutional network, the input feature $x$ is first projected into a key feature $f(x)$, a query feature $g(x)$, and a value feature $h(x)$; dot-product attention is then applied to construct the self-attention map:

$$ \alpha_{i,j} = \text{softmax}_j\left(f(x_i)^T g(x_j)\right) = \frac{e^{f(x_i)^T g(x_j)}}{\sum_j e^{f(x_i)^T g(x_j)}} $$
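
As a minimal sketch of this step, assuming PyTorch, 1x1 convolutions for the key/query projections, and illustrative names and dimensions (none of which are specified in the original text):

```python
import torch
import torch.nn.functional as F
from torch import nn

# Illustrative sketch, not the author's exact implementation.
C, C_inner, H, W = 64, 8, 16, 16           # assumed channel and spatial sizes
x = torch.randn(1, C, H, W)                # input feature map x

f = nn.Conv2d(C, C_inner, kernel_size=1)   # key projection   f(x)
g = nn.Conv2d(C, C_inner, kernel_size=1)   # query projection g(x)

N = H * W                                  # number of spatial positions
fx = f(x).flatten(2)                       # (1, C_inner, N)
gx = g(x).flatten(2)                       # (1, C_inner, N)

# s[i, j] = f(x_i)^T g(x_j); softmax over j yields the attention map alpha
s = torch.bmm(fx.transpose(1, 2), gx)      # (1, N, N)
alpha = F.softmax(s, dim=-1)               # each row of alpha sums to 1
```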

The response $y_i$ at output position $i$ is then computed as a weighted sum over all input value features $h(x_j)$:

$$ y_i = \sum_j \alpha_{i,j} \, h(x_j) $$
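
Continuing the sketch above (again a hedged illustration: the 1x1 value projection and the variable names are assumptions, and this snippet reuses `x`, `alpha`, and the sizes defined earlier):

```python
h = nn.Conv2d(C, C, kernel_size=1)         # value projection h(x)
hx = h(x).flatten(2)                       # (1, C, N); column j holds h(x_j)

# y_i = sum_j alpha[i, j] * h(x_j): weighted sum of all value features
y = torch.bmm(hx, alpha.transpose(1, 2))   # (1, C, N)
y = y.reshape(1, C, H, W)                  # back to the feature-map layout
```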
