CBAM CBAM: Convolutional Block Attention Module

其他 2021-11-22 05:45:24 阅读次数: 0

#https://blog.csdn.net/u011699990/article/details/81276851?utm_medium=distribute.pc_relevant.none-task-blog-baidulandingword-4&spm=1001.2101.3001.4242
这个pytorch代码可以运行，没报错
https://zhuanlan.zhihu.com/p/106084464
讲解内容:::::链接：https://www.cnblogs.com/pprp/p/12127849.html
cbam虽然是在2018年提出的，但是其影响力比较深远，在很多领域都用到了该模块

什么是注意力机制？
注意力机制（Attention Mechanism）是机器学习中的一种数据处理方法，广泛应用在自然语言处理、图像识别及语音识别等各种不同类型的机器学习任务中。
通俗来讲：**注意力机制就是希望网络能够自动学出来图片或者文字序列中的需要注意的地方。**比如人眼在看一幅画的时候，不会将注意力平等地分配给画中的所有像素，而是将更多注意力分配给人们关注的地方。
从实现的角度来讲：注意力机制通过神经网络的操作生成一个掩码mask, mask上的值一个打分，评价当前需要关注的点的评分。
注意力机制可以分为：
通道注意力机制：对通道生成掩码mask，进行打分，代表是senet, Channel Attention Module
空间注意力机制：对空间进行掩码的生成，进行打分，代表是Spatial Attention Module
混合域注意力机制：同时对通道注意力和空间注意力进行评价打分，代表的有BAM, CBAM
怎么实现CBAM？(pytorch为例)
CBAM arxiv link: https://arxiv.org/pdf/1807.06521.pdf

CBAM全称是Convolutional Block Attention Module, 是在ECCV2018上发表的注意力机制代表作之一。本人在打比赛的时候遇见过有人使用过该模块取得了第一名的好成绩，证明了其有效性。

在该论文中，作者研究了网络架构中的注意力，注意力不仅要告诉我们重点关注哪里，还要提高关注点的表示。目标是通过使用注意机制来增加表现力，关注重要特征并抑制不必要的特征。为了强调空间和通道这两个维度上的有意义特征，作者依次应用通道和空间注意模块，来分别在通道和空间维度上学习关注什么、在哪里关注。此外，通过了解要强调或抑制的信息也有助于网络内的信息流动。

主要网络架构也很简单，一个是通道注意力模块，另一个是空间注意力模块，CBAM就是先后集成了通道注意力模块和空间注意力模块。
在这里插入图片描述

在这里插入图片描述

在这里插入图片描述
来自：https://blog.csdn.net/u013738531/article/details/82731257

是连续的两个操作步骤，为何要先使用通道注意力机制然后再使用空间注意力机制？使用顺序使用这两个模块还是并行的使用两个模块？其实是作者已经做过了相关实验，并且证明了先试用通道然后再使用空间注意力机制这样的组合效果比较好，这也是CBAM的通用组合模式。
在这里插入图片描述

现这样最近眼里有血丝。。不看电脑了我要休息几天了看看纸质书
最近试试代码能不能看懂，能不能加在别的网络上。参考了以下博客：
https://www.jianshu.com/p/3e33ab049b4e
https://www.jianshu.com/p/4fac94eaca91
https://blog.csdn.net/qq_14845119/article/details/81393127
https://blog.csdn.net/qq_43265072/article/details/106057548
https://blog.csdn.net/qq_43265072/article/details/106058693
https://zhuanlan.zhihu.com/p/99260231（这个试了一下，没成功）
代码：：：

'''''
#https://www.jianshu.com/p/4fac94eaca91
#分别进行MLP的操作并求和，等同于作者在对应维度上进行concate 在一起进行MLP,之后在对应维度上进行sum（）， 所以理论上应该是没毛病的
import tensorflow as tf
import numpy as np

slim = tf.contrib.slim

def combined_static_and_dynamic_shape(tensor):
#   Returns a list containing static and dynamic values for the dimensions.返回包含维度的静态和动态值的列表。
#   Returns a list of static and dynamic values for shape dimensions. This is   返回形状维度的静态和动态值列表。
#   useful to preserve static shapes when available in reshape operation.
# 当在“整形”操作中可用时，这对于保留静态形状非常有用。
#   Args:
#     tensor: A tensor of any type. 任何类型的张量。
# 
#   Returns:
#     A list of size tensor.shape.ndims containing integers or a scalar tensor.
# 一系列大小形状的ndims 包含整数或标量张量的。

  static_tensor_shape = tensor.shape.as_list()
  dynamic_tensor_sh
  ape = tf.shape(tensor)
  combined_shape = []
  for index, dim in enumerate(static_tensor_shape):
    if dim is not None:
      combined_shape.append(dim)
    else:
      combined_shape.append(dynamic_tensor_shape[index])
  return combined_shape

def convolutional_block_attention_module(feature_map, index, inner_units_ratio=0.5):
    
    # CBAM: convolution block attention module, which is described in "CBAM: Convolutional Block Attention Module"
    # Architecture : "https://arxiv.org/pdf/1807.06521.pdf"
    # If you want to use this module, just plug this module into your network
    # :param feature_map : input feature map
    # :param index : the index of convolution block attention module卷积块注意模块的索引
    # :param inner_units_ratio: output units number of fully connected layer输出单元全连通层数: inner_units_ratio*feature_map_channel
    # :"return:feature map with channel and spatial attention
    
    with tf.variable_scope("cbam_%s" % (index)):
        feature_map_shape = combined_static_and_dynamic_shape(feature_map)


        # channel attention
        channel_avg_weights = tf.nn.avg_pool(
            value=feature_map,
            ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
            strides=[1, 1, 1, 1],
            padding='VALID'
        )
        channel_max_weights = tf.nn.max_pool(
            value=feature_map,
            ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
            strides=[1, 1, 1, 1],
            padding='VALID'
        )
        channel_avg_reshape = tf.reshape(channel_avg_weights,
                                         [feature_map_shape[0], 1, feature_map_shape[3]])
        channel_max_reshape = tf.reshape(channel_max_weights,
                                         [feature_map_shape[0], 1, feature_map_shape[3]])
        channel_w_reshape = tf.concat([channel_avg_reshape, channel_max_reshape], axis=1)

        fc_1 = tf.layers.dense(
            inputs=channel_w_reshape,
            units=feature_map_shape[3] * inner_units_ratio,
            name="fc_1",
            activation=tf.nn.relu
        )
        fc_2 = tf.layers.dense(
            inputs=fc_1,
            units=feature_map_shape[3],
            name="fc_2",
            activation=None
        )
        channel_attention = tf.reduce_sum(fc_2, axis=1, name="channel_attention_sum")
        channel_attention = tf.nn.sigmoid(channel_attention, name="channel_attention_sum_sigmoid")
        channel_attention = tf.reshape(channel_attention, shape=[feature_map_shape[0], 1, 1, feature_map_shape[3]])
        feature_map_with_channel_attention = tf.multiply(feature_map, channel_attention)


        # spatial attention
        channel_wise_avg_pooling = tf.reduce_mean(feature_map_with_channel_attention, axis=3)
        channel_wise_max_pooling = tf.reduce_max(feature_map_with_channel_attention, axis=3)

        channel_wise_avg_pooling = tf.reshape(channel_wise_avg_pooling,
                                              shape=[feature_map_shape[0], feature_map_shape[1], feature_map_shape[2],
                                                     1])
        channel_wise_max_pooling = tf.reshape(channel_wise_max_pooling,
                                              shape=[feature_map_shape[0], feature_map_shape[1], feature_map_shape[2],
                                                     1])

        channel_wise_pooling = tf.concat([channel_wise_avg_pooling, channel_wise_max_pooling], axis=3)
        spatial_attention = slim.conv2d(
            channel_wise_pooling,
            1,
            [7, 7],
            padding='SAME',
            activation_fn=tf.nn.sigmoid,
            scope="spatial_attention_conv"
        )
        feature_map_with_attention = tf.multiply(feature_map_with_channel_attention, spatial_attention)
        return feature_map_with_attention



#example
feature_map = tf.constant(np.random.rand(2,8,8,32), dtype=tf.float16)
feature_map_with_attention = convolutional_block_attention_module(feature_map, 1)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    result = sess.run(feature_map_with_attention)
    print(result.shape)
    
'''''











'''''
##https://blog.csdn.net/qq_14845119/article/details/81393127
def cbam_module(inputs, reduction_ratio=0.5, name=""):
    with tf.variable_scope("cbam_" + name, reuse=tf.AUTO_REUSE):
        batch_size, hidden_num = inputs.get_shape().as_list()[0], inputs.get_shape().as_list()[3]

        maxpool_channel = tf.reduce_max(tf.reduce_max(inputs, axis=1, keepdims=True), axis=2, keepdims=True)
        avgpool_channel = tf.reduce_mean(tf.reduce_mean(inputs, axis=1, keepdims=True), axis=2, keepdims=True)

        maxpool_channel = tf.layers.Flatten()(maxpool_channel)
        avgpool_channel = tf.layers.Flatten()(avgpool_channel)

        mlp_1_max = tf.layers.dense(inputs=maxpool_channel, units=int(hidden_num * reduction_ratio), name="mlp_1",
                                    reuse=None, activation=tf.nn.relu)
        mlp_2_max = tf.layers.dense(inputs=mlp_1_max, units=hidden_num, name="mlp_2", reuse=None)
        mlp_2_max = tf.reshape(mlp_2_max, [batch_size, 1, 1, hidden_num])

        mlp_1_avg = tf.layers.dense(inputs=avgpool_channel, units=int(hidden_num * reduction_ratio), name="mlp_1",
                                    reuse=True, activation=tf.nn.relu)
        mlp_2_avg = tf.layers.dense(inputs=mlp_1_avg, units=hidden_num, name="mlp_2", reuse=True)
        mlp_2_avg = tf.reshape(mlp_2_avg, [batch_size, 1, 1, hidden_num])

        channel_attention = tf.nn.sigmoid(mlp_2_max + mlp_2_avg)
        channel_refined_feature = inputs * channel_attention

        maxpool_spatial = tf.reduce_max(inputs, axis=3, keepdims=True)
        avgpool_spatial = tf.reduce_mean(inputs, axis=3, keepdims=True)
        max_avg_pool_spatial = tf.concat([maxpool_spatial, avgpool_spatial], axis=3)
        conv_layer = tf.layers.conv2d(inputs=max_avg_pool_spatial, filters=1, kernel_size=(7, 7), padding="same",
                                      activation=None)
        spatial_attention = tf.nn.sigmoid(conv_layer)

        refined_feature = channel_refined_feature * spatial_attention

    return refined_feature
'''''






'''''
#https://blog.csdn.net/qq_43265072/article/details/106057548
#https://blog.csdn.net/qq_43265072/article/details/106058693
#将平均和最大池化后得到得描述子分别输入MLP，从代码中也可以看到连个描述子得MLP权重共享。
def cbam_block(input_feature, index, reduction_ratio=8):
    with tf.variable_scope('cbam_%s' % index):
        attention_feature = channel_attention(input_feature, index, reduction_ratio)
        attention_feature = spatial_attention(attention_feature, index)
        print("hello CBAM")
    return attention_feature


def channel_attention(input_feature, index, reduction_ratio=8):
    kernel_initializer = tf.contrib.layers.variance_scaling_initializer()
    bias_initializer = tf.constant_initializer(value=0.0)

    with tf.variable_scope('ch_attention_%s' % index):
        feature_map_shape = input_feature.get_shape()
        channel = input_feature.get_shape()[-1]
        avg_pool = tf.nn.avg_pool(value=input_feature,
                                  ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
                                  strides=[1, 1, 1, 1],
                                  padding='VALID')
        assert avg_pool.get_shape()[1:] == (1, 1, channel)
        avg_pool = tf.layers.dense(inputs=avg_pool,
                                   units=channel//reduction_ratio,
                                   activation=tf.nn.relu,
                                   kernel_initializer=kernel_initializer,
                                   bias_initializer=bias_initializer,
                                   name='mlp_0',
                                   reuse=None)
        assert avg_pool.get_shape()[1:] == (1, 1, channel//reduction_ratio)
        avg_pool = tf.layers.dense(inputs=avg_pool,
                                   units=channel,
                                   kernel_initializer=kernel_initializer,
                                   bias_initializer=bias_initializer,
                                   name='mlp_1',
                                   reuse=None)
        assert avg_pool.get_shape()[1:] == (1, 1, channel)

        max_pool = tf.nn.max_pool(value=input_feature,
                                  ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
                                  strides=[1, 1, 1, 1],
                                  padding='VALID')
        assert max_pool.get_shape()[1:] == (1, 1, channel)
        max_pool = tf.layers.dense(inputs=max_pool,
                                   units=channel//reduction_ratio,
                                   activation=tf.nn.relu,
                                   name='mlp_0',
                                   reuse=True)
        assert max_pool.get_shape()[1:] == (1, 1, channel//reduction_ratio)
        max_pool = tf.layers.dense(inputs=max_pool,
                                   units=channel,
                                   name='mlp_1',
                                   reuse=True)
        assert max_pool.get_shape()[1:] == (1, 1, channel)
        scale = tf.nn.sigmoid(avg_pool + max_pool)
    return input_feature * scale


def spatial_attention(input_feature, index):
    kernel_size = 7
    kernel_initializer = tf.contrib.layers.variance_scaling_initializer()
    with tf.variable_scope("sp_attention_%s" % index):
        avg_pool = tf.reduce_mean(input_feature, axis=3, keepdims=True)
        assert avg_pool.get_shape()[-1] == 1
        max_pool = tf.reduce_max(input_feature, axis=3, keepdims=True)
        assert max_pool.get_shape()[-1] == 1
        # 按通道拼接
        concat = tf.concat([avg_pool, max_pool], axis=3)
        assert concat.get_shape()[-1] == 2

        concat = slim.conv2d(concat, num_outputs=1,
                             kernel_size=[kernel_size, kernel_size],
                             padding='SAME',
                             activation_fn=tf.nn.sigmoid,
                             weights_initializer=kernel_initializer,
                             scope='conv')
        assert concat.get_shape()[-1] == 1

    return input_feature * concat
'''''






















#https://zhuanlan.zhihu.com/p/99260231
#所需库
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, Reshape, Dense, multiply, Permute, Concatenate, Conv2D, Add, Activation, Lambda
from keras import backend as K
from keras.activations import sigmoid
#通道注意力机制
def channel_attention(input_feature, ratio=8):
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]

    shared_layer_one = Dense(channel // ratio,
                             kernel_initializer='he_normal',
                             activation='relu',
                             use_bias=True,
                             bias_initializer='zeros')

    shared_layer_two = Dense(channel,
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')

    avg_pool = GlobalAveragePooling2D()(input_feature)
    avg_pool = Reshape((1, 1, channel))(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)
    avg_pool = shared_layer_one(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel // ratio)
    avg_pool = shared_layer_two(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)

    max_pool = GlobalMaxPooling2D()(input_feature)
    max_pool = Reshape((1, 1, channel))(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)
    max_pool = shared_layer_one(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel // ratio)
    max_pool = shared_layer_two(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)

    cbam_feature = Add()([avg_pool, max_pool])
    cbam_feature = Activation('hard_sigmoid')(cbam_feature)

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])

#空间注意力机制
def spatial_attention(input_feature):
    kernel_size = 7
    if K.image_data_format() == "channels_first":
        channel = input_feature._keras_shape[1]
        cbam_feature = Permute((2, 3, 1))(input_feature)
    else:
        channel = input_feature._keras_shape[-1]
        cbam_feature = input_feature

    avg_pool = Lambda(lambda x: K.mean(x, axis=3, keepdims=True))(cbam_feature)
    assert avg_pool._keras_shape[-1] == 1
    max_pool = Lambda(lambda x: K.max(x, axis=3, keepdims=True))(cbam_feature)
    assert max_pool._keras_shape[-1] == 1
    concat = Concatenate(axis=3)([avg_pool, max_pool])
    assert concat._keras_shape[-1] == 2
    cbam_feature = Conv2D(filters=1,
                          kernel_size=kernel_size,
                          activation='hard_sigmoid',
                          strides=1,
                          padding='same',
                          kernel_initializer='he_normal',
                          use_bias=False)(concat)
    assert cbam_feature._keras_shape[-1] == 1

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])

#构建CBAM
def cbam_block(cbam_feature, ratio=8):
    """Contains the implementation of Convolutional Block Attention Module(CBAM) block.
    As described in CBAM: Convolutional Block Attention Module.
    """

    cbam_feature = channel_attention(cbam_feature, ratio)
    cbam_feature = spatial_attention(cbam_feature, )
    return cbam_feature

#在相应的位置添加CBAM
inputs = x
residual = layers.Conv2D(filter, kernel_size = (1, 1), strides = strides, padding = 'same')(inputs)
residual = layers.BatchNormalization(axis = bn_axis)(residual)
cbam = cbam_block(residual)
x = layers.add([x, residual, cbam])










''''' 

#se-net实现
#https://blog.csdn.net/qq_38410428/article/details/87979417
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, Reshape, Dense,
def build_model(nb_classes, input_shape=(256,256,3)):
    inputs_dim = Input(input_shape)
    x = InceptionV3(include_top=False,
                weights='imagenet',
                input_tensor=None,
                input_shape=(256, 256, 3),
                pooling=max)(inputs_dim)

    squeeze = GlobalAveragePooling2D()(x)

    excitation = Dense(units=2048 // 16)(squeeze)
    excitation = Activation('relu')(excitation)
    excitation = Dense(units=2048)(excitation)
    excitation = Activation('sigmoid')(excitation)
    excitation = Reshape((1, 1, 2048))(excitation)

    scale = multiply([x, excitation])

    x = GlobalAveragePooling2D()(scale)
    dp_1 = Dropout(0.3)(x)
    fc2 = Dense(nb_classes)(dp_1)
    fc2 = Activation('sigmoid')(fc2) #此处注意，为sigmoid函数
    model = Model(inputs=inputs_dim, outputs=fc2)
    return model

if __name__ == '__main__':
	model =build_model(nb_classes, input_shape=(im_size1, im_size2, channels))
	opt = Adam(lr=2*1e-5)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit()

'''''

近期补充说明：
https://blog.csdn.net/weixin_33602281/article/details/85223216
深度学习之卷积网络attention机制SENET、CBAM模块原理总结
这里讲解了senet架构，模型是如何压缩也给出了矩阵的理解方式很好代码可以不参考这个

https://blog.csdn.net/weixin_30904593/article/details/98551624?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-4.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-4.control
CBAM论文的翻译代码是torch 没详细注释不方便理解下篇参考理解看看如何插入模块

https://blog.csdn.net/qq_38410428/article/details/103694759?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.control
理解如何插入CBAM模块

https://blog.csdn.net/paper_reader/article/details/81082351?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-5.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-5.control
计算机注意力机制的概述讲解了软注意力的注意力域
Spatial Transformer Networks（STN）模型[4]是15年NIPS上文章

一篇讲语义分割的综述博客：https://zhuanlan.zhihu.com/p/110123136

扫描二维码关注公众号，回复： 13291392 查看本文章