[Convolution] 1D/2D, Causal, and Dilated Convolutions

Convolutions can be divided into:

  • 1D convolution: Conv1D
  • 2D convolution: Conv2D

1D convolution is mainly used on temporal data, while 2D convolution is applied to images. The naming follows directly from the dimensionality: time is one-dimensional, whereas an image is two-dimensional.

In fact, 1D convolution can be viewed as a special case of 2D convolution in which one dimension of the kernel is 1, as if convolving a long strip-shaped image. TensorFlow Lite, for instance, implements only Conv2D and converts every Conv1D operation into a Conv2D.
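
As a quick illustration of this equivalence, here is a minimal sketch (assuming TF 2.x; the filter count and shapes are arbitrary, chosen only for demonstration) that reproduces a Conv1D with a Conv2D whose kernel height is 1:

import numpy as np
from tensorflow import keras

# Treat the sequence as a 1-pixel-tall image and reuse the Conv1D weights.
x = np.random.rand(1, 34, 1).astype('float32')    # (batch, steps, channels)

conv1d = keras.layers.Conv1D(filters=4, kernel_size=10, strides=3, padding='valid')
y1 = conv1d(x)                                    # (1, 9, 4)

conv2d = keras.layers.Conv2D(filters=4, kernel_size=(1, 10), strides=(1, 3), padding='valid')
conv2d.build(input_shape=(None, 1, 34, 1))
w, b = conv1d.get_weights()                       # kernel (10, 1, 4) -> (1, 10, 1, 4)
conv2d.set_weights([w[np.newaxis, ...], b])

y2 = conv2d(x[:, np.newaxis, :, :])               # add a height-1 axis
print(np.allclose(y1.numpy(), y2.numpy()[:, 0]))  # True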

In addition, 1D convolution supports causal convolution (padding='causal'), which 2D convolution does not.

Also worth noting is dilated convolution (also called atrous convolution), i.e. dilation_rate > 1. Both 1D and 2D convolutions can be dilated.

Causal and dilated convolutions are used heavily in TCNs (temporal convolutional networks).

1. 2D Convolution: Conv2D

2D convolution was covered in an earlier post; see: https://blog.csdn.net/u010637291/article/details/112320280

1.1 Input parameters

  • The convolution input: the image/audio/etc. to be convolved, given as a Tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in a training batch, image height, image width, number of image channels]. Note this is a 4-D tensor, and its dtype must be float32 or float64.

The main parameters are:

  • filters: the number of convolution kernels, which is also the number of output channels. Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).

  • kernel_size: the kernel size, specifying the height and width of the 2D convolution window (a single integer means equal height and width). An integer or tuple/list of 2 integers; a single integer specifies the same value for all spatial dimensions.

  • strides: the convolution stride, specifying how far the window moves along the height and width at each step (a single integer means the same stride in both directions). An integer or tuple/list of 2 integers. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

  • padding: one of "valid" or "same" (case-insensitive). The two padding modes differ as follows:

    • same mode

      Convolution starts as soon as the center (K) of the filter overlaps a corner of the image. Note: 'same' also means that the output feature map keeps the same spatial size as the input. This does not hold unconditionally, though; it also depends on the stride. 'same' is the most common mode because (with stride 1) feature map sizes stay unchanged through the forward pass, so no careful size bookkeeping is needed.

    • valid mode

      No padding at all: the filter stays entirely inside the image while convolving, so its range of movement is smaller than in 'same' mode.

1.2 Output dimensions

ref: https://www.cnblogs.com/sddai/p/10512784.html

As the previous subsection shows, a convolution layer's output dimensions differ with the padding mode. The two cases, padding=valid and padding=same, are treated separately below:

1.2.1 padding=valid

Given the input parameters:

  • inputs=[batch_size, in_height, in_width, in_channels],
  • filters,
  • kernel_size=[k_h, k_w],
  • stride_size=[s_h, s_w],
  • padding='valid'

the output shape is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters
out_height = ceil((in_height - k_h + 1) / s_h)
out_width = ceil((in_width - k_w + 1) / s_w) 

1.2.2 padding=same

Given the input parameters:

  • inputs=[batch_size, in_height, in_width, in_channels],
  • filters,
  • kernel_size=[k_h, k_w],
  • stride_size=[s_h, s_w],
  • padding='same'

the output shape is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters
out_height = ceil(in_height / s_h)
out_width = ceil(in_width / s_w) 
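
A quick way to sanity-check both formulas (a sketch assuming TF 2.x; the sizes are arbitrary) is to compare them against the shapes Keras actually produces:

import math
from tensorflow import keras

# Compare the closed-form output sizes above with what Keras reports.
in_h, in_w, k_h, k_w, s_h, s_w = 34, 13, 10, 4, 3, 2

for padding in ('valid', 'same'):
    layer = keras.layers.Conv2D(filters=8, kernel_size=(k_h, k_w),
                                strides=(s_h, s_w), padding=padding)
    out = layer(keras.Input(shape=(in_h, in_w, 1)))
    if padding == 'valid':
        expected = (math.ceil((in_h - k_h + 1) / s_h),
                    math.ceil((in_w - k_w + 1) / s_w))
    else:
        expected = (math.ceil(in_h / s_h), math.ceil(in_w / s_w))
    print(padding, tuple(out.shape[1:3]), expected)
# valid (9, 5) (9, 5)
# same (12, 7) (12, 7)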

1.3 Definition

The source definition is attached below:

class Conv2D(Conv):
  """2D convolution layer (e.g. spatial convolution over images).

  This layer creates a convolution kernel that is convolved
  with the layer input to produce a tensor of
  outputs. If `use_bias` is True,
  a bias vector is created and added to the outputs. Finally, if
  `activation` is not `None`, it is applied to the outputs as well.

  When using this layer as the first layer in a model,
  provide the keyword argument `input_shape`
  (tuple of integers, does not include the sample axis),
  e.g. `input_shape=(128, 128, 3)` for 128x128 RGB pictures
  in `data_format="channels_last"`.

  Arguments:
    filters: Integer, the dimensionality of the output space (i.e. the number of
      output filters in the convolution).
    kernel_size: An integer or tuple/list of 2 integers, specifying the height
      and width of the 2D convolution window. Can be a single integer to specify
      the same value for all spatial dimensions.
    strides: An integer or tuple/list of 2 integers, specifying the strides of
      the convolution along the height and width. Can be a single integer to
      specify the same value for all spatial dimensions. Specifying any stride
      value != 1 is incompatible with specifying any `dilation_rate` value != 1.
    padding: one of `"valid"` or `"same"` (case-insensitive).
      `"valid"` means no padding. `"same"` results in padding evenly to
      the left/right or up/down of the input such that output has the same
      height/width dimension as the input.
    data_format: A string, one of `channels_last` (default) or `channels_first`.
      The ordering of the dimensions in the inputs. `channels_last` corresponds
      to inputs with shape `(batch_size, height, width, channels)` while
      `channels_first` corresponds to inputs with shape `(batch_size, channels,
      height, width)`. It defaults to the `image_data_format` value found in
      your Keras config file at `~/.keras/keras.json`. If you never set it, then
      it will be `channels_last`.
    dilation_rate: an integer or tuple/list of 2 integers, specifying the
      dilation rate to use for dilated convolution. Can be a single integer to
      specify the same value for all spatial dimensions. Currently, specifying
      any `dilation_rate` value != 1 is incompatible with specifying any stride
      value != 1.
    groups: A positive integer specifying the number of groups in which the
      input is split along the channel axis. Each group is convolved separately
      with `filters / groups` filters. The output is the concatenation of all
      the `groups` results along the channel axis. Input channels and `filters`
      must both be divisible by `groups`.
    activation: Activation function to use. If you don't specify anything, no
      activation is applied (see `keras.activations`).
    use_bias: Boolean, whether the layer uses a bias vector.
    kernel_initializer: Initializer for the `kernel` weights matrix (see
      `keras.initializers`).
    bias_initializer: Initializer for the bias vector (see
      `keras.initializers`).
    kernel_regularizer: Regularizer function applied to the `kernel` weights
      matrix (see `keras.regularizers`).
    bias_regularizer: Regularizer function applied to the bias vector (see
      `keras.regularizers`).
    activity_regularizer: Regularizer function applied to the output of the
      layer (its "activation") (see `keras.regularizers`).
    kernel_constraint: Constraint function applied to the kernel matrix (see
      `keras.constraints`).
    bias_constraint: Constraint function applied to the bias vector (see
      `keras.constraints`).
  Input shape:
    4+D tensor with shape: `batch_shape + (channels, rows, cols)` if
      `data_format='channels_first'`
    or 4+D tensor with shape: `batch_shape + (rows, cols, channels)` if
      `data_format='channels_last'`.
  Output shape:
    4+D tensor with shape: `batch_shape + (filters, new_rows, new_cols)` if
    `data_format='channels_first'` or 4+D tensor with shape: `batch_shape +
      (new_rows, new_cols, filters)` if `data_format='channels_last'`.  `rows`
      and `cols` values might have changed due to padding.

  Returns:
    A tensor of rank 4+ representing
    `activation(conv2d(inputs, kernel) + bias)`.

  Raises:
    ValueError: if `padding` is `"causal"`.
    ValueError: when both `strides > 1` and `dilation_rate > 1`.
  """

  def __init__(self,
               filters,
               kernel_size,
               strides=(1, 1),
               padding='valid',
               data_format=None,
               dilation_rate=(1, 1),
               groups=1,
               activation=None,
               use_bias=True,
               kernel_initializer='glorot_uniform',
               bias_initializer='zeros',
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               **kwargs):
    super(Conv2D, self).__init__(
        rank=2,
        filters=filters,
        kernel_size=kernel_size,
        strides=strides,
        padding=padding,
        data_format=data_format,
        dilation_rate=dilation_rate,
        groups=groups,
        activation=activations.get(activation),
        use_bias=use_bias,
        kernel_initializer=initializers.get(kernel_initializer),
        bias_initializer=initializers.get(bias_initializer),
        kernel_regularizer=regularizers.get(kernel_regularizer),
        bias_regularizer=regularizers.get(bias_regularizer),
        activity_regularizer=regularizers.get(activity_regularizer),
        kernel_constraint=constraints.get(kernel_constraint),
        bias_constraint=constraints.get(bias_constraint),
        **kwargs)

1.4 Usage example

## tensorflow2.x 

def cnn_model():
    from tensorflow import keras

    inputs = keras.Input(shape=(34, 13, 1))
    net = keras.layers.Conv2D(filters=128, kernel_size=(10, 4), strides=(3, 2), padding='VALID', name='conv_1')(inputs)
    net = keras.layers.BatchNormalization()(net)
    outputs = keras.layers.ReLU()(net)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy']
    )

    model.summary()
    keras.utils.plot_model(model, 'cnn.png', show_shapes=True)

    return model
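
Applying the formulas from Section 1.2.1: out_height = ceil((34 - 10 + 1) / 3) = 9 and out_width = ceil((13 - 4 + 1) / 2) = 5, so conv_1 outputs a tensor of shape (None, 9, 5, 128).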


2. 1D Convolution: Conv1D

2.1 Definition

It differs only slightly from 2D convolution:

  • the input becomes a 3-D tensor
  • padding now includes: valid, same, and causal

The input/output dimension computation is the same as for 2D convolution, applied to the single steps axis (valid: out_steps = ceil((steps - k + 1) / s); same and causal: out_steps = ceil(steps / s)).

@keras_export('keras.layers.Conv1D', 'keras.layers.Convolution1D')
class Conv1D(Conv):
  """1D convolution layer (e.g. temporal convolution).

  This layer creates a convolution kernel that is convolved
  with the layer input over a single spatial (or temporal) dimension
  to produce a tensor of outputs.
  If `use_bias` is True, a bias vector is created and added to the outputs.
  Finally, if `activation` is not `None`,
  it is applied to the outputs as well.

  When using this layer as the first layer in a model,
  provide an `input_shape` argument
  (tuple of integers or `None`, e.g.
  `(10, 128)` for sequences of 10 vectors of 128-dimensional vectors,
  or `(None, 128)` for variable-length sequences of 128-dimensional vectors.

  Arguments:
    filters: Integer, the dimensionality of the output space
      (i.e. the number of output filters in the convolution).
    kernel_size: An integer or tuple/list of a single integer,
      specifying the length of the 1D convolution window.
    strides: An integer or tuple/list of a single integer,
      specifying the stride length of the convolution.
      Specifying any stride value != 1 is incompatible with specifying
      any `dilation_rate` value != 1.
    padding: One of `"valid"`, `"same"` or `"causal"` (case-insensitive).
      `"valid"` means no padding. `"same"` results in padding evenly to
      the left/right or up/down of the input such that output has the same
      height/width dimension as the input.
      `"causal"` results in causal (dilated) convolutions, e.g. `output[t]`
      does not depend on `input[t+1:]`. Useful when modeling temporal data
      where the model should not violate the temporal order.
      See [WaveNet: A Generative Model for Raw Audio, section
        2.1](https://arxiv.org/abs/1609.03499).
    data_format: A string,
      one of `channels_last` (default) or `channels_first`.
    dilation_rate: an integer or tuple/list of a single integer, specifying
      the dilation rate to use for dilated convolution.
      Currently, specifying any `dilation_rate` value != 1 is
      incompatible with specifying any `strides` value != 1.
    groups: A positive integer specifying the number of groups in which the
      input is split along the channel axis. Each group is convolved
      separately with `filters / groups` filters. The output is the
      concatenation of all the `groups` results along the channel axis.
      Input channels and `filters` must both be divisible by `groups`.
    activation: Activation function to use.
      If you don't specify anything, no activation is applied (
      see `keras.activations`).
    use_bias: Boolean, whether the layer uses a bias vector.
    kernel_initializer: Initializer for the `kernel` weights matrix (
      see `keras.initializers`).
    bias_initializer: Initializer for the bias vector (
      see `keras.initializers`).
    kernel_regularizer: Regularizer function applied to
      the `kernel` weights matrix (see `keras.regularizers`).
    bias_regularizer: Regularizer function applied to the bias vector (
      see `keras.regularizers`).
    activity_regularizer: Regularizer function applied to
      the output of the layer (its "activation") (
      see `keras.regularizers`).
    kernel_constraint: Constraint function applied to the kernel matrix (
      see `keras.constraints`).
    bias_constraint: Constraint function applied to the bias vector (
      see `keras.constraints`).

  Input shape:
    3+D tensor with shape: `batch_shape + (steps, input_dim)`

  Output shape:
    3+D tensor with shape: `batch_shape + (new_steps, filters)`
      `steps` value might have changed due to padding or strides.

  Returns:
    A tensor of rank 3 representing
    `activation(conv1d(inputs, kernel) + bias)`.

  Raises:
    ValueError: when both `strides > 1` and `dilation_rate > 1`.
  """

  def __init__(self,
               filters,
               kernel_size,
               strides=1,
               padding='valid',
               data_format='channels_last',
               dilation_rate=1,
               groups=1,
               activation=None,
               use_bias=True,
               kernel_initializer='glorot_uniform',
               bias_initializer='zeros',
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               **kwargs):
    super(Conv1D, self).__init__(
        rank=1,
        filters=filters,
        kernel_size=kernel_size,
        strides=strides,
        padding=padding,
        data_format=data_format,
        dilation_rate=dilation_rate,
        groups=groups,
        activation=activations.get(activation),
        use_bias=use_bias,
        kernel_initializer=initializers.get(kernel_initializer),
        bias_initializer=initializers.get(bias_initializer),
        kernel_regularizer=regularizers.get(kernel_regularizer),
        bias_regularizer=regularizers.get(bias_regularizer),
        activity_regularizer=regularizers.get(activity_regularizer),
        kernel_constraint=constraints.get(kernel_constraint),
        bias_constraint=constraints.get(bias_constraint),
        **kwargs)

2.2 Usage example

With a 3-D input and padding='valid' (no padding):

def cnn_model():
    from tensorflow import keras

    inputs = keras.Input(shape=(34, 1))
    net = keras.layers.Conv1D(filters=128, kernel_size=(10), strides=(3), padding='VALID', name='conv_1')(inputs)
    net = keras.layers.BatchNormalization()(net)
    outputs = keras.layers.ReLU()(net)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy']
    )

    model.summary()
    keras.utils.plot_model(model, 'cnn.png', show_shapes=True)

    return model
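
By the valid formula, out_steps = ceil((34 - 10 + 1) / 3) = ceil(25 / 3) = 9, so conv_1 outputs (None, 9, 128).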


In fact, a 4-D input can also go through a 1D convolution:

def cnn_model():
    from tensorflow import keras

    inputs = keras.Input(shape=(34, 13, 1))
    net = keras.layers.Conv1D(filters=128, kernel_size=(10), strides=(3), padding='VALID', name='conv_1')(inputs)
    net = keras.layers.BatchNormalization()(net)
    outputs = keras.layers.ReLU()(net)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy']
    )

    model.summary()
    keras.utils.plot_model(model, 'cnn.png', show_shapes=True)

    return model
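
Per the Input/Output shape contract in the docstring above (batch_shape + (steps, input_dim)), the trailing two axes (13, 1) act as (steps, input_dim) and the leading (None, 34) as batch_shape, so conv_1 should output (None, 34, ceil((13 - 10 + 1) / 3), 128) = (None, 34, 2, 128).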


3. 1D Causal Convolution: Conv1D with padding=causal

Sequence problems involve temporal order, so ordinary CNN convolutions, which also look at future time steps, cannot be used directly; providing an order-respecting convolution is exactly the role of causal convolution.

Sequence modeling can be abstracted as: given x_1…x_t and y_1…y_{t-1}, predict y_t so that y_t is close to the actual value.

The problem is that if the model must account for inputs x from far in the past, the number of convolutional layers has to keep growing, since each layer only extends the receptive field by a fixed amount.

More layers in turn bring vanishing gradients, harder training, and poorer fitting. Dilated convolution was introduced to solve this problem.

3.1 Usage example

To use causal convolution, set padding='causal'. Note that causal padding is only supported by Conv1D and SeparableConv1D.

def cnn_model():
    from tensorflow import keras

    inputs = keras.Input(shape=(34, 1))
    net = keras.layers.Conv1D(filters=128, kernel_size=(10), strides=(3), padding='causal', name='conv_1')(inputs)
    net = keras.layers.BatchNormalization()(net)
    outputs = keras.layers.ReLU()(net)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy']
    )

    model.summary()
    keras.utils.plot_model(model, 'cnn.png', show_shapes=True)

    return model

The conv_1 output here is (None, 12, 128). The 12 can be derived: padding='causal' left-pads the sequence with (kernel_size - 1) * dilation_rate = 9 zeros, giving a length of 43, and a valid convolution with kernel 10 and stride 3 then yields floor((43 - 10) / 3) + 1 = 12 steps. Equivalently, out_steps = ceil(in_steps / stride) = ceil(34 / 3) = 12, as the sketch below confirms.
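
The equivalence can be verified directly (a sketch assuming TF 2.x; the filter count is arbitrary):

import numpy as np
import tensorflow as tf
from tensorflow import keras

# padding='causal' == left-pad with (kernel_size - 1) * dilation_rate zeros,
# then convolve with padding='valid'.
x = np.random.rand(1, 34, 1).astype('float32')
k, s, d = 10, 3, 1

causal = keras.layers.Conv1D(filters=4, kernel_size=k, strides=s, padding='causal')
y1 = causal(x)
print(y1.shape)                                           # (1, 12, 4)

valid = keras.layers.Conv1D(filters=4, kernel_size=k, strides=s, padding='valid')
valid.build(input_shape=(None, None, 1))
valid.set_weights(causal.get_weights())

x_padded = tf.pad(x, [[0, 0], [(k - 1) * d, 0], [0, 0]])  # zeros on the left only
y2 = valid(x_padded)
print(np.allclose(y1.numpy(), y2.numpy()))                # True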

4. Dilated Convolution

ref: https://www.zhihu.com/question/54149221

Dilated convolution is also known as atrous convolution, or convolution with holes.

One problem with causal convolution is that it needs many layers or very large filters to enlarge the receptive field.

Dilated convolution lets the filter cover a region longer than the filter itself by skipping over input values; it is equivalent to building a larger filter from the original one by inserting zeros between its taps.

The benefit of dilation is that it enlarges the receptive field without the information loss of pooling, so each convolution output sees a wider range of the input. Dilated convolutions work well wherever global image context or long sequence dependencies are needed, e.g. image segmentation, speech synthesis (WaveNet), and machine translation (ByteNet).
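
To make the receptive-field argument concrete, here is a tiny helper (a sketch; the kernel size and doubling dilation schedule are the typical TCN choices, not taken from this post):

# Receptive field of stacked dilated (causal) convolutions: each layer with
# kernel size k and dilation d extends the receptive field by (k - 1) * d.
def receptive_field(kernel_size, dilations):
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(receptive_field(3, [1, 2, 4, 8]))  # 31 time steps
print(receptive_field(3, [1, 1, 1, 1]))  # 9: same depth without dilation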

4.1 Dilated Causal Convolution

To use dilated causal convolution, set padding='causal' together with dilation_rate > 1. Since causal padding only applies to Conv1D and SeparableConv1D, dilated causal convolution is likewise restricted to these two layers and is unavailable for 2D convolution.

4.2 Usage example

def cnn_model():
    from tensorflow import keras

    inputs = keras.Input(shape=(34, 1))
    net = keras.layers.Conv1D(filters=128, kernel_size=(10), strides=(1), padding='causal', dilation_rate=3, name='conv_1')(inputs)
    net = keras.layers.BatchNormalization()(net)
    outputs = keras.layers.ReLU()(net)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy']
    )

    model.summary()
    keras.utils.plot_model(model, 'cnn.png', show_shapes=True)

    return model


Note that when dilation_rate > 1, strides must equal 1, otherwise the following error is raised:

  • ValueError: strides > 1 not supported in conjunction with dilation_rate > 1

Also, no matter what dilation_rate is, the output length here stays 34: with stride 1, causal padding adds (kernel_size - 1) * dilation_rate zeros on the left, so out_steps = ceil(in_steps / stride) = ceil(34 / 1) = 34, independent of the dilation rate.
