Original source: https://nbviewer.jupyter.org/github/fchollet/deep-learning-with-python-notebooks/blob/master/5.4-visualizing-what-convnets-learn.ipynb

Visualizing what convnets learn

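It is often said that deep learning models are "black boxes": they learn representations that are difficult to extract and present in a human-readable form. Although this is partially true for certain types of deep learning models, it is definitely not true for convnets. The representations learned by convnets are highly amenable to visualization, in large part because they are representations of visual concepts. Since 2013, a wide array of techniques has been developed for visualizing and interpreting these representations. We won't survey all of them, but we will cover three of the most accessible and useful ones: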

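Visualizing intermediate convnet outputs ("intermediate activations"). This is useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters.
Visualizing convnet filters. This is useful for understanding precisely which visual pattern or concept each filter in a convnet is receptive to.
Visualizing heatmaps of class activation in an image. This is useful for understanding which parts of an image were identified as belonging to a given class, and thus allows objects to be localized in images.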

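For the first method, activation visualization, we will use the small convnet that we trained from scratch on the cats-versus-dogs classification problem two sections ago. For the next two methods, we will use the VGG16 model introduced in the previous section.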

Visualizing intermediate activations

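Visualizing intermediate activations consists of displaying the feature maps that are output by the various convolution and pooling layers in a network, given a certain input (the output of a layer is often called its "activation", that is, the output of the activation function). This gives a view into how an input is decomposed into the different filters learned by the network. The feature maps we want to visualize have three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is to plot the contents of every channel independently as a 2D image. Let's start by loading the model that we saved in section 5.2: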

from keras.models import load_model

model = load_model('cats_and_dogs_small_2.h5')
model.summary()  # As a reminder.

This will be the input image we will use: a picture of a cat that is not part of the images the network was trained on:

img_path = '/Users/fchollet/Downloads/cats_and_dogs_small/test/cats/cat.1700.jpg'

# We preprocess the image into a 4D tensor
from keras.preprocessing import image
import numpy as np

img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
# Remember that the model was trained on inputs
# that were preprocessed in the following way:
img_tensor /= 255.

# Its shape is (1, 150, 150, 3)
print(img_tensor.shape)
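
To make the shape bookkeeping above concrete without needing an image file, here is a minimal NumPy-only sketch. The random array is a hypothetical stand-in for what `img_to_array` returns, and the names `fake_img` and `batch` are assumptions introduced for illustration only.

```python
import numpy as np

# Hypothetical stand-in for a decoded 150x150 RGB image with values in [0, 255].
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(150, 150, 3)).astype('float32')

# Add a leading batch axis: (150, 150, 3) -> (1, 150, 150, 3).
batch = np.expand_dims(fake_img, axis=0)

# Rescale to [0, 1], matching how the training inputs were preprocessed.
batch /= 255.

print(batch.shape)
print(batch.min() >= 0.0, batch.max() <= 1.0)
```

The leading axis is what lets a single image be passed to a model that expects batches.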

Let's display our picture:

import matplotlib.pyplot as plt

plt.imshow(img_tensor[0])
plt.show()

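To extract the feature maps we want to look at, we will create a Keras model that takes batches of images as input and outputs the activations of all convolution and pooling layers. To do this, we will use the Keras class Model. A Model is instantiated using two arguments: an input tensor (or list of input tensors) and an output tensor (or list of output tensors). The resulting object is a Keras model, just like the Sequential models you are familiar with, mapping the specified inputs to the specified outputs. What sets the Model class apart is that, unlike Sequential, it allows for models with multiple outputs. For more information about the Model class, see Chapter 7, Section 1.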

from keras import models

# Extracts the outputs of the first 8 layers (those closest to the input):
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

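When fed an image input, this model returns the values of the layer activations in the original model. This is the first time you encounter a multi-output model in this book: until now, the models you have seen had exactly one input and one output. In the general case, a model can have any number of inputs and outputs. This one has one input and 8 outputs, one output per layer activation.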

# This will return a list of 8 Numpy arrays:
# one array per layer activation
activations = activation_model.predict(img_tensor)

For instance, this is the activation of the first convolution layer for our cat image input:

first_layer_activation = activations[0]
print(first_layer_activation.shape)

It is a 148x148 feature map with 32 channels. Let's try visualizing the channel at index 3:

import matplotlib.pyplot as plt

plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')
plt.show()

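This channel appears to encode a diagonal edge detector. Let's try the channel at index 30, but note that your own channels may vary, since the specific filters learned by convolution layers are not deterministic.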

plt.matshow(first_layer_activation[0, :, :, 30], cmap='viridis')
plt.show()

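This one looks like a "bright green dot" detector, useful for encoding cat eyes. At this point, let's plot a complete visualization of all the activations in the network. We will extract and plot every channel in each of our 8 activation maps, and we will tile the results into one big image tensor, with channels placed side by side.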

import keras

# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
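            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row].copy()
            # Post-process the feature to make it visually palatable
            # (we work on a copy so `activations` is not modified in place)
            channel_image -= channel_image.mean()
            # The small epsilon avoids dividing by zero for blank channels
            channel_image /= (channel_image.std() + 1e-5)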
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show()
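
The tiling logic above can also be written as a small standalone helper. The sketch below is NumPy-only and runs on a random stand-in array rather than real activations; the names `tile_channels` and `fake_activation` are hypothetical and not part of the original notebook. It works on a copy of each channel and adds a small epsilon to the standard deviation, so that a blank channel does not cause a division by zero.

```python
import numpy as np

def tile_channels(layer_activation, images_per_row=16):
    """Tile the channels of a (1, size, size, n_features) activation into a 2D grid."""
    n_features = layer_activation.shape[-1]
    size = layer_activation.shape[1]
    n_rows = n_features // images_per_row
    grid = np.zeros((size * n_rows, images_per_row * size), dtype='uint8')
    for r in range(n_rows):
        for c in range(images_per_row):
            # astype returns a copy, so the caller's array is left untouched.
            channel = layer_activation[0, :, :, r * images_per_row + c].astype('float64')
            # Center, scale to unit variance, then map to a displayable range.
            channel -= channel.mean()
            channel /= channel.std() + 1e-5
            channel = channel * 64 + 128
            grid[r * size:(r + 1) * size,
                 c * size:(c + 1) * size] = np.clip(channel, 0, 255).astype('uint8')
    return grid

# Hypothetical activation: 32 channels of size 8x8, one of them blank.
fake_activation = np.random.default_rng(0).random((1, 8, 8, 32)).astype('float32')
fake_activation[..., 5] = 0.0
before = fake_activation.copy()

grid = tile_channels(fake_activation)
print(grid.shape)
```

With 32 channels and 16 images per row, the grid holds two rows of tiles; the blank channel shows up as a uniform mid-gray tile.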

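There are a few remarkable things to note here:
The first layer acts as a collection of various edge detectors. At that stage, the activations still retain almost all of the information present in the initial picture.
As we go higher up, the activations become increasingly abstract and less visually interpretable. They start encoding higher-level concepts such as "cat ear" or "cat eye". Higher-up representations carry increasingly less information about the visual contents of the image, and increasingly more information related to the class of the image.
The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image, but in the following layers more and more filters are blank. This means that the pattern encoded by the filter isn't found in the input image.
We have just observed a very important universal characteristic of the representations learned by deep neural networks: the features extracted by a layer get increasingly abstract with the depth of the layer. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target (in our case, the class of the image: cat or dog). A deep neural network effectively acts as an information distillation pipeline, with raw data going in (in our case, RGB pictures) and being repeatedly transformed so that irrelevant information is filtered out (e.g. the specific visual appearance of the image) while useful information is magnified and refined (e.g. the class of the image).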

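This is analogous to the way humans and animals perceive the world: after observing a scene for a few seconds, a human can remember which abstract objects were present in it (e.g. bicycle, tree) but cannot remember the specific appearance of these objects. In fact, if you tried to draw a generic bicycle from memory right now, chances are you could not get it even remotely right, even though you have seen thousands of bicycles in your lifetime. Try it right now: this effect is absolutely real. Your brain has learned to completely abstract its visual input, transforming it into high-level visual concepts while filtering out irrelevant visual details, which makes it tremendously difficult to remember how the things around us actually look.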

Visualizing convnet filters

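Another easy way to inspect the filters learned by convnets is to display the visual pattern that each filter is meant to respond to. This can be done with gradient ascent in input space: starting from a blank input image, we repeatedly adjust the values of the input image of a convnet so as to maximize the response of a specific filter. The resulting input image is one that the chosen filter is maximally responsive to.
The process is simple: we will build a loss function that maximizes the value of a given filter in a given convolution layer, then we will use gradient ascent to adjust the values of the input image so as to maximize this activation value. For instance, here is a loss for the activation of filter 0 in the layer "block3_conv1" of the VGG16 network, pre-trained on ImageNet: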

from keras.applications import VGG16
from keras import backend as K

model = VGG16(weights='imagenet',
              include_top=False)

layer_name = 'block3_conv1'
filter_index = 0

layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])

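To implement gradient ascent, we need the gradient of this loss with respect to the model's input. To do this, we will use the gradients function packaged with the backend module of Keras: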

# The call to `gradients` returns a list of tensors (of size 1 in this case)
# hence we only keep the first element -- which is a tensor.
grads = K.gradients(loss, model.input)[0]

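A non-obvious trick that helps the gradient ascent process go smoothly is to normalize the gradient tensor by dividing it by its root mean square, which is a scaled version of its L2 norm (the square root of the average of the squared values in the tensor). This ensures that the magnitude of the updates made to the input image always stays within the same range.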

# We add 1e-5 before dividing so as to avoid accidentally dividing by 0.
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
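
The effect of this normalization can be checked in isolation with plain NumPy. The sketch below is only an illustration under assumed inputs: `normalize_gradient`, `rms`, and the two random "gradients" are hypothetical names and data, not part of the original notebook. It shows that two gradients differing in scale by a factor of a million end up with nearly the same magnitude after normalization.

```python
import numpy as np

def rms(a):
    """Root mean square of all values in an array."""
    return np.sqrt(np.mean(np.square(a)))

def normalize_gradient(grads, eps=1e-5):
    """Divide a gradient array by its root mean square (eps avoids division by zero)."""
    return grads / (rms(grads) + eps)

rng = np.random.default_rng(0)
small = rng.normal(size=(1, 150, 150, 3)) * 1e-3  # tiny gradient values
large = small * 1e6                               # same direction, much larger scale

rms_small = rms(normalize_gradient(small))
rms_large = rms(normalize_gradient(large))
print(rms_small, rms_large)
```

Both printed values are close to 1, so a fixed step size produces updates of comparable magnitude regardless of the raw gradient scale.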

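Now we need a way to compute the value of the loss tensor and the gradient tensor, given an input image. We can define a Keras backend function to do this: iterate is a function that takes a Numpy tensor (as a list of tensors of size 1) and returns a list of two Numpy tensors: the loss value and the gradient value.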

iterate = K.function([model.input], [loss, grads])

# Let's test it:
import numpy as np
loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])

At this point we can define a Python loop to do the gradient ascent:

# We start from a gray image with some noise
input_img_data = np.random.random((1, 150, 150, 3)) * 20 + 128.

# Run gradient ascent for 40 steps
step = 1.  # this is the magnitude of each gradient update
for i in range(40):
    # Compute the loss value and gradient value
    loss_value, grads_value = iterate([input_img_data])
    # Here we adjust the input image in the direction that maximizes the loss
    input_img_data += grads_value * step
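
The same loop structure can be tried on a toy problem that needs no Keras model. In the sketch below, a fixed random array `w` plays the role of a filter and the "response" is simply the mean of `w * x`; this stand-in loss, and the names `w` and `loss_and_grads`, are assumptions made for illustration and are not how VGG16 computes activations.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))  # hypothetical fixed "filter" pattern

def loss_and_grads(x):
    """Return the toy filter response and its normalized gradient w.r.t. x."""
    loss = np.mean(w * x)
    grads = w / w.size  # analytic gradient of mean(w * x) with respect to x
    grads = grads / (np.sqrt(np.mean(np.square(grads))) + 1e-5)
    return loss, grads

# Start from a gray image with some noise, as in the loop above.
x = rng.random((8, 8)) * 20 + 128.

losses = []
step = 1.
for i in range(40):
    loss_value, grads_value = loss_and_grads(x)
    losses.append(loss_value)
    # Move the input in the direction that increases the response.
    x += grads_value * step

print(losses[0], losses[-1])
```

Because each update moves `x` along the gradient, the recorded response grows at every step.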

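The resulting image tensor is a floating point tensor of shape (1, 150, 150, 3), with values that may not be integers within [0, 255]. Hence we need to post-process this tensor to turn it into a displayable image. We do it with the following straightforward utility function: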

def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # convert to RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x
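
As a quick sanity check of these post-processing steps, here is a hedged NumPy-only variant run on random data. `deprocess_image_copy` is a hypothetical name introduced for this sketch; it applies the same arithmetic as the function above but to a copy, so the caller's array is left unchanged.

```python
import numpy as np

def deprocess_image_copy(x):
    """Apply the deprocess_image steps to a float copy of x and return uint8."""
    x = np.array(x, dtype='float64')  # np.array copies by default
    # Center on 0 and rescale so the standard deviation is about 0.1.
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1
    # Shift to be centered on 0.5 and clip to [0, 1].
    x += 0.5
    x = np.clip(x, 0, 1)
    # Scale to [0, 255] and convert to 8-bit integers.
    return (x * 255).astype('uint8')

# Hypothetical raw result of gradient ascent: arbitrary float values.
raw = np.random.default_rng(0).normal(loc=130., scale=40., size=(150, 150, 3))
raw_before = raw.copy()

img = deprocess_image_copy(raw)
print(img.shape, img.dtype, img.min(), img.max())
```

The output is an 8-bit array centered around mid-gray, which is what `plt.imshow` expects for an RGB image.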

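Now that we have all the pieces, let's put them together into a Python function that takes a layer name and a filter index as input, and returns a valid image tensor representing the pattern that maximizes the activation of the specified filter: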

def generate_pattern(layer_name, filter_index, size=150):
    # Build a loss function that maximizes the activation
    # of the nth filter of the layer considered.
    layer_output = model.get_layer(layer_name).output
    loss = K.mean(layer_output[:, :, :, filter_index])

    # Compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, model.input)[0]

    # Normalization trick: we normalize the gradient
    grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)

    # This function returns the loss and grads given the input picture
    iterate = K.function([model.input], [loss, grads])
    
    # We start from a gray image with some noise
    input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.

    # Run gradient ascent for 40 steps
    step = 1.
    for i in range(40):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
        
    img = input_img_data[0]
    return deprocess_image(img)

Let's try it:

plt.imshow(generate_pattern('block3_conv1', 0))
plt.show()

It seems that filter 0 in layer block3_conv1 is responsive to a polka dot pattern.

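Now the fun part: we can start visualizing every single filter in every layer. For simplicity, we will only look at the first 64 filters in each layer, and only at the first layer of each of the first four convolution blocks (block1_conv1, block2_conv1, block3_conv1, block4_conv1). We will arrange the outputs on an 8x8 grid of 64x64 filter patterns, with some black margins between each filter pattern.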

for layer_name in ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1']:
    size = 64
    margin = 5

    # This is an empty (black) image where we will store our results.
    # It is uint8 so that `plt.imshow` reads the values on a 0-255 scale.
    results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3), dtype='uint8')

    for i in range(8):  # iterate over the rows of our results grid
        for j in range(8):  # iterate over the columns of our results grid
            # Generate the pattern for filter `i + (j * 8)` in `layer_name`
            filter_img = generate_pattern(layer_name, i + (j * 8), size=size)

            # Put the result in the square `(i, j)` of the results grid
            horizontal_start = i * size + i * margin
            horizontal_end = horizontal_start + size
            vertical_start = j * size + j * margin
            vertical_end = vertical_start + size
            results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img

    # Display the results grid
    plt.figure(figsize=(20, 20))
    plt.imshow(results)
    plt.show()

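These filter visualizations tell us a lot about how convnet layers see the world: each layer in a convnet simply learns a collection of filters such that its inputs can be expressed as a combination of those filters. This is similar to how the Fourier transform decomposes signals onto a bank of cosine functions. The filters in these convnet filter banks get increasingly complex and refined as we go higher up in the model:
The filters from the first layer in the model (block1_conv1) encode simple directional edges and colors (or colored edges in some cases).
The filters from block2_conv1 encode simple textures made from combinations of edges and colors.
The filters in higher layers start to resemble textures found in natural images: feathers, eyes, leaves, etc.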

Visualizing heatmaps of class activation

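We will introduce one more visualization technique, one that is useful for understanding which parts of a given image led a convnet to its final classification decision. This is helpful for "debugging" the decision process of a convnet, in particular in the case of a classification mistake. It also allows you to locate specific objects in an image.
This general category of techniques is called "Class Activation Map" (CAM) visualization, and consists of producing heatmaps of "class activation" over input images. A "class activation" heatmap is a 2D grid of scores associated with a specific output class, computed for every location in an input image, indicating how important each location is with respect to the class considered. For instance, given an image fed into one of our "cat vs. dog" convnets, Class Activation Map visualization allows us to generate a heatmap for the class "cat", indicating how cat-like different parts of the image are, and likewise a heatmap for the class "dog", indicating how dog-like different parts of the image are.
The specific implementation we will use is the one described in "Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization". It is very simple: it consists of taking the output feature map of a convolution layer given an input image, and weighting every channel in that feature map by the gradient of the class with respect to the channel. Intuitively, one way to understand this trick is that we are weighting a spatial map of "how intensely the input image activates different channels" by "how important each channel is with regard to the class", resulting in a spatial map of "how intensely the input image activates the class".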

We will demonstrate this technique using the pre-trained VGG16 network again:

from keras.applications.vgg16 import VGG16

K.clear_session()

# Note that we are including the densely-connected classifier on top;
# all previous times, we were discarding it.
model = VGG16(weights='imagenet')

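Let's convert our sample image, a picture of two elephants, into something the VGG16 model can read: the model was trained on images of size 224x224, preprocessed according to a few rules that are packaged in the utility function keras.applications.vgg16.preprocess_input. Therefore we need to load the image, resize it to 224x224, convert it to a Numpy float32 tensor, and apply these preprocessing rules.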

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np

# The local path to our target image
img_path = '/Users/fchollet/Downloads/creative_commons_elephant.jpg'

# `img` is a PIL image of size 224x224
img = image.load_img(img_path, target_size=(224, 224))

# `x` is a float32 Numpy array of shape (224, 224, 3)
x = image.img_to_array(img)

# We add a dimension to transform our array into a "batch"
# of size (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)

# Finally we preprocess the batch
# (this does channel-wise color normalization)
x = preprocess_input(x)

preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])

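The top three classes predicted for this image are:
African elephant (with 92.5% probability)
Tusker (with 7% probability)
Indian elephant (with 0.4% probability)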

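Thus our network has recognized our image as containing an undetermined quantity of African elephants. The entry in the prediction vector that was maximally activated is the one corresponding to the "African elephant" class, at index 386.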

To visualize which parts of our image are the most "African elephant"-like, let's set up the Grad-CAM process:

# This is the "african elephant" entry in the prediction vector
african_elephant_output = model.output[:, 386]

# This is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')

# This is the gradient of the "african elephant" class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]

# This is a vector of shape (512,), where each entry
# is the mean intensity of the gradient over a specific feature map channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))

# This function allows us to access the values of the quantities we just defined:
# `pooled_grads` and the output feature map of `block5_conv3`,
# given a sample image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])

# These are the values of these two quantities, as Numpy arrays,
# given our sample image of two elephants
pooled_grads_value, conv_layer_output_value = iterate([x])

# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the elephant class
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]

# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
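
The channel-weighting step can be checked on its own with NumPy. The sketch below uses random arrays as hypothetical stand-ins for the `block5_conv3` feature map and its gradient (a 14x14x512 map is assumed here, which is what a 224x224 input yields at that layer), and compares the explicit loop with an equivalent broadcast expression. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one image: feature map and class gradient.
conv_output = rng.random((14, 14, 512))
grads = rng.normal(size=(1, 14, 14, 512))

# Average the gradient over batch and spatial axes: one weight per channel.
pooled = grads.mean(axis=(0, 1, 2))  # shape (512,)

# Loop version: scale each channel by its weight, then average over channels.
weighted_loop = conv_output.copy()
for i in range(512):
    weighted_loop[:, :, i] *= pooled[i]
heatmap_loop = weighted_loop.mean(axis=-1)

# Broadcast version: the (512,) weights align with the last axis.
heatmap_vec = (conv_output * pooled).mean(axis=-1)

print(heatmap_vec.shape, np.allclose(heatmap_loop, heatmap_vec))
```

Both forms give the same 14x14 map; the broadcast form simply avoids the Python loop and does not modify the feature map in place.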

For visualization purposes, we will also normalize the heatmap between 0 and 1:

heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)
plt.show()
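
If no location has a positive score, the maximum after clamping is 0 and the division above is undefined. A minimal sketch of a guarded version follows; `normalize_heatmap` and its `eps` threshold are hypothetical names chosen for this illustration.

```python
import numpy as np

def normalize_heatmap(heatmap, eps=1e-8):
    """Clamp negatives to 0, then scale to [0, 1]; return zeros if nothing is positive."""
    heatmap = np.maximum(heatmap, 0)
    peak = heatmap.max()
    if peak < eps:
        return np.zeros_like(heatmap)
    return heatmap / peak

example = np.array([[-1.0, 0.5],
                    [2.0, 1.0]])
normalized = normalize_heatmap(example)   # negatives become 0, the peak becomes 1
all_negative = normalize_heatmap(-np.ones((2, 2)))  # nothing positive: all zeros

print(normalized)
print(all_negative)
```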

Finally, we will use OpenCV to generate an image that superimposes the heatmap we just obtained on the original image:

import cv2

# We use cv2 to load the original image
img = cv2.imread(img_path)

# We resize the heatmap to have the same size as the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))

# We convert the heatmap to 8-bit values in [0, 255]
heatmap = np.uint8(255 * heatmap)

# We apply the heatmap to the original image
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

# 0.4 here is a heatmap intensity factor
superimposed_img = heatmap * 0.4 + img

# Save the image to disk
cv2.imwrite('/Users/fchollet/Downloads/elephant_cam.jpg', superimposed_img)
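
Note that `heatmap * 0.4 + img` is a floating point array whose values can exceed 255. A conservative option is to clip and convert explicitly before saving or displaying. The NumPy-only sketch below illustrates this on tiny constant arrays; `superimpose` and the sample arrays are hypothetical and stand in for the real image and color-mapped heatmap.

```python
import numpy as np

def superimpose(img, heatmap, intensity=0.4):
    """Blend a color heatmap onto an image and return a uint8 array in [0, 255]."""
    blended = heatmap.astype('float32') * intensity + img.astype('float32')
    # Round, clip to the valid 8-bit range, then convert.
    return np.clip(np.rint(blended), 0, 255).astype('uint8')

# Bright image plus strong heatmap: the raw sum (302) exceeds 255 and is clipped.
bright = superimpose(np.full((4, 4, 3), 200, dtype='uint8'),
                     np.full((4, 4, 3), 255, dtype='uint8'))

# Darker image plus weaker heatmap: 50 + 0.4 * 100 = 90, no clipping needed.
dark = superimpose(np.full((4, 4, 3), 50, dtype='uint8'),
                   np.full((4, 4, 3), 100, dtype='uint8'))

print(bright[0, 0], dark[0, 0])
```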
