Transfer Learning and Computer Vision in Practice

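Recap

The previous five installments covered:

Transfer learning and computer vision

Transfer learning and image classification

Transfer learning and object detection (Faster R-CNN)

Transfer learning and semantic segmentation (SegNet)

Transfer learning and instance segmentation (ResNeXt)

This installment covers:

Transfer learning and computer vision in practice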


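Starting with this section, we run a few small experiments with the VGG16 model to reinforce the theory of transfer learning and computer vision covered so far. As in Chapter 4, the experiments are image classification on the Fashion-MNIST dataset.

In August 2017, the German research group Zalando Research released a new dataset on GitHub. Its training set contains 60,000 examples and its test set 10,000, split into 10 classes. The classes are balanced: every class has the same number of training samples and the same number of test samples. The samples are everyday clothing, shoes, and bags. Each one is a 28×28 grayscale image carrying exactly one of the 10 class labels.

One point needs attention. In the Keras version used here, VGG16 only accepts three-channel color images no smaller than 48×48, whereas our dataset consists of 28×28 grayscale images. Before feeding the data into the transfer learning model we therefore have to convert it, which adds one step to the preprocessing used for the conventional CNN in Chapter 4: turning every image into a 48×48 color image.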
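The size requirement follows from the architecture: VGG16's convolutional base contains five 2×2 max-pooling layers, and each one halves the height and width. The sketch below only does that arithmetic; the helper name `vgg16_feature_side` is made up for illustration and is not part of Keras.

```python
def vgg16_feature_side(side: int, num_pools: int = 5) -> int:
    """Side length of the feature map after VGG16's five 2x2, stride-2 poolings.

    The 3x3 convolutions use 'same' padding, so only pooling changes the size.
    """
    for _ in range(num_pools):
        side //= 2  # each pooling layer halves the side, rounding down
    return side


for side in (28, 48, 224):
    print(side, "->", vgg16_feature_side(side))
```

A 28-pixel side shrinks to nothing before the last pooling layer, while a 48-pixel side ends up as a 1×1 feature map, which is why the images are enlarged before they reach the network.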

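7.3.1 Experiment environment

(1) Anaconda with Python 3.7 and Jupyter Notebook

(2) Keras

(3) The Fashion-MNIST dataset

7.3.2 Experiment workflow

(1) Load the image data

(2) Preprocess the image data

(3) Train the model

(4) Save and visualize the model

(5) Visualize the training process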

7.3.3 Code

# chapter7/7_3_Transfer_learning_cnn_image.ipynb
# Only the imports this notebook actually uses are kept.
import gzip
import os

import cv2
import numpy as np
import keras
from keras import applications
from keras.layers import Activation, Dense, Dropout, Flatten
from keras.models import Model, Sequential
from keras.preprocessing.image import ImageDataGenerator

# os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # use the second GPU

1. Loading and preprocessing the data

# Put the four dataset files in the same folder as this notebook
def load_data():
    paths = [
        'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
        't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
    ]

    # Label files: skip the 8-byte header, then one byte per label
    with gzip.open(paths[0], 'rb') as lbpath:
        y_train = np.frombuffer(lbpath.read(), np.uint8, offset=8)

    # Image files: skip the 16-byte header, then 28*28 bytes per image
    with gzip.open(paths[1], 'rb') as imgpath:
        x_train = np.frombuffer(
            imgpath.read(), np.uint8, offset=16).reshape(len(y_train), 28, 28, 1)

    with gzip.open(paths[2], 'rb') as lbpath:
        y_test = np.frombuffer(lbpath.read(), np.uint8, offset=8)

    with gzip.open(paths[3], 'rb') as imgpath:
        x_test = np.frombuffer(
            imgpath.read(), np.uint8, offset=16).reshape(len(y_test), 28, 28, 1)

    return (x_train, y_train), (x_test, y_test)
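The `offset=8` and `offset=16` arguments skip the headers of the IDX file format: a label file starts with two big-endian 32-bit integers (magic number, item count) and an image file with four (magic number, item count, rows, columns). As a rough sketch, the headers could be parsed and checked instead of skipped. The function names below are hypothetical, and the byte strings are synthetic stand-ins for the real files.

```python
import gzip
import struct

import numpy as np


def parse_idx_images(raw: bytes) -> np.ndarray:
    """Parse uncompressed IDX3 bytes into an array of shape (n, rows, cols)."""
    # Header: four big-endian unsigned 32-bit integers, 16 bytes in total.
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError(f"unexpected image magic number {magic}")
    return np.frombuffer(raw, dtype=np.uint8, offset=16).reshape(n, rows, cols)


def parse_idx_labels(raw: bytes) -> np.ndarray:
    """Parse uncompressed IDX1 bytes into a 1-D array of labels."""
    # Header: two big-endian unsigned 32-bit integers, 8 bytes in total.
    magic, n = struct.unpack(">II", raw[:8])
    if magic != 2049:  # 0x00000801: unsigned bytes, 1 dimension
        raise ValueError(f"unexpected label magic number {magic}")
    labels = np.frombuffer(raw, dtype=np.uint8, offset=8)
    if labels.size != n:
        raise ValueError("label count does not match the header")
    return labels


# Synthetic two-image dataset, compressed and decompressed like the .gz files.
fake_images = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
fake_labels = struct.pack(">II", 2049, 2) + bytes([3, 7])
images = parse_idx_images(gzip.decompress(gzip.compress(fake_images)))
labels = parse_idx_labels(gzip.decompress(gzip.compress(fake_labels)))
print(images.shape, labels.tolist())
```

Reading the header costs a few extra lines but catches a truncated download or a swapped label/image file before training starts.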

# Load the dataset
(x_train, y_train), (x_test, y_test) = load_data()
batch_size = 32
num_classes = 10
epochs = 5
data_augmentation = True  # whether to use image augmentation
num_predictions = 20
save_dir = os.path.join(os.getcwd(), 'saved_models_transfer_learning')
model_name = 'keras_fashion_transfer_learning_trained_model.h5'

# Convert the class labels to one-hot vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# The Fashion-MNIST images are 28x28 with a single channel, but VGG16 needs
# three-channel input of at least 48x48, so enlarge each image and convert
# it from grayscale to RGB.
X_train = [cv2.cvtColor(cv2.resize(i, (48, 48)), cv2.COLOR_GRAY2RGB) for i in x_train]
X_test = [cv2.cvtColor(cv2.resize(i, (48, 48)), cv2.COLOR_GRAY2RGB) for i in x_test]

x_train = np.asarray(X_train)
x_test = np.asarray(X_test)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

x_train /= 255  # scale pixel values to [0, 1]
x_test /= 255   # scale pixel values to [0, 1]
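To make two of the preprocessing steps above less opaque, here is a small NumPy-only sketch of what one-hot encoding and grayscale-to-RGB conversion amount to. It is not the code used in the experiment: `one_hot` and `gray_to_rgb` are hypothetical helpers, the input is random data, and the resizing step is left out.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Integer class ids -> one-hot rows, the same idea as to_categorical."""
    return np.eye(num_classes, dtype="float32")[labels]


def gray_to_rgb(images: np.ndarray) -> np.ndarray:
    """(n, h, w) grayscale -> (n, h, w, 3) by copying the channel three times."""
    return np.repeat(images[..., np.newaxis], 3, axis=-1)


labels = np.array([0, 9, 2])
gray = np.random.default_rng(0).integers(0, 256, size=(3, 48, 48), dtype=np.uint8)

encoded = one_hot(labels, num_classes=10)
rgb = gray_to_rgb(gray).astype("float32") / 255.0  # scale to [0, 1]
print(encoded.shape, rgb.shape)
```

The converted images carry no new color information; the three channels are identical copies, which is enough to satisfy the input shape that the pretrained weights expect.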

2. Building the transfer learning model

# Use the VGG16 model:
# take its convolutional layers as the base network
base_model = applications.VGG16(include_top=False, weights='imagenet',
                                input_shape=x_train.shape[1:])  # the first layer needs the image size
print(x_train.shape[1:])

model = Sequential()  # custom classifier head
print(base_model.output)
model.add(Flatten(input_shape=base_model.output_shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# Join VGG16 and the custom head into a single model
model = Model(inputs=base_model.input, outputs=model(base_model.output))

# Keep the weights of the first 15 layers fixed (the input layer through
# block4_pool), so that they are not updated during training
for layer in model.layers[:15]:
    layer.trainable = False

# Initialize the optimizer (RMSprop is the canonical class name)
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
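It helps to see exactly where the `[:15]` slice cuts the network. The sketch below lists the layer order of the Keras VGG16 convolutional base and applies the same slice to the names. It involves no Keras objects; `"input"` and `"classifier_head"` are placeholder names, since the real names of those two layers depend on the session.

```python
# Layer order of the Keras VGG16 convolutional base (include_top=False).
VGG16_BASE_LAYERS = [
    "input",  # placeholder: the actual input layer name varies
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

# In the merged model the custom Sequential head appears as one extra layer.
merged_layers = VGG16_BASE_LAYERS + ["classifier_head"]  # placeholder name

frozen = merged_layers[:15]     # same slice as the training code
trainable = merged_layers[15:]

print("last frozen layer:", frozen[-1])
print("still trainable:", trainable)
```

So the first four convolutional blocks keep their ImageNet weights, while the fifth block is fine-tuned together with the new classifier head.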

3. Training

if not data_augmentation:  # train on the images as they are
    print('Not using data augmentation.')
    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test),
                        shuffle=True)
else:
    print('Using real-time data augmentation.')
    datagen = ImageDataGenerator(
        featurewise_center=False,
        samplewise_center=False,
        featurewise_std_normalization=False,
        samplewise_std_normalization=False,
        zca_whitening=False,
        zca_epsilon=1e-06,
        rotation_range=0,
        width_shift_range=0.1,   # shift horizontally by up to 10% of the width
        height_shift_range=0.1,  # shift vertically by up to 10% of the height
        shear_range=0.,
        zoom_range=0.,
        channel_shift_range=0.,
        fill_mode='nearest',
        cval=0.,
        horizontal_flip=True,    # randomly flip images left-right
        vertical_flip=False,
        rescale=None,
        preprocessing_function=None,
        data_format=None,
        validation_split=0.0)

    # Computes dataset statistics; these are only used by the featurewise
    # and ZCA options, all of which are switched off here
    datagen.fit(x_train)
    print(x_train.shape[0] // batch_size)  # integer division
    print(x_train.shape[0] / batch_size)   # true division
    # Generate augmented batches of size batch_size from x and y
    history = model.fit_generator(
        datagen.flow(x_train, y_train, batch_size=batch_size),
        epochs=epochs,
        steps_per_epoch=x_train.shape[0] // batch_size,
        validation_data=(x_test, y_test),
        # Maximum number of processes to spin up when using
        # process-based threading
        workers=10)
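The only augmentations actually switched on above are random horizontal and vertical shifts of up to 10% and random horizontal flips. The following NumPy sketch imitates that idea in a simplified form: it shifts by whole pixels and fills the uncovered border with edge pixels, roughly in the spirit of `fill_mode='nearest'`. It is not how `ImageDataGenerator` is implemented, and `random_shift_flip` is a hypothetical helper.

```python
import numpy as np


def random_shift_flip(image, rng, max_shift_frac=0.1):
    """Randomly flip an (H, W, C) image left-right and shift it by whole pixels."""
    h, w = image.shape[:2]
    max_dy, max_dx = int(h * max_shift_frac), int(w * max_shift_frac)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # vertical shift in pixels
    dx = int(rng.integers(-max_dx, max_dx + 1))  # horizontal shift in pixels
    if rng.random() < 0.5:
        image = image[:, ::-1]  # horizontal flip
    # Pad with copies of the edge pixels, then crop a window of the original
    # size; moving the window is what shifts the content.
    padded = np.pad(image, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode="edge")
    top, left = max_dy - dy, max_dx - dx
    return padded[top:top + h, left:left + w]


rng = np.random.default_rng(seed=0)
image = rng.random((48, 48, 3))
augmented = random_shift_flip(image, rng)
print(augmented.shape)
```

Because every epoch sees a slightly different version of each image, the network is less likely to memorize exact pixel positions.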

4. Saving and visualizing the model

model.summary()  # print the layer-by-layer structure of the model
# Save the model
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)
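As a sanity check on what `model.summary()` reports, the trainable parameter count can be worked out by hand. This sketch assumes the shapes used in this section: a 1×1×512 feature map from VGG16 for 48×48 input, the 256-unit and 10-unit dense layers of the head, and the three 3×3 convolutions of block 5 left unfrozen. The helper names are made up for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a square-kernel 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out


# Classifier head: Flatten(1*1*512) -> Dense(256) -> Dense(10)
head = dense_params(512, 256) + dense_params(256, 10)
# Unfrozen part of VGG16: three 3x3 convolutions with 512 channels in and out
block5 = 3 * conv2d_params(3, 512, 512)

print("head parameters:  ", head)
print("block5 parameters:", block5)
print("trainable total:  ", head + block5)
```

If the summary shows a different trainable total, the freezing loop or the input size is probably not what was intended.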

5. Visualizing the training process

import matplotlib.pyplot as plt

# Keras versions before 2.3 store accuracy under 'acc'; later ones use 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'

# Plot training and validation accuracy
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Valid'], loc='upper left')
# Named after this experiment so the Chapter 4 plots are not overwritten
plt.savefig('transfer_learning_valid_acc.png')
plt.show()

# Plot training and validation loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Valid'], loc='upper left')
plt.savefig('transfer_learning_valid_loss.png')
plt.show()
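Besides plotting, the same history dictionary can be queried directly, for example to find the epoch with the best validation accuracy. A minimal sketch follows; `best_epoch` is a hypothetical helper, and the numbers in `example_history` are invented purely to show the call, not results from this experiment.

```python
def best_epoch(history: dict, candidates=("val_acc", "val_accuracy")):
    """Return (1-based epoch, value) where validation accuracy peaked."""
    for key in candidates:  # the key name depends on the Keras version
        if key in history:
            values = history[key]
            best = max(range(len(values)), key=values.__getitem__)
            return best + 1, values[best]
    raise KeyError(f"none of {candidates} found in history")


# Invented values for illustration only.
example_history = {"val_acc": [0.70, 0.78, 0.83, 0.82, 0.85]}
print(best_epoch(example_history))
```

With a real run, the argument would be `history.history` from the training step.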

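7.3.4 Analysis of the results

In line with the principles of transfer learning introduced in this chapter, the code above removes VGG16's output layers, freezes the weights of its first 15 layers, and attaches the output layers we need after VGG16, which gives us the model we train. Everything else is much the same as for the conventional CNN in Chapter 4. The difference is in the outcome: with transfer learning, 5 epochs of training already bring the accuracy to about 0.90, whereas the Chapter 4 experiment, also trained for 5 epochs, only reached about 0.81. With the help of transfer learning, we therefore do noticeably better on this image classification task.

Readers can of course also choose to leave some or all of VGG16's convolutional layers unfrozen, so that those layers are trained together with the custom network.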
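One possible way to experiment with different freezing depths is to pick the boundary by layer name instead of by index. The sketch below works on any objects that have `name` and `trainable` attributes, which Keras layers do; here it is demonstrated on plain stand-in objects so that it runs without Keras. The helper `freeze_through` is hypothetical.

```python
from types import SimpleNamespace


def freeze_through(layers, last_frozen_name=None):
    """Freeze all layers up to and including last_frozen_name; train the rest.

    Passing None leaves every layer trainable. Returns the number of frozen layers.
    """
    names = [layer.name for layer in layers]
    n_frozen = 0 if last_frozen_name is None else names.index(last_frozen_name) + 1
    for i, layer in enumerate(layers):
        layer.trainable = i >= n_frozen
    return n_frozen


# Stand-in objects instead of real Keras layers.
demo_layers = [
    SimpleNamespace(name=n, trainable=True)
    for n in ("block3_pool", "block4_conv1", "block4_pool", "block5_conv1", "head")
]
print(freeze_through(demo_layers, "block3_pool"))
print([layer.name for layer in demo_layers if layer.trainable])
```

With a real Keras model, the model has to be compiled again after changing `trainable` for the change to take effect.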

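Summary

This chapter introduced the four basic tasks of computer vision and the principles and uses of transfer learning. In most practical applications, transfer learning is combined with computer vision to get better results. Most current research concentrates on supervised learning; how to transfer knowledge through deep neural networks in unsupervised or semi-supervised settings is likely to attract growing attention. In addition, knowledge transfer in deep neural networks needs stronger physical support, which calls for cooperation among physicists, neuroscientists, and computer scientists. As deep neural networks develop, we can expect deep transfer learning to be applied widely to many challenging problems.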