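Classifying Fashion-MNIST Images with a Siamese Network

Project Overview

This project was an assignment for an AI course I took at Queensland University of Technology. I completed it together with 潘永瑞 (Pan Yongrui).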

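Dataset

This project uses Fashion-MNIST, the built-in dataset returned by keras.datasets.fashion_mnist.load_data. It contains labelled images from ten classes: ["top", "trouser", "pullover", "coat", "sandal", "ankle boot", "dress", "sneaker", "bag", "shirt"]. The list here puts the six training classes first and does not follow the dataset's numeric label order. Every image is a 28x28 grayscale image.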

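Siamese Network Architecture

A Siamese network consists of two identical sub-networks that share the same weights. Their outputs feed into a layer that computes the distance between them. The figure below shows the structure.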

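Environment

This project uses Python 3.7. Keras is the core of the project and is used to build the network and the classifier. NumPy and matplotlib.pyplot are also imported, for slicing the dataset and for plotting respectively. The imports are: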

import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
# Import everything through tensorflow.keras so that a single Keras implementation is used
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, Input,
                                     Lambda, LeakyReLU, MaxPooling2D)
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.regularizers import l2

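Project Goals

Train the neural network on images from the classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"].

Evaluate the network's ability to generalize in three ways:

1. Evaluate on a test set drawn from the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"].

2. Evaluate on a test set that combines the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"] with the classes not used in training ["dress", "sneaker", "bag", "shirt"].

3. Evaluate on a test set drawn only from the classes not used in training ["dress", "sneaker", "bag", "shirt"].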
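The split between training classes and held-out classes can be written down as label indices. The sketch below uses the standard Fashion-MNIST label order with the short class names from this post. The names `CLASS_NAMES`, `TRAIN_CLASSES`, `HELD_OUT_CLASSES` and `label_ids` are illustrative and do not appear in the original code.

```python
# Fashion-MNIST label indices and class names (standard dataset order),
# using the short names from this post.
CLASS_NAMES = {
    0: "top", 1: "trouser", 2: "pullover", 3: "dress", 4: "coat",
    5: "sandal", 6: "shirt", 7: "sneaker", 8: "bag", 9: "ankle boot",
}

TRAIN_CLASSES = ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]
HELD_OUT_CLASSES = ["dress", "sneaker", "bag", "shirt"]


def label_ids(names):
    """Return the sorted label indices for the given class names."""
    name_to_id = {name: idx for idx, name in CLASS_NAMES.items()}
    return sorted(name_to_id[name] for name in names)


train_ids = label_ids(TRAIN_CLASSES)
held_out_ids = label_ids(HELD_OUT_CLASSES)
print(train_ids)      # label indices used for training
print(held_out_ids)   # label indices held out for testing
```

These are the index sets used later when the data is sliced by class.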

Code Walkthrough

Loading and Inspecting the Data

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()  

print('train_images : ', x_train.shape, x_train.dtype)  
print('train_labels : ', y_train.shape, y_train.dtype)  
print('test_images : ', x_test.shape, x_test.dtype)  
print('test_labels : ', y_test.shape, y_test.dtype)  

The output is:

train_images :  (60000, 28, 28) uint8

train_labels :  (60000,) uint8

test_images :  (10000, 28, 28) uint8

test_labels :  (10000,) uint8

As shown, the dataset provides a training set of 60,000 images and a test set of 10,000 images.

Let's look at the first image in the training set:

plt.figure()  
plt.imshow(x_train[0, :])  
plt.colorbar()  
plt.grid(False)  
plt.show()  

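This is clearly an image of an ankle boot.

Data Normalization

We normalize the data by dividing the training and test sets by 255, which scales the pixel values into the range 0 to 1. A pixel with value 1 can then be treated as black and a pixel with value 0 as white.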

x_train = x_train.astype('float32')  
x_test = x_test.astype('float32')  
x_train = x_train / 255.0  
x_test = x_test / 255.0  

Data Slicing

We slice the data by class in preparation for the following steps.

# Indices of the six training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]
# (a list rather than a set, so that the class order is explicit)
digit_indices = [np.where(y_train == i)[0] for i in [0, 1, 2, 4, 5, 9]]
digit_indices = np.array(digit_indices)

# number of images in the smallest class
n = min([len(digit_indices[d]) for d in range(6)])

# Keep 80% of the images with labels ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]
# for training (and 20% for testing)
train_set_shape = n * 0.8
test_set_shape = n * 0.2
# Note: despite their names, the arrays below hold image indices into x_train, not labels
y_train_new = digit_indices[:, :int(train_set_shape)]
y_test_new = digit_indices[:, int(train_set_shape):]

# Keep 100% of the images with labels in ["dress", "sneaker", "bag", "shirt"] for testing
digit_indices_t = [np.where(y_train == i)[0] for i in [3, 6, 7, 8]]
y_test_new_2 = np.array(digit_indices_t)

print(y_train_new.shape)
print(y_test_new.shape)
print(y_test_new_2.shape)

(6, 4800)

(6, 1200)

(4, 6000)
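To see the slicing on a small scale, here is a minimal sketch on made-up labels. The array `y_toy` and the names `selected`, `train_idx` and `test_idx` are illustrative only. Stacking the per-class index lists into a 2-D array works here because every class has the same number of samples, which is also the case for the Fashion-MNIST training set (6,000 images per class).

```python
import numpy as np

# Toy labels: 3 classes (0, 1, 2) with 5 samples each, in shuffled order.
rng = np.random.default_rng(0)
y_toy = rng.permutation(np.repeat([0, 1, 2], 5))

# One row of sample indices per selected class (here classes 0 and 2).
selected = [0, 2]
indices = np.array([np.where(y_toy == c)[0] for c in selected])

# 80/20 split along the column axis, mirroring the slicing step above.
n = indices.shape[1]
n_train = int(n * 0.8)
train_idx = indices[:, :n_train]
test_idx = indices[:, n_train:]

print(indices.shape, train_idx.shape, test_idx.shape)
```

Each row holds the positions of one class, so indexing `y_toy` with a row returns that class label for every entry.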

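Creating Image Pairs

To build a classifier that can tell whether two images belong to the same class, we need to create pairs of images across the whole dataset.

Our approach is as follows. 1) For each image in a given class, we pair it with the next image in that class. For example, in the "top" class the first and second images form a pair, the second and third images form a pair, and so on. These are the positive pairs. 2) At the same time, we pair the image with an image from a different, randomly chosen class. For example, the first image in the "top" class may be paired with the first image in the "pullover" class. These are the negative pairs. 3) Each positive pair and the negative pair that follows it are labelled [1, 0].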

def create_pairs(x, digit_indices):
    '''
    Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    # labels are 1 or 0 to identify whether the pair is positive or negative
    labels = []

    class_num = digit_indices.shape[0]
    for d in range(class_num):
        for i in range(int(digit_indices.shape[1]) - 1):
            # use images from the same class to create positive pairs
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            # use a random offset to pick a different class for the negative pair
            inc = random.randrange(1, class_num)
            dn = (d + inc) % class_num
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            # add two labels: the first marks the positive pair, the second the negative pair
            labels += [1, 0]
    return np.array(pairs), np.array(labels)

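An input/output sketch may help:

Input: [image1, image2, image3...] and the image indices of each class [[1, 2, ..., 100], [101, 102, ..., 200], ...]
Output: [[image1, image2], [image1, image101], [image2, image3], [image2, image102], ...]   [1, 0, 1, 0, ...]

Suppose images 1-100 belong to the class 'cat' and images 101-200 to the class 'dog'. The pairs [cat, cat], [cat, dog] then correspond to the labels [1, 0].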
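The same idea can be run on a toy example. The sketch below restates the pairing logic in compact form and applies it to six scalar "images". The data (`x_toy`, `class_indices`) is made up for illustration. With only two classes the "random" other class is always the same one, so the result is deterministic.

```python
import random
import numpy as np


def create_pairs(x, class_indices):
    """Alternate positive (same class) and negative (different class) pairs."""
    pairs, labels = [], []
    class_num, per_class = class_indices.shape
    for d in range(class_num):
        for i in range(per_class - 1):
            # positive pair: neighbouring samples of the same class
            pairs.append([x[class_indices[d][i]], x[class_indices[d][i + 1]]])
            # negative pair: same position in a randomly chosen other class
            dn = (d + random.randrange(1, class_num)) % class_num
            pairs.append([x[class_indices[d][i]], x[class_indices[dn][i]]])
            labels += [1, 0]
    return np.array(pairs), np.array(labels)


# Toy "images": scalars whose integer part encodes the class (0.x = cat, 1.x = dog).
x_toy = np.array([0.1, 0.2, 0.3, 1.1, 1.2, 1.3])
class_indices = np.array([[0, 1, 2], [3, 4, 5]])

pairs, labels = create_pairs(x_toy, class_indices)
print(pairs.shape)   # (8, 2)
print(labels)        # [1 0 1 0 1 0 1 0]
```

Both outputs are flat: pair k and label k belong together, and the labels alternate between positive and negative.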

# build image pairs for the training set and the two test sets
tr_pairs, tr_y = create_pairs(x_train, y_train_new)  
tr_pairs = tr_pairs.reshape(tr_pairs.shape[0], 2, 28, 28, 1)  
print(tr_pairs.shape)  
 
te_pairs_1, te_y_1 = create_pairs(x_train, y_test_new)  
te_pairs_1 = te_pairs_1.reshape(te_pairs_1.shape[0], 2, 28, 28, 1)  
print(te_pairs_1.shape)  

te_pairs_2, te_y_2 = create_pairs(x_train, y_test_new_2)  
te_pairs_2 = te_pairs_2.reshape(te_pairs_2.shape[0], 2, 28, 28, 1)  
print(te_pairs_2.shape)  

(57588, 2, 28, 28, 1)

(14388, 2, 28, 28, 1)

(47992, 2, 28, 28, 1)
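These pair counts follow from the loop structure: each class with n images contributes n - 1 positive pairs and n - 1 negative pairs. A quick check, where `expected_pairs` is a helper name introduced only for this illustration:

```python
def expected_pairs(num_classes, images_per_class):
    """Each class yields (images_per_class - 1) positive and as many negative pairs."""
    return 2 * num_classes * (images_per_class - 1)


print(expected_pairs(6, 4800))  # training pairs: 57588
print(expected_pairs(6, 1200))  # test pairs, classes seen in training: 14388
print(expected_pairs(4, 6000))  # test pairs, classes not seen in training: 47992
```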

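Base Network

The base network is a CNN.

First comes a convolution + ReLU layer with a larger 7x7 filter, followed by a max-pooling layer. The pooling layer shrinks the feature maps, which reduces computation and the number of parameters in later layers and helps against overfitting. Next is another convolution + ReLU layer with a smaller 3x3 filter. A flatten layer then flattens the multi-dimensional output to one dimension for the fully connected layer that follows. In addition, regularizers are applied in several layers to reduce overfitting.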

def create_base_network(input_shape):
    '''
    Base network to be shared.
    '''
    # named `inputs` to avoid shadowing the built-in input()
    inputs = Input(shape=input_shape)
    x = Conv2D(32, (7, 7), activation='relu', kernel_regularizer=regularizers.l2(0.01),
               bias_regularizer=regularizers.l1(0.01))(inputs)
    x = MaxPooling2D()(x)
    x = Conv2D(64, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01),
               bias_regularizer=regularizers.l1(0.01))(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01),
              bias_regularizer=regularizers.l1(0.01))(x)

    return Model(inputs, x)

input_shape = (28,28,1)  

base_network = create_base_network(input_shape)  

input_a = Input(shape=input_shape)  
input_b = Input(shape=input_shape)  

# because we re-use the same instance `base_network`,  
# the weights of the network  
# will be shared across the two branches  
processed_a = base_network(input_a)  
processed_b = base_network(input_b)  
# summary() prints the table itself and returns None, so it is not wrapped in print()
base_network.summary()
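The summary output is not reproduced in this post, but the layer shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used above: no padding ('valid'), stride 1 for the convolutions, and a 2x2 pool with stride 2 for MaxPooling2D(). The helpers `conv_out` and `conv_params` are illustrative names.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a 'valid' (no padding) convolution or pooling."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


side = 28
side = conv_out(side, 7)            # Conv2D(32, (7, 7))  -> 22
side = conv_out(side, 2, stride=2)  # MaxPooling2D()      -> 11
side = conv_out(side, 3)            # Conv2D(64, (3, 3))  -> 9
flat = side * side * 64             # Flatten             -> 5184

params = {
    "conv_7x7": conv_params(7, 1, 32),
    "conv_3x3": conv_params(3, 32, 64),
    "dense_128": flat * 128 + 128,
}
print(side, flat, params, sum(params.values()))
```

Under these assumptions almost all of the trainable parameters sit in the dense layer, which is why the pooling layer before it matters for model size.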

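Loss Function

# add a Lambda layer that computes the distance between the two embeddings
# (euclidean_distance and eucl_dist_output_shape are helper functions not shown in this post)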
distance = Lambda(euclidean_distance,  
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])  

model = Model([input_a, input_b], distance)
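The post does not show the definitions of `euclidean_distance` or `contrastive_loss`. As a rough illustration of what such helpers typically compute, here is a NumPy sketch of the row-wise Euclidean distance and the common contrastive loss, mean(y * d^2 + (1 - y) * max(margin - d, 0)^2). The `_np` function names, the margin of 1 and the epsilon value are assumptions made for this example, not the authors' code. In the actual model the same formulas would be written with Keras or TensorFlow tensor operations.

```python
import numpy as np


def euclidean_distance_np(a, b, eps=1e-7):
    """Row-wise Euclidean distance between two batches of embeddings."""
    sum_square = np.sum(np.square(a - b), axis=1, keepdims=True)
    # clamp before the square root to avoid sqrt(0) (assumed epsilon)
    return np.sqrt(np.maximum(sum_square, eps))


def contrastive_loss_np(y_true, dist, margin=1.0):
    """y_true is 1 for same-class pairs and 0 for different-class pairs."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1, 1)
    same = y_true * np.square(dist)
    different = (1.0 - y_true) * np.square(np.maximum(margin - dist, 0.0))
    return float(np.mean(same + different))


emb_a = np.array([[0.0, 0.0], [0.0, 0.0]])
emb_b = np.array([[0.3, 0.4], [3.0, 4.0]])
dist = euclidean_distance_np(emb_a, emb_b)   # distances of about 0.5 and 5.0

print(contrastive_loss_np([1, 0], dist))  # close positive, far negative: small loss
print(contrastive_loss_np([0, 1], dist))  # labels swapped: large loss
```

The loss pulls same-class embeddings together and pushes different-class embeddings apart until they are at least `margin` away from each other.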

Model Training

With the Siamese network structure complete, we can start training the model on the training pairs.

# train (contrastive_loss, accuracy and epochs are defined elsewhere and not shown in this post)
rms = RMSprop()  
model.compile(loss=contrastive_loss, optimizer=rms, metrics=[accuracy])  
history = model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,  
           batch_size=128,  
           epochs=epochs,  
           validation_data=([te_pairs_1[:, 0], te_pairs_1[:, 1]], te_y_1))  

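Model Prediction

From this point on the model can make predictions. It takes pairs of images as input, and its accuracy can be evaluated from the predicted distances.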

y_pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])  
tr_acc = compute_accuracy(tr_y, y_pred)  
y_pred = model.predict([te_pairs_1[:, 0], te_pairs_1[:, 1]])  
te_acc = compute_accuracy(te_y_1, y_pred)  
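The helper `compute_accuracy` is not defined in the post. A plausible minimal version, assuming the model outputs a distance and that a pair counts as "same class" when the distance is below a fixed threshold, could look like the sketch below. The threshold of 0.5 and the sample distances are assumptions for illustration.

```python
import numpy as np


def compute_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of pairs whose thresholded distance matches the label."""
    predicted_same = y_pred.ravel() < threshold
    return float(np.mean(predicted_same == (np.asarray(y_true) == 1)))


y_true = np.array([1, 0, 1, 0])
y_pred = np.array([[0.1], [0.9], [0.7], [0.2]])  # made-up distances
print(compute_accuracy(y_true, y_pred))  # 0.5: the first two pairs are classified correctly
```

A small distance means "same class", so the comparison is against label 1 rather than label 0.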

Model Evaluation

1. Evaluation on a test set drawn from the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]

* Accuracy on training set: 93.66%

* Accuracy on test set: 93.38%

2. Evaluation on a test set that combines the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"] with the classes not used in training ["dress", "sneaker", "bag", "shirt"]

* Accuracy on test set: 83.94%

3. Evaluation on a test set drawn only from the classes not used in training ["dress", "sneaker", "bag", "shirt"]

* Accuracy on test set: 74.85%
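The post does not show how the combined test set for the second evaluation was built. One possible way, offered purely as an assumption, is to stack the two index arrays after truncating the larger one so that every class contributes the same number of images. The arrays below are placeholders with the shapes printed earlier, and `y_test_mixed` is a hypothetical name.

```python
import numpy as np

# Stand-ins with the shapes printed earlier; the values are placeholders.
y_test_new = np.arange(6 * 1200).reshape(6, 1200)             # classes seen in training
y_test_new_2 = np.arange(4 * 6000).reshape(4, 6000) + 10_000  # classes not seen in training

# Truncate to a common number of columns so all rows have equal length,
# then stack the ten classes into one index array.
cols = min(y_test_new.shape[1], y_test_new_2.shape[1])
y_test_mixed = np.concatenate([y_test_new[:, :cols], y_test_new_2[:, :cols]], axis=0)

print(y_test_mixed.shape)  # (10, 1200)
```

An array of this shape could be passed to `create_pairs` in the same way as the other index arrays. Whether the reported 83.94% was obtained this way is not stated in the post.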

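Conclusion

The Siamese network retains reasonable discriminative ability (74.85%) even on classes it was never trained on. This ability to generalize gives it broad application prospects in complex real-world settings.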

Reposted from www.cnblogs.com/tangjianwei/p/12631977.html