Dataset introduction
The MNIST dataset (Modified National Institute of Standards and Technology database) is a large database of handwritten digits collected by the US National Institute of Standards and Technology. It contains a training set of 60,000 examples and a test set of 10,000 examples. Each image is 28 pixels wide and 28 pixels tall, and the label y takes values 0-9, representing the 10 digits.
Load the data using TF
import tensorflow as tf
import matplotlib.pyplot as plt

(trainX, trainY), (testX, testY) = tf.keras.datasets.mnist.load_data()
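As a quick sanity check (not part of the original walkthrough), we can print the array shapes and label range to confirm the description above:
# confirm the shapes and the label range of the loaded arrays
print(trainX.shape, trainY.shape)   # (60000, 28, 28) (60000,)
print(testX.shape, testY.shape)     # (10000, 28, 28) (10000,)
print(trainY.min(), trainY.max())   # 0 9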
Since y takes values 0-9, each x is the image of the corresponding digit. Let's display the first few images.
# show some of the images in X
for i in range(9):
    # define subplot
    plt.subplot(331 + i)
    # plot raw pixel data
    plt.imshow(trainX[i], cmap=plt.get_cmap('gray'))
# show the figure
plt.show()
(1) Data Preprocessing
First, we need to hold out 20% of the training data as validation data. We can use the train_test_split() function to split off 20% of the data as the validation set.
from sklearn.model_selection import train_test_split
X_train, X_valid, Y_train, Y_valid = train_test_split(trainX,trainY,test_size=0.2)
Next, flatten each 28*28 image into a 784-dimensional vector, then normalize by dividing by the maximum value 255 so that the pixel values are mapped into the range 0-1.
Process the feature set X (flatten + normalize):
# flatten X_train, X_valid and the test images, then normalize
X_train_reshape = tf.cast(X_train.reshape((X_train.shape[0], -1)) / 255.0, dtype='float32')
X_valid_reshape = tf.cast(X_valid.reshape((X_valid.shape[0], -1)) / 255.0, dtype='float32')
X_test_reshape = tf.cast(testX.reshape((testX.shape[0], -1)) / 255.0, dtype='float32')
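A small check (added for illustration) confirms the flattened shapes and the 0-1 value range:
# confirm the flattened shapes and the normalized value range
print(X_train_reshape.shape, X_valid_reshape.shape, X_test_reshape.shape)  # (48000, 784) (12000, 784) (10000, 784)
print(float(tf.reduce_min(X_train_reshape)), float(tf.reduce_max(X_train_reshape)))  # 0.0 1.0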
One-hot encode the y values
One-hot encoding: for example, y=1 is converted to y=[0,1,0,0,0,0,0,0,0,0], i.e. the position in the array corresponds to the digit.
# one hot encode the target values (the split labels Y_train, Y_valid and the test labels testY)
y_train_encode = tf.one_hot(Y_train, depth=10)
y_valid_encode = tf.one_hot(Y_valid, depth=10)
y_test_encode = tf.one_hot(testY, depth=10)
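To see what tf.one_hot produces, here is a quick example (added for illustration):
# one-hot encode a single label: the digit 3 becomes a vector with a 1 at index 3
print(tf.one_hot(3, depth=10))
# tf.Tensor([0. 0. 0. 1. 0. 0. 0. 0. 0. 0.], shape=(10,), dtype=float32)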
(2) Model setup
Set up a network with two layers of weights. In detail:
In this neural network, the input layer has 784 neurons, the hidden layer has 128 neurons, and the output layer has 10 neurons, used for classification. The formulas are
$h_1 = \mathrm{ReLU}(XW_1 + b_1)$
$Y = h_1 W_2 + b_2$
In the first layer, the input X_train_reshape has shape (48000, 784). W1 has shape (784, 128) and b1 has shape (128,). After the transformation by W1 and b1 we get 128 hidden units, which are passed through the ReLU activation to give h1. That is the first layer.
In the next layer, W2 has shape (128, 10) and b2 has shape (10,). After the transformation by W2 and b2 we get 10 output units, and the softmax function turns these logits into a probability for each class. The class with the largest probability is taken as the predicted value.
Initial parameter values
Having understood the parameters the model needs, we now define the initial values of W1, W2, b1 and b2. Here we initialize W1 and W2 to 0.01 and b1, b2 to 0.
# W_1: all values 0.01, size 784*128
W_1 = tf.Variable(tf.ones(shape=[784, 128]) / 100)
# b_1: all values 0, size 128
b_1 = tf.Variable(tf.zeros(shape=[128]))
# W_2: all values 0.01, size 128*10
W_2 = tf.Variable(tf.ones(shape=[128, 10]) / 100)
# b_2: all values 0, size 10
b_2 = tf.Variable(tf.zeros(shape=[10]))
Define the model function
def feedforward(X):
    '''
    Implement the feed forward step for given X
    hidden_out = ReLU(X*W1 + b1)
    output = hidden_out * W2 + b2
    '''
    # matmul is ordinary matrix multiplication; the inner dimensions must match, e.g. (3,2)*(2,3) -> (3,3)
    hidden_in = tf.add(tf.matmul(X, W_1), b_1)        # X*W1 + b1
    hidden_out = tf.nn.relu(hidden_in)                # apply the ReLU activation
    output = tf.add(tf.matmul(hidden_out, W_2), b_2)  # hidden_out*W2 + b2
    return output
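A minimal smoke test (added here, assuming the preprocessing above has been run): the forward pass on a small batch should return one logit per digit class.
# run the forward pass on the first 5 training images and check the output shape
sample_logits = feedforward(X_train_reshape[:5])
print(sample_logits.shape)  # (5, 10): one logit per class for each sample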
Train the model using TF
# define the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)
Define the one-step training function.
def one_step_train(x, y, optim):
    with tf.GradientTape() as tape:
        # y_pred as logits -- softmax has not been applied yet; the built-in TF loss below applies it internally
        train_logit = feedforward(x)
        # compute the loss function -- cross entropy (y is the one-hot encoded true label)
        train_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y,
                                                                 logits=train_logit)
        train_loss = tf.reduce_mean(train_cross_en)  # average the per-sample losses
    # compute the gradients with respect to the global parameters W_1, b_1, W_2, b_2,
    # so that re-initialized variables in the later sections are picked up automatically
    grads = tape.gradient(train_loss, [W_1, b_1, W_2, b_2])
    optim.apply_gradients(zip(grads, [W_1, b_1, W_2, b_2]))  # update all the parameters
    return train_loss
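Before running the full loop, a quick check (added for illustration) is to compute the initial loss without updating anything; with the constant initialization above all 10 logits are equal, so the loss should be about ln(10) ≈ 2.30.
# initial mean cross-entropy loss before any training
logits0 = feedforward(X_train_reshape)
loss0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_train_encode, logits=logits0))
print(float(loss0))  # ~2.30, i.e. ln(10), since every class is equally likely at the start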
Explanation of the TF built-in functions used in one_step_train
- tf.nn.softmax() and tf.nn.softmax_cross_entropy_with_logits()
# set up two example samples: their one-hot y labels and the logits produced for x
labels = [[0,0,1,0,0,0,0,0,0,0], [0,0,0,0,1,0,0,0,0,0]]
logits = [[0.1,0.2,0.3,0.5,0.2,0.3,0.4,0.5,0.2,0.1], [0.1,0.2,0.3,0.5,0.2,0.3,0.4,0.5,0.2,0.1]]
Compute the softmax values of each of these two samples; the result has shape (2, 10).
logit_scaled = tf.nn.softmax(logits)
print(logit_scaled)
Compute the loss values
result = tf.nn.softmax_cross_entropy_with_logits(labels = labels, logits = logits)
print(result)
These two values are the loss values of the two samples; we then take their mean as the overall loss, using the tf.reduce_mean() function.
train_loss = tf.reduce_mean(result)
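To make the built-in loss less of a black box, a short check (added for illustration) computes the same cross entropy by hand from the softmax probabilities; it should match the result above.
# manual cross entropy: -sum(labels * log(softmax(logits))) per sample
manual = -tf.reduce_sum(tf.cast(labels, tf.float32) * tf.math.log(logit_scaled), axis=1)
print(manual)  # matches tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)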
Putting these pieces together gives the one_step_train function defined above.
(3) Training iterations
# number of iterations: 1000
iterations = 1000
# store the training losses
train_loss_ls = []
# store the validation losses
valid_loss_ls = []
for i in range(iterations):
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optimizer)
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode,
                                                             logits=valid_logits)
    valid_loss = tf.reduce_mean(valid_cross_en)
    if i % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(i, float(train_loss), float(valid_loss)))
    train_loss_ls.append(train_loss)
    valid_loss_ls.append(valid_loss)
(4) Visualize the train loss and valid loss
fig, ax = plt.subplots(figsize=(8,4))
ax.plot(train_loss_ls, label='Train Loss')
ax.plot(valid_loss_ls, label='Valid Loss')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Loss")
ax.legend();
(5) Model comparison
Change the model's optimizer and its parameters and see which model performs better:
- Option A: tf.keras.optimizers.SGD with learning rate = 0.5
- Option B: tf.keras.optimizers.SGD with learning rate = 0.001
- Option C: tf.keras.optimizers.Adam with learning rate = 0.001
Option A is the model we already trained above. Next we train models B and C.
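Since models B and C repeat the same training loop, it could be wrapped in a small helper such as the hypothetical run_training below (not part of the original post); the per-model loops that follow are kept written out for clarity.
# hypothetical helper: run the training loop for a given optimizer and number of iterations
# (assumes W_1, b_1, W_2, b_2 have just been re-initialized, as in each option below)
def run_training(optim, iterations, log_every=100):
    train_losses, valid_losses = [], []
    for step in range(iterations):
        train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim)
        valid_logits = feedforward(X_valid_reshape)
        valid_loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode, logits=valid_logits))
        if step % log_every == 0:
            print('step {0:3d}: training loss {1:.4f}, validation loss {2:.4f}'.format(
                step, float(train_loss), float(valid_loss)))
        train_losses.append(train_loss)
        valid_losses.append(valid_loss)
    return train_losses, valid_losses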
Model B
tf.keras.optimizers.SGD with learning rate = 0.001
- Initialize the parameters
W_1 = tf.Variable(tf.ones(shape=[784,128])/100)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/100)
b_2 = tf.Variable(tf.zeros(shape= [10]))
- Set the optimizer parameters
optim_B = tf.keras.optimizers.SGD(learning_rate = 0.001)
- Train the model
# number of iterations: 2001 (so that step 2000 is also logged)
iterations = 2001
# store the training losses
train_loss_ls_B = []
# store the validation losses
valid_loss_ls_B = []
# training loop
for i in range(iterations):
    # use the one_step_train function from above to compute the training loss and update the parameters
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_B)
    # compute the validation loss with the parameters that were just updated
    valid_logits = feedforward(X_valid_reshape)
    # per-sample cross-entropy loss over all validation samples
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode, logits=valid_logits)
    # mean loss as the validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    if i % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(i, float(train_loss), float(valid_loss)))
    train_loss_ls_B.append(train_loss)
    valid_loss_ls_B.append(valid_loss)
Model C
tf.keras.optimizers.Adam with learning rate = 0.001
- Initialize the parameters
W_1 = tf.Variable(tf.ones(shape=[784,128])/100)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/100)
b_2 = tf.Variable(tf.zeros(shape= [10]))
- Set the optimizer parameters
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
- Train the model
# number of iterations: 2001 (so that step 2000 is also logged)
iterations = 2001
# store the training losses
train_loss_ls_C = []
# store the validation losses
valid_loss_ls_C = []
# training loop
for i in range(iterations):
    # use the one_step_train function from above to compute the training loss and update the parameters
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_C)
    # compute the validation loss with the parameters that were just updated
    valid_logits = feedforward(X_valid_reshape)
    # per-sample cross-entropy loss over all validation samples
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode, logits=valid_logits)
    # mean loss as the validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    if i % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(i, float(train_loss), float(valid_loss)))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)
Summary
For the first model, A, the train loss oscillates up and down, which shows that model A is not well suited for training this data.
For the second model, B, the loss went up instead and the model did not improve.
For the third model, C, the loss clearly decreases, so model C is the best fit.
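To make the comparison easier to see, the three training-loss curves can be drawn on a single figure (a sketch added here; it assumes train_loss_ls, train_loss_ls_B and train_loss_ls_C from the runs above are still in memory):
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(train_loss_ls, label='A: SGD lr=0.5')
ax.plot(train_loss_ls_B, label='B: SGD lr=0.001')
ax.plot(train_loss_ls_C, label='C: Adam lr=0.001')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Train Loss")
ax.legend();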
(6) Comparing initial values of W
From the above, the Adam model works best. Now we compare how the choice of the initial value of W affects the loss.
- Option A: set the initial W to 0.001
- Option B: draw the initial W from a standard normal distribution (mean = 0, sd = 1)
- Option C: draw the initial W from a normal distribution (mean = 0, sd = 0.1)
Option A: set the initial W to 0.001
# set the initial W to 0.001
W_1 = tf.Variable(tf.ones(shape=[784,128])/1000)
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.ones(shape= [128,10])/1000)
b_2 = tf.Variable(tf.zeros(shape= [10]))
# use the Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_C)
    # after updating the weights, check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode,
                                                             logits=valid_logits)
    # compute validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    # print out the training progress every 100 epochs
    if ii % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(ii, float(train_loss), float(valid_loss)))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)
Option B: draw the initial W from a standard normal distribution
# initialize W_1 from a standard normal distribution
W_1 = tf.Variable(tf.random.normal(shape=[784, 128], mean=0, stddev=1, dtype=tf.float32, seed=2020))
b_1 = tf.Variable(tf.zeros(shape=[128]))
# initialize W_2 from a standard normal distribution
W_2 = tf.Variable(tf.random.normal(shape=[128, 10], mean=0, stddev=1, dtype=tf.float32, seed=2020))
b_2 = tf.Variable(tf.zeros(shape=[10]))
# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_C)
    # after updating the weights, check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode,
                                                             logits=valid_logits)
    # compute validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    # print out the training progress every 100 epochs
    if ii % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(ii, float(train_loss), float(valid_loss)))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)
Option C: draw the initial W from a normal distribution (mean = 0, sd = 0.1)
W_1 = tf.Variable(tf.random.normal(shape = [784,128],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.random.normal(shape= [128,10],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_2 = tf.Variable(tf.zeros(shape= [10]))
# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
#start the training process
iterations = 2001
#store the training losses
train_loss_ls_C = []
#store the validation losses
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_C)
    # after updating the weights, check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode,
                                                             logits=valid_logits)
    # compute validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    # print out the training progress every 100 epochs
    if ii % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(ii, float(train_loss), float(valid_loss)))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)
Summary: from the loss it is clear that the third way of initializing W works best.
(7) Combine the best optimizer and the best initial W
Naturally we choose the Adam optimizer and the normal-distribution initialization of W. From the loss values, the valid_loss does not improve much after 700 epochs.
So we use 700 epochs for the final run.
# draw the initial W from a normal distribution (mean=0, sd=0.1)
W_1 = tf.Variable(tf.random.normal(shape = [784,128],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_1 = tf.Variable(tf.zeros(shape= [128]))
W_2 = tf.Variable(tf.random.normal(shape= [128,10],mean=0,stddev=0.1,dtype=tf.float32 ,seed=2020))
b_2 = tf.Variable(tf.zeros(shape= [10]))
# Adam optimizer
optim_C = tf.optimizers.Adam(learning_rate = 0.001)
# run 701 iterations (steps 0-700)
iterations = 701
#store the training losses
train_loss_ls_C = []
#store the validation losses
valid_loss_ls_C = []
for ii in range(iterations):
    train_loss = one_step_train(X_train_reshape, y_train_encode, optim=optim_C)
    # after updating the weights, check the validation result
    valid_logits = feedforward(X_valid_reshape)
    valid_cross_en = tf.nn.softmax_cross_entropy_with_logits(labels=y_valid_encode,
                                                             logits=valid_logits)
    # compute validation loss
    valid_loss = tf.reduce_mean(valid_cross_en)
    # print out the training progress every 100 epochs
    if ii % 100 == 0:
        print('step {0:3d}: training loss {1:.4f} \n\t validation loss {2:.4f}'.format(ii, float(train_loss), float(valid_loss)))
    train_loss_ls_C.append(train_loss)
    valid_loss_ls_C.append(valid_loss)
Visualization
fig, ax = plt.subplots(figsize=(8,4))
ax.plot(train_loss_ls_C, label='Train Loss')
ax.plot(valid_loss_ls_C, label='Valid Loss')
ax.set_xlabel("Epoch #")
ax.set_ylabel("Loss")
ax.legend();
(8) Finally, compute the accuracy on the test data
# function defined in class
def accuracy(logits, y_true):
    correct_prediction = tf.equal(tf.argmax(logits, axis=1), tf.argmax(y_true, axis=1))
    return tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float64))

# compute the logits for the testing data
test_logits = feedforward(X_test_reshape)
print(f'Testing Accuracy: {float(accuracy(test_logits, y_test_encode)) * 100:.2f} %')
Testing Accuracy: 97.06
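As a small illustration (not in the original), accuracy() can be checked on a toy input where the answer is obvious:
# two samples: the first argmax matches the true class, the second does not, so accuracy is 0.5
toy_logits = tf.constant([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]])
toy_true = tf.constant([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(float(accuracy(toy_logits, toy_true)))  # 0.5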
[One point worth explaining: why did we not apply the softmax function to get the 10 class probabilities for each sample and then take argmax() of those as the prediction? Because softmax is a monotonically increasing function, taking the position of the largest value in the raw logits gives the same prediction,
i.e. tf.argmax(logits, axis=1) == tf.argmax(tf.nn.softmax(logits), axis=1).]
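A one-line check of this claim (added for illustration):
# argmax of the raw logits equals argmax of the softmax probabilities for every test sample
probs = tf.nn.softmax(test_logits)
print(bool(tf.reduce_all(tf.equal(tf.argmax(test_logits, axis=1), tf.argmax(probs, axis=1)))))  # True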