Model training
Training the model means running it on a batch of data and then updating its weights. We need to do the following.
(1) Compute the model's predictions for the batch of images.
(2) Compute the loss of these predictions, given the actual labels.
(3) Compute the gradient of the loss with respect to the model's weights.
(4) Move the weights a small step in the direction opposite to the gradient.
To compute the gradient, we use a TensorFlow GradientTape object:
def one_training_step(model, images_batch, labels_batch):
    # Run the forward pass, i.e. compute the model's predictions,
    # inside a GradientTape scope
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        # Average the loss over the whole batch
        average_loss = tf.reduce_mean(per_sample_losses)
    # Compute the gradient of the loss with respect to the weights.
    # gradients is a list; each entry corresponds to a weight in model.weights
    gradients = tape.gradient(average_loss, model.weights)
    # Use the gradients to update the weights
    update_weights(gradients, model.weights)
    return average_loss
The purpose of the "update the weights" step (implemented by the update_weights function) is to move the weights "a little bit" in the direction that will reduce the loss on this batch. The magnitude of the move is determined by the learning rate, typically a small number. The simplest way to implement update_weights is to subtract gradient * learning_rate from each weight.
learning_rate = 1e-3

def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        # assign_sub is the equivalent of -= for a TensorFlow variable
        w.assign_sub(g * learning_rate)
A better approach is to use a Keras Optimizer instance, as shown below.
from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))
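One benefit of going through an Optimizer object is that a more sophisticated update rule can be swapped in with no other changes. As a hedged illustration (the optimizer choice below is an assumption; all results reported later in this post were produced with SGD(learning_rate=1e-3)):

from tensorflow.keras import optimizers

# Illustrative alternatives; RMSprop and Adam keep per-parameter state internally
optimizer = optimizers.RMSprop(learning_rate=1e-3)
# optimizer = optimizers.Adam(learning_rate=1e-3)

def update_weights(gradients, weights):
    # Same call as before: the optimizer decides how to apply the gradients
    optimizer.apply_gradients(zip(gradients, weights))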
The full training loop
An epoch of training consists of repeating the training step above for **every batch of the training data**, and the full training loop simply repeats this for a number of epochs.
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")
Training the model on the training data
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
fit(model, train_images, train_labels, epochs=10, batch_size=128)
The output looks like this (these are the results with our own simple weight-update rule):
Epoch 0
loss at batch 0: 4.19
loss at batch 100: 2.23
loss at batch 200: 2.23
loss at batch 300: 2.11
loss at batch 400: 2.18
Epoch 1
loss at batch 0: 1.91
loss at batch 100: 1.87
loss at batch 200: 1.85
loss at batch 300: 1.72
loss at batch 400: 1.81
Epoch 2
loss at batch 0: 1.59
loss at batch 100: 1.57
loss at batch 200: 1.52
loss at batch 300: 1.44
loss at batch 400: 1.50
Epoch 3
loss at batch 0: 1.32
loss at batch 100: 1.33
loss at batch 200: 1.26
loss at batch 300: 1.22
loss at batch 400: 1.28
Epoch 4
loss at batch 0: 1.13
loss at batch 100: 1.15
loss at batch 200: 1.06
loss at batch 300: 1.06
loss at batch 400: 1.12
Epoch 5
loss at batch 0: 0.98
loss at batch 100: 1.02
loss at batch 200: 0.92
loss at batch 300: 0.94
loss at batch 400: 1.00
Epoch 6
loss at batch 0: 0.87
loss at batch 100: 0.91
loss at batch 200: 0.82
loss at batch 300: 0.84
loss at batch 400: 0.91
Epoch 7
loss at batch 0: 0.79
loss at batch 100: 0.83
loss at batch 200: 0.74
loss at batch 300: 0.77
loss at batch 400: 0.84
Epoch 8
loss at batch 0: 0.73
loss at batch 100: 0.76
loss at batch 200: 0.67
loss at batch 300: 0.72
loss at batch 400: 0.79
Epoch 9
loss at batch 0: 0.68
loss at batch 100: 0.71
loss at batch 200: 0.62
loss at batch 300: 0.67
loss at batch 400: 0.75
With the Keras optimizer, the results are as follows. It may look surprising that they are almost the same as with our simple update rule, but it shouldn't be: plain SGD with the same learning rate applies exactly the update w -= learning_rate * gradient, as the small sketch after these logs confirms:
Epoch 0
loss at batch 0: 4.69
loss at batch 100: 2.23
loss at batch 200: 2.22
loss at batch 300: 2.08
loss at batch 400: 2.20
Epoch 1
loss at batch 0: 1.92
loss at batch 100: 1.86
loss at batch 200: 1.83
loss at batch 300: 1.69
loss at batch 400: 1.81
Epoch 2
loss at batch 0: 1.60
loss at batch 100: 1.56
loss at batch 200: 1.51
loss at batch 300: 1.40
loss at batch 400: 1.50
Epoch 3
loss at batch 0: 1.34
loss at batch 100: 1.32
loss at batch 200: 1.25
loss at batch 300: 1.18
loss at batch 400: 1.27
Epoch 4
loss at batch 0: 1.14
loss at batch 100: 1.14
loss at batch 200: 1.06
loss at batch 300: 1.03
loss at batch 400: 1.11
Epoch 5
loss at batch 0: 0.99
loss at batch 100: 1.00
loss at batch 200: 0.91
loss at batch 300: 0.91
loss at batch 400: 0.99
Epoch 6
loss at batch 0: 0.88
loss at batch 100: 0.90
loss at batch 200: 0.81
loss at batch 300: 0.82
loss at batch 400: 0.90
Epoch 7
loss at batch 0: 0.80
loss at batch 100: 0.82
loss at batch 200: 0.73
loss at batch 300: 0.75
loss at batch 400: 0.84
Epoch 8
loss at batch 0: 0.74
loss at batch 100: 0.75
loss at batch 200: 0.67
loss at batch 300: 0.70
loss at batch 400: 0.78
Epoch 9
loss at batch 0: 0.69
loss at batch 100: 0.69
loss at batch 200: 0.62
loss at batch 300: 0.66
loss at batch 400: 0.74
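Here is a minimal sketch comparing the two update rules on a toy variable (the gradient value and shapes are illustrative assumptions); both produce the same result:

import tensorflow as tf
from tensorflow.keras import optimizers

lr = 1e-3
g = tf.constant([0.5, -2.0])             # a made-up gradient

w_manual = tf.Variable([1.0, 1.0])
w_manual.assign_sub(g * lr)              # manual rule: w -= lr * g

w_sgd = tf.Variable([1.0, 1.0])
optimizers.SGD(learning_rate=lr).apply_gradients([(g, w_sgd)])

print(w_manual.numpy(), w_sgd.numpy())   # both print approximately [0.9995 1.002]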
Evaluating the model
We can evaluate the model by taking the argmax of its predictions over the test images and comparing those predicted classes to the expected labels.
import numpy as np

predictions = model(test_images)
# Calling .numpy() on a TensorFlow tensor converts it to a NumPy array
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")
The complete runnable code is listed below; if you are interested, give it a run yourself:
import tensorflow as tf
class NaiveDense:
    # Constructor
    def __init__(self, input_size, output_size, activation):
        # Like a Keras Dense layer, the activation function is configurable
        self.activation = activation
        w_shape = (input_size, output_size)
        # Create a matrix W of shape (input_size, output_size), randomly initialized
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
        b_shape = (output_size,)
        # Create a zero vector b of shape (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    # Forward pass
    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)

    # Convenience accessor for this layer's weights
    @property
    def weights(self):
        # Return this layer's weights as a list
        return [self.W, self.b]
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])
assert len(model.weights) == 4
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

# learning_rate = 1e-3
# def update_weights(gradients, weights):
#     for g, w in zip(gradients, weights):
#         # assign_sub is the equivalent of -= for a TensorFlow variable
#         w.assign_sub(g * learning_rate)

from tensorflow.keras import optimizers

optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))
def one_training_step(model, images_batch, labels_batch):
    # Run the forward pass, i.e. compute the model's predictions,
    # inside a GradientTape scope
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        # Average the loss over the whole batch
        average_loss = tf.reduce_mean(per_sample_losses)
    # Compute the gradient of the loss with respect to the weights.
    # gradients is a list; each entry corresponds to a weight in model.weights
    gradients = tape.gradient(average_loss, model.weights)
    # Use the gradients to update the weights
    update_weights(gradients, model.weights)
    return average_loss
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
fit(model, train_images, train_labels, epochs=10, batch_size=128)
import numpy as np
predictions = model(test_images)
# Calling .numpy() on a TensorFlow tensor converts it to a NumPy array
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")
Summary

Tensors form the foundation of modern machine-learning systems. They come in different dtypes (data types), ranks, and shapes.

Numerical tensors can be manipulated via tensor operations (such as addition, the tensor product, or element-wise multiplication), which can be interpreted as geometric transformations. In general, everything in deep learning has a geometric interpretation.

Deep learning models consist of chains of simple tensor operations, parameterized by weights, which are themselves tensors. A model's weights are where its learned "knowledge" is stored.

Learning means finding a set of model parameter values that minimizes a loss function over a given set of training samples and their corresponding targets.

Learning proceeds by drawing random batches of data samples and their targets and computing the gradient of the batch loss with respect to the model's parameters, then moving the parameters a small step (whose size is set by the learning rate) in the direction opposite to the gradient. This process is called mini-batch stochastic gradient descent.

The whole learning process is possible only because every tensor operation in a neural network is differentiable, so the chain rule of derivation can be used to obtain a gradient function that maps the current parameters and the current batch of data to a gradient value. This step is called backpropagation.

Two key concepts you will encounter again and again are the loss and the optimizer; both must be defined before you feed data into a model. The loss is the quantity to minimize during training, and it measures how well the task at hand is being solved. The optimizer is the exact way in which the gradient of the loss is used to update the parameters, for example the RMSprop optimizer or SGD with momentum.
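To connect this summary back to code, here is a minimal sketch of a single gradient-descent step on a toy scalar loss (the loss function and learning rate are illustrative assumptions, unrelated to the MNIST example above):

import tensorflow as tf

w = tf.Variable(0.0)                 # a single "parameter"
learning_rate = 0.1

with tf.GradientTape() as tape:
    loss = (w - 3.0) ** 2            # toy loss whose minimum is at w = 3
grad = tape.gradient(loss, w)        # backpropagation via the chain rule: dloss/dw = -6.0 at w = 0
w.assign_sub(learning_rate * grad)   # move against the gradient: w becomes 0.6
print(w.numpy())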