How Parameter Initialization Affects Training Accuracy in Deep Neural Networks

This article is based on a programming exercise from Andrew Ng's Deep Learning course (Course 2, Week 1); its goal is to explore how parameter initialization affects model accuracy.

The helper code used in this article can be found here.

1. Data Preparation

The third-party libraries used in this article are listed below; init_utils is the helper module that contains the functions for building the neural network.

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import *

Using the load_dataset function from init_utils, we can take a direct look at the data used in this article:

plt.rcParams['figure.figsize'] = (7.0, 4.0) 
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

train_X, train_Y, test_X, test_Y = load_dataset()
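load_dataset lives in the helper file, so its exact contents are not shown here; judging from the course assignment it builds a noisy "two circles" binary dataset and transposes it so that each column is one example. A rough, hypothetical re-creation (load_dataset_sketch and all of its details are assumptions, not the helper's actual code):

```python
import numpy as np

def load_dataset_sketch(n_train=300, n_test=100, noise=0.05, seed=1):
    # Assumed stand-in for init_utils.load_dataset: a noisy "two circles"
    # binary set (inner circle = class 1, outer circle = class 0),
    # transposed so each COLUMN is one example, matching the model code.
    rng = np.random.default_rng(seed)

    def circles(n):
        theta = rng.uniform(0, 2 * np.pi, n)
        r = np.where(np.arange(n) % 2 == 0, 1.0, 0.5)   # outer / inner radius
        X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
        X += noise * rng.standard_normal(X.shape)
        Y = (r < 1.0).astype(int)                        # inner circle -> class 1
        return X.T, Y.reshape(1, n)

    train_X, train_Y = circles(n_train)
    test_X, test_Y = circles(n_test)
    return train_X, train_Y, test_X, test_Y

train_X, train_Y, test_X, test_Y = load_dataset_sketch()
print(train_X.shape, train_Y.shape)   # (2, 300) (1, 300)
```

The real helper may differ in details (noise level, seeds, class layout), but the shapes — (2, m) inputs and (1, m) labels — are what the model code below relies on.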
2. Building the Model

The network built here is the same as in the previous article, so its details are not repeated.

def model(X, Y, learning_rate=0.01, num_iterations=15000,
          print_cost=True, intialization="he"):

    grads = {}
    costs = []
    m = X.shape[1]
    layers_dims = [X.shape[0], 10, 5, 1]


    if intialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)

    elif intialization == "random":
        parameters = initialize_parameters_random(layers_dims)

    elif intialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    for i in range(0, num_iterations):

        a3, cache = forward_propagation(X, parameters)

        cost = compute_loss(a3, Y)

        grads = backward_propagation(X, Y, cache)

        parameters = update_parameters(parameters, grads, learning_rate)

        if print_cost and i % 1000 == 0:
            print("cost after iterations {}:{}".format(i, cost))
        if i % 100 == 0:
            costs.append(cost)

    plt.plot(costs)
    plt.xlabel("iteration (per hundreds)")
    plt.ylabel("cost")
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters

3. Zero Initialization

def initialize_parameters_zeros(layers_dims):

    parameters = {}
    L = len(layers_dims)

    for l in range(1,L):
        parameters["W" + str(l)] = np.zeros((layers_dims[l],layers_dims[l-1]))
        
        parameters["b" + str(l)] = np.zeros((layers_dims[l],1))

    return parameters

If we use a zero matrix (np.zeros()) to initialize W, every W[l] obtained in the subsequent updates will remain 0. Let us test what kind of training result this produces.

parameters = model(train_X, train_Y, intialization = "zeros")

print("on the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("on the test set:")
predictions_test = predict(test_X, test_Y, parameters)

While it runs, we will notice that each iteration computes very quickly, and that the cost never moves from 0.6931… ≈ ln 2 — exactly the cross-entropy of a model that outputs 0.5 for every example. Both are consequences of zero initialization.

cost after iterations 0:0.6931471805599453
cost after iterations 1000:0.6931471805599453
cost after iterations 2000:0.6931471805599453
cost after iterations 3000:0.6931471805599453
cost after iterations 4000:0.6931471805599453
cost after iterations 5000:0.6931471805599453
cost after iterations 6000:0.6931471805599453
cost after iterations 7000:0.6931471805599453
cost after iterations 8000:0.6931471805599453
cost after iterations 9000:0.6931471805599453
cost after iterations 10000:0.6931471805599455
cost after iterations 11000:0.6931471805599453
cost after iterations 12000:0.6931471805599453
cost after iterations 13000:0.6931471805599453
cost after iterations 14000:0.6931471805599453

on the train set:
Accuracy: 0.5
on the test set:
Accuracy: 0.5

Using this model to classify the samples gives the result shown in the figure below: the entire dataset is predicted as 0, i.e. everything is lumped into a single class.


From this we can conclude: (1) b[l] can safely be initialized to zeros; (2) W[l] must be initialized randomly to break symmetry.
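The symmetry problem behind conclusion (2) can be checked numerically with a tiny standalone sketch (a hypothetical one-hidden-layer sigmoid net, not the helper's forward_propagation): with zero weights, every hidden unit outputs 0.5 and W1 receives a zero gradient, so training can never move away from cost ln 2:

```python
import numpy as np

# Minimal one-hidden-layer sigmoid net with zero-initialized weights,
# illustrating that symmetry is never broken.
np.random.seed(0)
X = np.random.randn(2, 5)            # 5 examples, 2 features
Y = np.array([[0, 1, 0, 1, 1]])

W1 = np.zeros((3, 2)); b1 = np.zeros((3, 1))
W2 = np.zeros((1, 3)); b2 = np.zeros((1, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Forward pass: every hidden unit computes sigmoid(0) = 0.5
A1 = sigmoid(W1 @ X + b1)
A2 = sigmoid(W2 @ A1 + b2)           # output is 0.5 everywhere -> cost = ln 2

# Backward pass (cross-entropy + sigmoid output): dZ2 = A2 - Y
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / X.shape[1]
dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)   # W2 is all zeros, so dZ1 is zero
dW1 = dZ1 @ X.T / X.shape[1]

print(np.unique(A1))      # [0.5] -- all hidden activations identical
print(np.abs(dW1).max())  # 0.0  -- W1 receives no gradient at all
```

This is exactly the behavior seen in the training log above: the cost sits at 0.6931… for all 15000 iterations.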

4. Random Initialization

def initialize_parameters_random(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)

    for l in range(1,L):
        parameters["W" + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * 10
        parameters["b" + str(l)] = np.zeros((layers_dims[l],1))

    return parameters

We initialize with np.random.randn(), and in this test we deliberately give W[l] a large scaling factor, e.g. 10. Let us look at the result.

parameters = model(train_X, train_Y, intialization = "random")

cost after iterations 0:inf
cost after iterations 1000:0.6239567039908781
cost after iterations 2000:0.5978043872838292
cost after iterations 3000:0.563595830364618
cost after iterations 4000:0.5500816882570866
cost after iterations 5000:0.5443417928662615
cost after iterations 6000:0.5373553777823036
cost after iterations 7000:0.4700141958024487
cost after iterations 8000:0.3976617665785177
cost after iterations 9000:0.39344405717719166
cost after iterations 10000:0.39201765232720626
cost after iterations 11000:0.38910685278803786
cost after iterations 12000:0.38612995897697244
cost after iterations 13000:0.3849735792031832
cost after iterations 14000:0.38275100578285265
on the train set:
Accuracy: 0.83
on the test set:
Accuracy: 0.86
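The inf at iteration 0 deserves a note. With the ×10 factor, each layer multiplies the activation scale roughly tenfold, so the final pre-activation lands in the thousands, the sigmoid saturates to exactly 0.0 or 1.0 in float64, and cross-entropy then evaluates log(0). A standalone sketch of the effect (the weights, labels, and nansum-based loss here are assumptions modeled on the assignment, not its exact code):

```python
import numpy as np

np.random.seed(3)
relu = lambda z: np.maximum(0.0, z)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, labels, and x10 weights for the [2, 10, 5, 1] net
X = np.random.randn(2, 300)
Y = np.random.rand(1, 300) > 0.5
W1 = 10 * np.random.randn(10, 2)
W2 = 10 * np.random.randn(5, 10)
W3 = 10 * np.random.randn(1, 5)

# Each layer multiplies the activation scale by roughly 10x, so the final
# pre-activation is in the thousands and the sigmoid saturates hard.
with np.errstate(over="ignore"):
    a3 = sigmoid(W3 @ relu(W2 @ relu(W1 @ X)))

print(np.mean((a3 == 0.0) | (a3 == 1.0)))  # most outputs are exactly 0 or 1

# Cross-entropy then evaluates log(0); summing with np.nansum keeps the
# inf terms while dropping 0*log(0) = nan, so the reported cost is inf.
with np.errstate(divide="ignore", invalid="ignore"):
    logprobs = -np.log(a3) * Y - np.log(1 - a3) * (1 - Y)
cost = np.nansum(logprobs) / Y.shape[1]
print(cost)  # inf
```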

The training accuracy is clearly much higher. The classification result is shown below:

plt.title("Model with random initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x:predict_dec(parameters, x.T), train_X, train_Y)


From the results we can see that random initialization with large weights works better than zero initialization, but large regions are still misclassified. Next, let us experiment with small weights.

parameters["W" + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * 0.01
cost after iterations 0:0.6931473549048267
cost after iterations 1000:0.6931473504885134
cost after iterations 2000:0.6931473468317759
cost after iterations 3000:0.6931473432446151
cost after iterations 4000:0.6931473396813665
cost after iterations 5000:0.6931473361903396
cost after iterations 6000:0.6931473327310922
cost after iterations 7000:0.6931473293491259
cost after iterations 8000:0.6931473260187053
cost after iterations 9000:0.6931473227372426
cost after iterations 10000:0.6931473195009528
cost after iterations 11000:0.6931473163278133
cost after iterations 12000:0.693147312959552
cost after iterations 13000:0.6931473097541097
cost after iterations 14000:0.6931473065831708
on the train set:
Accuracy: 0.4633333333333333
on the test set:
Accuracy: 0.48


The result is clearly even worse. So what initial weight scale actually produces good training results?

5. He Initialization

The idea behind this method is to mitigate vanishing and exploding gradients by scaling W at initialization with the factor np.sqrt(2./layers_dims[l-1]).

Note: for the relu activation, use the factor np.sqrt(2./layers_dims[l-1]); for the tanh activation, use np.sqrt(1./layers_dims[l-1]).

def initialize_parameters_he(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)

    for l in range(1,L):
        parameters["W" + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
        parameters["b" + str(l)] = np.zeros((layers_dims[l],1))

    return parameters
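Before rerunning the model, here is a quick standalone check (an illustrative sketch, not part of the assignment) of why the sqrt(2/n) factor matters: pushing random data through a stack of relu layers, He scaling keeps the activation scale roughly constant, while a flat 0.01 factor shrinks it toward zero and a flat 10 makes it explode:

```python
import numpy as np

np.random.seed(0)
relu = lambda z: np.maximum(0.0, z)

def activation_scale(scale_fn, n_layers=5, width=100, m=1000):
    # Push random data through n_layers relu layers whose weights are
    # scaled by scale_fn(fan_in), and report the final activation std.
    A = np.random.randn(width, m)
    for _ in range(n_layers):
        W = np.random.randn(width, width) * scale_fn(width)
        A = relu(W @ A)
    return A.std()

print(activation_scale(lambda n: np.sqrt(2.0 / n)))  # He: stays around O(1)
print(activation_scale(lambda n: 0.01))              # shrinks toward zero
print(activation_scale(lambda n: 10.0))              # explodes
```

This is the same vanishing/exploding behavior the two earlier experiments exposed through the cost curves.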

Let us look at the result.

parameters = model(train_X, train_Y, intialization = "he")

print("on the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("on the test set:")
predictions_test = predict(test_X, test_Y, parameters)
cost after iterations 0:0.8830537463419761
cost after iterations 1000:0.6879825919728063
cost after iterations 2000:0.6751286264523371
cost after iterations 3000:0.6526117768893807
cost after iterations 4000:0.6082958970572938
cost after iterations 5000:0.5304944491717495
cost after iterations 6000:0.4138645817071794
cost after iterations 7000:0.3117803464844441
cost after iterations 8000:0.23696215330322562
cost after iterations 9000:0.18597287209206836
cost after iterations 10000:0.15015556280371817
cost after iterations 11000:0.12325079292273552
cost after iterations 12000:0.09917746546525932
cost after iterations 13000:0.08457055954024274
cost after iterations 14000:0.07357895962677362

on the train set:
Accuracy: 0.9933333333333333
on the test set:
Accuracy: 0.96

The results show that the cost drops quickly and the prediction accuracy is very high.


6. Summary

1. Different initialization schemes lead to completely different test results.

2. Initializing with weights that are either too large or too small trains poorly.

3. Choose the He factor appropriate to the activation function.
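Point 3 can be folded into one hypothetical helper (not part of init_utils) that selects the factor from the activation function:

```python
import numpy as np

def initialize_parameters(layers_dims, activation="relu", seed=3):
    # He factor sqrt(2/n_prev) for relu, Xavier-style sqrt(1/n_prev) for tanh.
    factor = {"relu": 2.0, "tanh": 1.0}[activation]
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        n_prev, n = layers_dims[l - 1], layers_dims[l]
        parameters["W" + str(l)] = np.random.randn(n, n_prev) * np.sqrt(factor / n_prev)
        parameters["b" + str(l)] = np.zeros((n, 1))
    return parameters

params = initialize_parameters([2, 10, 5, 1], activation="relu")
print(params["W1"].shape, params["W2"].shape)  # (10, 2) (5, 10)
```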



Reprinted from blog.csdn.net/u013093426/article/details/80931422