Contents:
I. The Effect of the Learning Rate
II. Exponentially Decaying Learning Rate
I. The Effect of the Learning Rate

This section covers the learning rate, which determines how far the parameters move at each update. During training, parameters are updated according to the following formula:

    w_next = w - learning_rate * d(loss)/dw

That is, the parameter's next value equals its current value minus the learning rate times the gradient of the loss function with respect to that parameter. In other words, each update moves the parameter in the direction of gradient descent on the loss.
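The update rule can be sketched in a few lines of plain Python (the names here are illustrative, not from the original code):

```python
def gradient_descent_step(w, learning_rate, grad):
    """One update: next w = current w - learning_rate * (gradient of loss at w)."""
    return w - learning_rate * grad(w)

# With loss(w) = (w + 1)**2 the gradient is 2w + 2, so starting from w = 5
# with learning rate 0.2, one step gives 5 - 0.2 * 12 ≈ 2.6:
grad = lambda w: 2 * w + 2
print(gradient_descent_step(5.0, 0.2, grad))  # ≈ 2.6
```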
Let's work through an example of how the learning rate affects parameter updates. The loss function is loss = (w + 1)^2, whose gradient is 2w + 2. The parameter is initialized to 5, and the learning rate is set to 0.2. On the first update, starting from w = 5, the formula above gives 5 - 0.2 × 12 = 2.6; the second update gives 1.16; the third gives 0.296, and so on. Plotting the loss function (the figure shown on the right in the original), we can see directly that the loss reaches its minimum, where the gradient is zero, at w = -1. Optimizing w therefore means finding the point w = -1. Can a program find it? Let's look at the code.
```python
#coding:utf-8
import tensorflow as tf

W = tf.Variable(tf.constant(5, dtype=tf.float32))  # initialize W to 5
loss = tf.square(W + 1)                            # define the loss function
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)  # gradient descent, learning rate 0.2

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(W)
        loss_val = sess.run(loss)
        print("After %d steps,W is %f ,loss is %f" % (i, w_val, loss_val))
```
Output:

```
After 0 steps,W is 2.600000 ,loss is 12.959999
After 1 steps,W is 1.160000 ,loss is 4.665599
After 2 steps,W is 0.296000 ,loss is 1.679616
After 3 steps,W is -0.222400 ,loss is 0.604662
After 4 steps,W is -0.533440 ,loss is 0.217678
After 5 steps,W is -0.720064 ,loss is 0.078364
After 6 steps,W is -0.832038 ,loss is 0.028211
After 7 steps,W is -0.899223 ,loss is 0.010156
After 8 steps,W is -0.939534 ,loss is 0.003656
After 9 steps,W is -0.963720 ,loss is 0.001316
After 10 steps,W is -0.978232 ,loss is 0.000474
After 11 steps,W is -0.986939 ,loss is 0.000171
After 12 steps,W is -0.992164 ,loss is 0.000061
After 13 steps,W is -0.995298 ,loss is 0.000022
After 14 steps,W is -0.997179 ,loss is 0.000008
After 15 steps,W is -0.998307 ,loss is 0.000003
After 16 steps,W is -0.998984 ,loss is 0.000001
After 17 steps,W is -0.999391 ,loss is 0.000000
After 18 steps,W is -0.999634 ,loss is 0.000000
After 19 steps,W is -0.999781 ,loss is 0.000000
After 20 steps,W is -0.999868 ,loss is 0.000000
After 21 steps,W is -0.999921 ,loss is 0.000000
After 22 steps,W is -0.999953 ,loss is 0.000000
After 23 steps,W is -0.999972 ,loss is 0.000000
After 24 steps,W is -0.999983 ,loss is 0.000000
After 25 steps,W is -0.999990 ,loss is 0.000000
After 26 steps,W is -0.999994 ,loss is 0.000000
After 27 steps,W is -0.999996 ,loss is 0.000000
After 28 steps,W is -0.999998 ,loss is 0.000000
After 29 steps,W is -0.999999 ,loss is 0.000000
After 30 steps,W is -0.999999 ,loss is 0.000000
After 31 steps,W is -1.000000 ,loss is 0.000000
After 32 steps,W is -1.000000 ,loss is 0.000000
After 33 steps,W is -1.000000 ,loss is 0.000000
After 34 steps,W is -1.000000 ,loss is 0.000000
After 35 steps,W is -1.000000 ,loss is 0.000000
After 36 steps,W is -1.000000 ,loss is 0.000000
After 37 steps,W is -1.000000 ,loss is 0.000000
After 38 steps,W is -1.000000 ,loss is 0.000000
After 39 steps,W is -1.000000 ,loss is 0.000000
```

As the run progresses, w approaches -1: the code successfully finds the optimal parameter w = -1. So in practice, what value should the learning rate be set to?
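Since the gradient of (w + 1)^2 is just 2w + 2, the run above can also be reproduced without TensorFlow; a minimal plain-Python sketch:

```python
w = 5.0
lr = 0.2
history = []
for i in range(40):
    w -= lr * (2 * w + 2)  # gradient descent on loss = (w + 1)**2
    history.append(w)

# The first three updates match the hand computation (2.6, 1.16, 0.296),
# and after 40 steps w has converged to the optimum w = -1.
print(history[:3], w)
```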
Let's continue with the same example and set the learning rate to 1 to see the effect. Take the code above, change the learning rate to 1, and leave everything else unchanged. The output is:

```
After 0 steps,W is -7.000000 ,loss is 36.000000
After 1 steps,W is 5.000000 ,loss is 36.000000
After 2 steps,W is -7.000000 ,loss is 36.000000
After 3 steps,W is 5.000000 ,loss is 36.000000
After 4 steps,W is -7.000000 ,loss is 36.000000
After 5 steps,W is 5.000000 ,loss is 36.000000
After 6 steps,W is -7.000000 ,loss is 36.000000
After 7 steps,W is 5.000000 ,loss is 36.000000
After 8 steps,W is -7.000000 ,loss is 36.000000
After 9 steps,W is 5.000000 ,loss is 36.000000
After 10 steps,W is -7.000000 ,loss is 36.000000
After 11 steps,W is 5.000000 ,loss is 36.000000
After 12 steps,W is -7.000000 ,loss is 36.000000
After 13 steps,W is 5.000000 ,loss is 36.000000
After 14 steps,W is -7.000000 ,loss is 36.000000
After 15 steps,W is 5.000000 ,loss is 36.000000
After 16 steps,W is -7.000000 ,loss is 36.000000
After 17 steps,W is 5.000000 ,loss is 36.000000
After 18 steps,W is -7.000000 ,loss is 36.000000
After 19 steps,W is 5.000000 ,loss is 36.000000
After 20 steps,W is -7.000000 ,loss is 36.000000
After 21 steps,W is 5.000000 ,loss is 36.000000
After 22 steps,W is -7.000000 ,loss is 36.000000
After 23 steps,W is 5.000000 ,loss is 36.000000
After 24 steps,W is -7.000000 ,loss is 36.000000
After 25 steps,W is 5.000000 ,loss is 36.000000
After 26 steps,W is -7.000000 ,loss is 36.000000
After 27 steps,W is 5.000000 ,loss is 36.000000
After 28 steps,W is -7.000000 ,loss is 36.000000
After 29 steps,W is 5.000000 ,loss is 36.000000
After 30 steps,W is -7.000000 ,loss is 36.000000
After 31 steps,W is 5.000000 ,loss is 36.000000
After 32 steps,W is -7.000000 ,loss is 36.000000
After 33 steps,W is 5.000000 ,loss is 36.000000
After 34 steps,W is -7.000000 ,loss is 36.000000
After 35 steps,W is 5.000000 ,loss is 36.000000
After 36 steps,W is -7.000000 ,loss is 36.000000
After 37 steps,W is 5.000000 ,loss is 36.000000
After 38 steps,W is -7.000000 ,loss is 36.000000
After 39 steps,W is 5.000000 ,loss is 36.000000
```
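The oscillation has a simple closed form: with a learning rate of 1, the update becomes w ← w - (2w + 2) = -w - 2, which maps 5 to -7 and -7 back to 5, so the iterates bounce between those two points forever. A quick check in plain Python:

```python
w = 5.0
iterates = []
for i in range(6):
    w = w - 1.0 * (2 * w + 2)  # with lr = 1 this simplifies to w = -w - 2
    iterates.append(w)
print(iterates)  # [-7.0, 5.0, -7.0, 5.0, -7.0, 5.0]
```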
We see that the loss does not decrease at all: w jumps back and forth between 5 and -7 and never converges. Now set the learning rate to 0.0001 instead. The result:

```
After 0 steps,W is 4.998800 ,loss is 35.985600
After 1 steps,W is 4.997600 ,loss is 35.971207
After 2 steps,W is 4.996400 ,loss is 35.956818
After 3 steps,W is 4.995201 ,loss is 35.942436
After 4 steps,W is 4.994002 ,loss is 35.928059
After 5 steps,W is 4.992803 ,loss is 35.913689
After 6 steps,W is 4.991604 ,loss is 35.899323
After 7 steps,W is 4.990406 ,loss is 35.884964
After 8 steps,W is 4.989208 ,loss is 35.870609
After 9 steps,W is 4.988010 ,loss is 35.856262
After 10 steps,W is 4.986812 ,loss is 35.841919
After 11 steps,W is 4.985615 ,loss is 35.827583
After 12 steps,W is 4.984417 ,loss is 35.813251
After 13 steps,W is 4.983221 ,loss is 35.798927
After 14 steps,W is 4.982024 ,loss is 35.784607
After 15 steps,W is 4.980827 ,loss is 35.770294
After 16 steps,W is 4.979631 ,loss is 35.755985
After 17 steps,W is 4.978435 ,loss is 35.741684
After 18 steps,W is 4.977239 ,loss is 35.727386
After 19 steps,W is 4.976044 ,loss is 35.713097
After 20 steps,W is 4.974848 ,loss is 35.698811
After 21 steps,W is 4.973653 ,loss is 35.684532
After 22 steps,W is 4.972458 ,loss is 35.670258
After 23 steps,W is 4.971264 ,loss is 35.655991
After 24 steps,W is 4.970069 ,loss is 35.641727
After 25 steps,W is 4.968875 ,loss is 35.627472
After 26 steps,W is 4.967681 ,loss is 35.613220
After 27 steps,W is 4.966488 ,loss is 35.598976
After 28 steps,W is 4.965294 ,loss is 35.584736
After 29 steps,W is 4.964101 ,loss is 35.570503
After 30 steps,W is 4.962908 ,loss is 35.556274
After 31 steps,W is 4.961716 ,loss is 35.542053
After 32 steps,W is 4.960523 ,loss is 35.527836
After 33 steps,W is 4.959331 ,loss is 35.513626
After 34 steps,W is 4.958139 ,loss is 35.499420
After 35 steps,W is 4.956947 ,loss is 35.485222
After 36 steps,W is 4.955756 ,loss is 35.471027
After 37 steps,W is 4.954565 ,loss is 35.456841
After 38 steps,W is 4.953373 ,loss is 35.442654
After 39 steps,W is 4.952183 ,loss is 35.428478
```

The loss decreases, but extremely slowly.

II. Exponentially Decaying Learning Rate
As we have seen, a fixed learning rate that is too large causes oscillation without convergence, while one that is too small converges very slowly. The exponentially decaying learning rate was introduced to address this: it updates the learning rate dynamically based on how many batches of BATCH_SIZE samples have been run. The formula is:

    learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step / LEARNING_RATE_STEP)

where LEARNING_RATE_BASE is the base learning rate, i.e., the learning rate set at the start; LEARNING_RATE_DECAY is the decay rate; and the decay rate's exponent is global_step / LEARNING_RATE_STEP, with LEARNING_RATE_STEP usually set to the total number of samples divided by BATCH_SIZE.

In TensorFlow this is expressed with the tf.train.exponential_decay() function. Its arguments are: the base learning rate LEARNING_RATE_BASE, i.e., the initial learning rate, a hyperparameter; the counter global_step, which records which training step we are currently on; LEARNING_RATE_STEP, the number of steps between learning-rate updates, usually set to (total number of samples in the dataset) / (number of samples fed per batch); and staircase. When staircase is True, global_step / LEARNING_RATE_STEP is truncated to an integer, so the learning rate decays in a staircase pattern; when staircase is False, the learning rate follows a smooth decay curve.

To use an exponentially decaying learning rate in a program, just add the following two lines:

    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)

The first line defines global_step as a counter recording how many batches of BATCH_SIZE samples have been run so far. Since this variable is used only for counting and is not a trainable parameter, we mark it with trainable=False. The second line sets the learning rate via tf.train.exponential_decay(). Let's look at the full code to get a feel for the exponentially decaying learning rate.
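The decay formula can be sketched as a plain-Python function (a hand-rolled stand-in for tf.train.exponential_decay, using the names from the text):

```python
def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=True):
    """learning_rate = base_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = global_step // decay_steps  # integer division: staircase decay
    return base_lr * decay_rate ** exponent

# With base 0.1, decay rate 0.99, and decay_steps 2, the staircase rate
# stays at 0.1 for global_step 0-1, drops to 0.099 for 2-3, and so on.
```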
```python
#coding:utf-8
import tensorflow as tf

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
LEARNING_RATE_STEP = 2      # update the learning rate after this many batches of BATCH_SIZE.
                            # Set to 2 here for convenience; usually set to total samples / BATCH_SIZE,
                            # i.e., how many batches the dataset contains.

global_step = tf.Variable(0, trainable=False)  # step counter, initialized to 0, not trainable
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY, staircase=True)  # exponentially decaying learning rate

W = tf.Variable(tf.constant(5, dtype=tf.float32))  # parameter to optimize, initialized to 5
loss = tf.square(W + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(W)
        loss_val = sess.run(loss)
        print("After %d steps,global_step is %f,W is %f,learning_rate is %f ,loss is %f"
              % (i, global_step_val, w_val, learning_rate_val, loss_val))
```
Output:

```
After 0 steps,global_step is 1.000000,W is 3.800000,learning_rate is 0.100000 ,loss is 23.040001
After 1 steps,global_step is 2.000000,W is 2.840000,learning_rate is 0.099000 ,loss is 14.745600
After 2 steps,global_step is 3.000000,W is 2.079680,learning_rate is 0.099000 ,loss is 9.484428
After 3 steps,global_step is 4.000000,W is 1.469903,learning_rate is 0.098010 ,loss is 6.100423
After 4 steps,global_step is 5.000000,W is 0.985753,learning_rate is 0.098010 ,loss is 3.943214
After 5 steps,global_step is 6.000000,W is 0.596506,learning_rate is 0.097030 ,loss is 2.548830
After 6 steps,global_step is 7.000000,W is 0.286688,learning_rate is 0.097030 ,loss is 1.655566
After 7 steps,global_step is 8.000000,W is 0.036994,learning_rate is 0.096060 ,loss is 1.075356
After 8 steps,global_step is 9.000000,W is -0.162233,learning_rate is 0.096060 ,loss is 0.701854
After 9 steps,global_step is 10.000000,W is -0.323184,learning_rate is 0.095099 ,loss is 0.458080
After 10 steps,global_step is 11.000000,W is -0.451913,learning_rate is 0.095099 ,loss is 0.300399
After 11 steps,global_step is 12.000000,W is -0.556158,learning_rate is 0.094148 ,loss is 0.196996
After 12 steps,global_step is 13.000000,W is -0.639732,learning_rate is 0.094148 ,loss is 0.129793
After 13 steps,global_step is 14.000000,W is -0.707569,learning_rate is 0.093207 ,loss is 0.085516
After 14 steps,global_step is 15.000000,W is -0.762082,learning_rate is 0.093207 ,loss is 0.056605
After 15 steps,global_step is 16.000000,W is -0.806433,learning_rate is 0.092274 ,loss is 0.037468
After 16 steps,global_step is 17.000000,W is -0.842155,learning_rate is 0.092274 ,loss is 0.024915
After 17 steps,global_step is 18.000000,W is -0.871285,learning_rate is 0.091352 ,loss is 0.016567
After 18 steps,global_step is 19.000000,W is -0.894802,learning_rate is 0.091352 ,loss is 0.011067
After 19 steps,global_step is 20.000000,W is -0.914022,learning_rate is 0.090438 ,loss is 0.007392
After 20 steps,global_step is 21.000000,W is -0.929573,learning_rate is 0.090438 ,loss is 0.004960
After 21 steps,global_step is 22.000000,W is -0.942312,learning_rate is 0.089534 ,loss is 0.003328
After 22 steps,global_step is 23.000000,W is -0.952642,learning_rate is 0.089534 ,loss is 0.002243
After 23 steps,global_step is 24.000000,W is -0.961122,learning_rate is 0.088638 ,loss is 0.001511
After 24 steps,global_step is 25.000000,W is -0.968014,learning_rate is 0.088638 ,loss is 0.001023
After 25 steps,global_step is 26.000000,W is -0.973685,learning_rate is 0.087752 ,loss is 0.000692
After 26 steps,global_step is 27.000000,W is -0.978303,learning_rate is 0.087752 ,loss is 0.000471
After 27 steps,global_step is 28.000000,W is -0.982111,learning_rate is 0.086875 ,loss is 0.000320
After 28 steps,global_step is 29.000000,W is -0.985219,learning_rate is 0.086875 ,loss is 0.000218
After 29 steps,global_step is 30.000000,W is -0.987787,learning_rate is 0.086006 ,loss is 0.000149
After 30 steps,global_step is 31.000000,W is -0.989888,learning_rate is 0.086006 ,loss is 0.000102
After 31 steps,global_step is 32.000000,W is -0.991628,learning_rate is 0.085146 ,loss is 0.000070
After 32 steps,global_step is 33.000000,W is -0.993053,learning_rate is 0.085146 ,loss is 0.000048
After 33 steps,global_step is 34.000000,W is -0.994236,learning_rate is 0.084294 ,loss is 0.000033
After 34 steps,global_step is 35.000000,W is -0.995208,learning_rate is 0.084294 ,loss is 0.000023
After 35 steps,global_step is 36.000000,W is -0.996016,learning_rate is 0.083451 ,loss is 0.000016
After 36 steps,global_step is 37.000000,W is -0.996681,learning_rate is 0.083451 ,loss is 0.000011
After 37 steps,global_step is 38.000000,W is -0.997235,learning_rate is 0.082617 ,loss is 0.000008
After 38 steps,global_step is 39.000000,W is -0.997692,learning_rate is 0.082617 ,loss is 0.000005
After 39 steps,global_step is 40.000000,W is -0.998073,learning_rate is 0.081791 ,loss is 0.000004
```

We can see that the learning rate keeps changing, updating once every two steps; the loss steadily decreases; and W gradually approaches -1.
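As a cross-check, the whole decayed-learning-rate run can be replicated in plain Python (this mirrors, rather than uses, TensorFlow's behavior: the optimizer applies the rate computed from the current global_step and then increments the counter):

```python
base_lr, decay_rate, decay_steps = 0.1, 0.99, 2
w, global_step = 5.0, 0
trace = []
for i in range(40):
    lr = base_lr * decay_rate ** (global_step // decay_steps)  # staircase decay
    w -= lr * (2 * w + 2)  # gradient of (w + 1)**2 is 2w + 2
    global_step += 1
    trace.append(w)

# The trace matches the TensorFlow output above: 3.8, 2.84, 2.07968, ...,
# ending near -0.998073 after 40 steps.
print(trace[:3], trace[-1])
```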