Fei-Fei Li's Computer Vision Course CS231n, Day 5

Computational graphs

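[Figure: computational graph of a linear classifier]

Any function can be represented as a computational graph, in which each node is one step of the computation we perform. In the linear classifier shown in the figure above, the inputs are $x$ and $W$; the $*$ node denotes matrix multiplication, i.e. $Wx$, and outputs the vector of scores. Another node represents the hinge loss and computes the data loss term $L_i$. There is also a regularization term, at the lower right. The total loss $L$ at the end is the sum of the regularization term and the data term.
Once the graph is drawn, the chain rule gives the gradient at every node.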
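The forward pass through such a graph can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the multiclass hinge loss with margin 1, the L2 penalty, the strength `lam`, and the sample values of `W`, `x`, and `y` are not given in the notes and are chosen here for the example.

```python
# Forward pass: (W, x) -> scores -> hinge loss -> + regularization -> L.
# Margin 1, the L2 penalty, `lam`, and all sample values are assumptions.

def matvec(W, x):
    """The * node: scores s = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def hinge_loss(scores, y):
    """The hinge loss node: L_i = sum over j != y of max(0, s_j - s_y + 1)."""
    return sum(max(0.0, s - scores[y] + 1.0)
               for j, s in enumerate(scores) if j != y)

def l2_reg(W):
    """The regularization node: sum of squared weights."""
    return sum(w * w for row in W for w in row)

def total_loss(W, x, y, lam):
    """The final + node: data term plus weighted regularization term."""
    scores = matvec(W, x)
    data_loss = hinge_loss(scores, y)
    return data_loss + lam * l2_reg(W)

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 classes, 2 input features
x = [2.0, 1.0]
loss = total_loss(W, x, y=0, lam=0.5)
print(loss)
```

Each function corresponds to one node of the graph, which is what makes it possible to backpropagate through them one at a time.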

Chain rule for (x+y)z

Let $f(x, y, z) = (x + y)z$ and $q(x, y) = x + y$. Then
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \times \frac{\partial q}{\partial x} = z \times 1 = z$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \times \frac{\partial q}{\partial y} = z \times 1 = z$$
$$\frac{\partial f}{\partial z} = q = x + y$$
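The three formulas above translate directly into a forward pass followed by a backward pass. A minimal sketch follows; the function name `forward_backward` and the input values are illustrative, not taken from the notes.

```python
def forward_backward(x, y, z):
    """Forward and backward pass for f = (x + y) * z."""
    # Forward pass: compute the intermediate node q, then the output f.
    q = x + y
    f = q * z

    # Backward pass: local derivatives at the multiply node.
    df_dq = z          # df/dq = z
    df_dz = q          # df/dz = q = x + y

    # Chain rule through the add node, whose local derivatives are both 1.
    df_dx = df_dq * 1.0
    df_dy = df_dq * 1.0
    return f, (df_dx, df_dy, df_dz)

f, grads = forward_backward(-2.0, 5.0, -4.0)
print(f, grads)
```

With these inputs, $q = 3$, so the gradients with respect to $x$ and $y$ both equal $z = -4$ and the gradient with respect to $z$ equals $q = 3$.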

An example of computing gradients backward

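[Figure: forward pass through the computational graph]

The forward pass through the computational graph is shown in the figure. The backward pass proceeds as follows:
The first gradient, at the output, is 1.
For $f(x) = \frac{1}{x}$, the derivative is $f'(x) = -\frac{1}{x^{2}}$. Substituting $x = 1.37$ gives $f'(x) = -0.53$, so the gradient is $-0.53 \times 1 = -0.53$.
For $f(x) = x + 1$, the derivative is $f'(x) = 1$, so the gradient is $1 \times (-0.53) = -0.53$.
For $f(x) = e^{x}$, the derivative is $f'(x) = e^{x}$. Substituting $x = -1$ gives $f'(x) = 0.37$, so the gradient is $0.37 \times (-0.53) = -0.2$.
Continuing in the same way gives all the gradients:

[Figure: the computational graph with all gradients filled in]

The boxed part of the figure above is in fact the sigmoid function, so there is no need to work backward step by step from the output to the node with gradient 0.20. The sigmoid derivative gives that gradient directly.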
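The three backward steps above, and the sigmoid shortcut, can be reproduced numerically. The sketch below covers only the boxed part of the graph and assumes that the sigmoid gate receives $s = 1$ and negates it before the exp node, which matches $\sigma(s) = \frac{1}{1 + e^{-s}}$ and the value $x = -1$ quoted for the exp node. All variable names are illustrative.

```python
import math

# Forward pass through the boxed nodes: s -> -s -> exp -> +1 -> 1/x.
s = 1.0              # assumed input to the sigmoid gate
a = -1.0 * s         # input to the exp node: -1
b = math.exp(a)      # about 0.37
c = b + 1.0          # about 1.37
out = 1.0 / c        # about 0.73, equal to sigmoid(s)

# Backward pass: multiply each local derivative by the upstream gradient.
d_out = 1.0                      # gradient at the output
d_c = (-1.0 / c ** 2) * d_out    # 1/x node: about -0.53
d_b = 1.0 * d_c                  # +1 node:  about -0.53
d_a = math.exp(a) * d_b          # exp node: about -0.20
d_s = -1.0 * d_a                 # *(-1) node: about 0.20

# Shortcut: the sigmoid derivative (1 - sigma) * sigma in a single step.
d_s_shortcut = (1.0 - out) * out

print(round(d_c, 2), round(d_a, 2), round(d_s, 2), round(d_s_shortcut, 2))
```

Both routes arrive at the same gradient of roughly 0.20 at the input of the sigmoid gate.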

Derivative of the sigmoid

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
$$\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right) = (1 - \sigma(x))\,\sigma(x)$$
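One way to sanity-check the identity $\sigma'(x) = (1 - \sigma(x))\sigma(x)$ is to compare it with a central finite difference. This is a rough check, not a proof; the step size `h` and the sample points are arbitrary choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative: (1 - sigma(x)) * sigma(x)."""
    s = sigmoid(x)
    return (1.0 - s) * s

def numeric_grad(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

points = [-3.0, -1.0, 0.0, 1.0, 2.5]
max_err = max(abs(sigmoid_grad(x) - numeric_grad(sigmoid, x)) for x in points)
print(max_err)
```

The largest discrepancy should be tiny, limited only by the finite-difference approximation and floating-point rounding.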

Backpropagation with vectors

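[Figure: computational graph for the vectorized example]

As shown in the figure above, with $q = Wx$ and $f(q) = \sum_{i} q_{i}^{2}$ (the form consistent with the derivative used here), differentiating $f$ with respect to $q_{i}$ gives $\frac{\partial f}{\partial q_{i}} = 2q_{i}$. The backward pass therefore yields the gradient
$$\frac{\partial f}{\partial q} = \begin{bmatrix} 0.44 \\ 0.52 \end{bmatrix}$$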
Differentiating $q_{1}$ (that is, $W_{1,1}x_{1} + W_{1,2}x_{2}$) with respect to $W_{1,1}$ gives $\frac{\partial q_{1}}{\partial W_{1,1}} = x_{1} = 0.2$.
Differentiating $q_{1}$ with respect to $W_{1,2}$ gives $\frac{\partial q_{1}}{\partial W_{1,2}} = x_{2} = 0.4$.
Differentiating $q_{1}$ with respect to $W_{2,1}$ gives $\frac{\partial q_{1}}{\partial W_{2,1}} = 0$.
Differentiating $q_{1}$ with respect to $W_{2,2}$ gives $\frac{\partial q_{1}}{\partial W_{2,2}} = 0$.
Similarly, $\frac{\partial q_{2}}{\partial W_{1,1}} = 0$, $\frac{\partial q_{2}}{\partial W_{1,2}} = 0$, $\frac{\partial q_{2}}{\partial W_{2,1}} = x_{1} = 0.2$, and $\frac{\partial q_{2}}{\partial W_{2,2}} = x_{2} = 0.4$.
That is:
$$\frac{\partial q_{k}}{\partial W_{i,j}} = 1_{k=i}\,x_{j}$$
where $1_{k=i}$ is the indicator: $1_{k=i} = 1$ if $k = i$, and $0$ otherwise.
Therefore:
$$\frac{\partial f}{\partial W_{i,j}} = \sum_{k}\frac{\partial f}{\partial q_{k}}\frac{\partial q_{k}}{\partial W_{i,j}} = \sum_{k}(2q_{k})(1_{k=i}\,x_{j}) = 2q_{i}x_{j}$$
So:
$$\frac{\partial f}{\partial W_{1,1}} = 2q_{1}x_{1} = 0.088$$
$$\frac{\partial f}{\partial W_{1,2}} = 2q_{1}x_{2} = 0.176$$
$$\frac{\partial f}{\partial W_{2,1}} = 2q_{2}x_{1} = 0.104$$
$$\frac{\partial f}{\partial W_{2,2}} = 2q_{2}x_{2} = 0.208$$
Which finally gives:
$$\frac{\partial f}{\partial W} = \begin{bmatrix} 0.088 & 0.176 \\ 0.104 & 0.208 \end{bmatrix}$$
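The formula $\frac{\partial f}{\partial W_{i,j}} = 2q_{i}x_{j}$ is an outer product of the vector $2q$ with $x$. A small pure-Python sketch reproduces the matrix above. The values of `W` and `x` are the ones appearing in the partial derivatives in this section; the variable names are illustrative.

```python
# Values of W and x as they appear in the partial derivatives in this section.
W = [[0.1, 0.5],
     [-0.3, 0.8]]
x = [0.2, 0.4]

# Forward pass: q = W x.
q = [sum(W[i][j] * x[j] for j in range(2)) for i in range(2)]

# Backward pass: df/dq_i = 2 q_i, then df/dW_ij = 2 q_i x_j (an outer product).
dq = [2.0 * q_i for q_i in q]
dW = [[dq[i] * x[j] for j in range(2)] for i in range(2)]

for row in dW:
    print([round(v, 3) for v in row])
```

Note that `dW` has the same shape as `W`, which is a useful check on any gradient derived this way.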
Next, differentiating $q_{1}$ with respect to $x_{1}$ gives
$$\frac{\partial q_{1}}{\partial x_{1}} = W_{1,1} = 0.1$$
Similarly,
$$\frac{\partial q_{1}}{\partial x_{2}} = W_{1,2} = 0.5$$
$$\frac{\partial q_{2}}{\partial x_{1}} = W_{2,1} = -0.3$$
$$\frac{\partial q_{2}}{\partial x_{2}} = W_{2,2} = 0.8$$
That is:
$$\frac{\partial q_{k}}{\partial x_{i}} = W_{k,i}$$
$$\frac{\partial f}{\partial x_{i}} = \sum_{k}\frac{\partial f}{\partial q_{k}}\frac{\partial q_{k}}{\partial x_{i}} = \sum_{k} 2q_{k}W_{k,i}$$
Therefore:
$$\frac{\partial f}{\partial x_{1}} = 2q_{1}W_{1,1} + 2q_{2}W_{2,1} = -0.112$$
$$\frac{\partial f}{\partial x_{2}} = 2q_{1}W_{1,2} + 2q_{2}W_{2,2} = 0.636$$
So:
$$\frac{\partial f}{\partial x} = \begin{bmatrix} -0.112 \\ 0.636 \end{bmatrix}$$
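The sum $\sum_{k} 2q_{k}W_{k,i}$ is the product of $W^{T}$ with the vector $2q$. The sketch below computes it in plain Python and compares it against a finite-difference estimate. It assumes $f(q) = \sum_{k} q_{k}^{2}$, which is consistent with $\frac{\partial f}{\partial q_{k}} = 2q_{k}$; the function names and the step size `h` are illustrative.

```python
W = [[0.1, 0.5],
     [-0.3, 0.8]]
x = [0.2, 0.4]

def f(W, x):
    """Assumed form of f: sum of squares of q = W x."""
    q = [sum(W[k][i] * x[i] for i in range(2)) for k in range(2)]
    return sum(q_k * q_k for q_k in q)

def grad_x(W, x):
    """Analytic gradient: df/dx_i = sum over k of 2 q_k W[k][i]."""
    q = [sum(W[k][i] * x[i] for i in range(2)) for k in range(2)]
    return [sum(2.0 * q[k] * W[k][i] for k in range(2)) for i in range(2)]

def numeric_grad_x(W, x, h=1e-6):
    """Central finite differences, perturbing one coordinate of x at a time."""
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grads.append((f(W, xp) - f(W, xm)) / (2.0 * h))
    return grads

dx = grad_x(W, x)
dx_numeric = numeric_grad_x(W, x)
print([round(v, 3) for v in dx], [round(v, 3) for v in dx_numeric])
```

Agreement between the analytic and numeric values is the usual quick check that a hand-derived gradient is correct.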
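Finally, the complete result is shown in the figure:

[Figure: the vectorized computational graph with all gradients filled in]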


Reposted from blog.csdn.net/qq_36758914/article/details/103525982