Deep learning I - III Shallow Neural Network - Backpropagation Intuition

Backpropagation intuition


Consider a simple two-layer shallow neural network: the activation function of the first (hidden) layer is tanh(z), and that of the second (output) layer is sigmoid(z).
The network architecture is shown below:

[figure: two-layer network architecture]

Expressed as a computational graph:

[figure: computational graph of the network]

In the formulas below, \log a^{[2]} means \ln a^{[2]}; the symbols da^{[2]}, dz^{[2]}, etc. denote the corresponding derivatives of the loss. These formulas are for a single training instance and are not yet vectorized.

(1.1) L(a^{[2]}, y) = -y \log a^{[2]} - (1-y) \log(1 - a^{[2]})

(1.2) da^{[2]}_{[1 \times 1]} = \frac{d}{da^{[2]}} L(a^{[2]}, y) = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}

(1.3) g(z^{[2]}) = \mathrm{sigmoid}(z^{[2]}) = a^{[2]}

(1.4) dz^{[2]}_{[1 \times 1]} = \frac{d}{dz^{[2]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} = da^{[2]} \, g'(z^{[2]}) = \left(-\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}\right) g(z^{[2]}) (1 - g(z^{[2]})) = \left(-\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}\right) a^{[2]} (1 - a^{[2]}) = a^{[2]} - y
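A quick numerical sanity check of the simplification in (1.4), chaining (1.2) with the sigmoid derivative g'(z) = a(1 - a); the values of z^{[2]} and y below are arbitrary illustrative inputs:

```python
import numpy as np

# Arbitrary test values for the pre-activation and the label.
z2 = 0.7
a2 = 1.0 / (1.0 + np.exp(-z2))       # a[2] = sigmoid(z[2]), eq. (1.3)
y = 1.0

da2 = -y / a2 + (1 - y) / (1 - a2)   # eq. (1.2)
dz2 = da2 * a2 * (1 - a2)            # chain rule with sigmoid'(z) = a(1 - a)

# The product collapses to the simple closed form a[2] - y.
assert abs(dz2 - (a2 - y)) < 1e-12
```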

(1.5) dW^{[2]}_{[1 \times 4]} = \frac{d}{dW^{[2]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{dW^{[2]}} = dz^{[2]} \cdot (a^{[1]})^T = dz^{[2]}_{[1 \times 1]} (a^{[1]}_{[4 \times 1]})^T

(1.6) db^{[2]}_{[1 \times 1]} = \frac{d}{db^{[2]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{db^{[2]}} = dz^{[2]}_{[1 \times 1]}

(1.7) da^{[1]}_{[4 \times 1]} = \frac{d}{da^{[1]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{da^{[1]}} = (W^{[2]}_{[1 \times 4]})^T dz^{[2]}_{[1 \times 1]}

(1.8) g(z^{[1]}) = \tanh(z^{[1]}) = a^{[1]}

(1.9) dz^{[1]}_{[4 \times 1]} = \frac{d}{dz^{[1]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{da^{[1]}} \cdot \frac{da^{[1]}}{dz^{[1]}} = da^{[1]} * g'(z^{[1]}) = (W^{[2]}_{[1 \times 4]})^T dz^{[2]}_{[1 \times 1]} * g'(z^{[1]})_{[4 \times 1]}, where * denotes the element-wise product
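Since a^{[1]} = tanh(z^{[1]}), the derivative needed in (1.9) is g'(z^{[1]}) = 1 - tanh^2(z^{[1]}) = 1 - (a^{[1]})^2. A small check of this identity against a centered finite difference, at arbitrary test points:

```python
import numpy as np

# Arbitrary test points for z[1].
z = np.array([-1.5, 0.0, 0.8, 2.0])

analytic = 1.0 - np.tanh(z) ** 2                          # tanh'(z) = 1 - tanh(z)^2
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)  # centered difference

assert np.allclose(analytic, numeric, atol=1e-8)
```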

(1.10) dW^{[1]}_{[4 \times 3]} = \frac{d}{dW^{[1]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{da^{[1]}} \cdot \frac{da^{[1]}}{dz^{[1]}} \cdot \frac{dz^{[1]}}{dW^{[1]}} = dz^{[1]} \cdot x^T = dz^{[1]}_{[4 \times 1]} (a^{[0]}_{[3 \times 1]})^T

(1.11) db^{[1]}_{[4 \times 1]} = \frac{d}{db^{[1]}} L(a^{[2]}, y) = \frac{dL}{da^{[2]}} \cdot \frac{da^{[2]}}{dz^{[2]}} \cdot \frac{dz^{[2]}}{da^{[1]}} \cdot \frac{da^{[1]}}{dz^{[1]}} \cdot \frac{dz^{[1]}}{db^{[1]}} = dz^{[1]}_{[4 \times 1]}
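The single-instance backward pass above can be sketched directly in NumPy. The layer sizes 3 → 4 → 1 match the dimension annotations in the equations; the random initialization and the particular x and y values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the text: x is 3x1, hidden layer has 4 units, output has 1 unit.
n_x, n_h, n_y = 3, 4, 1
x = rng.standard_normal((n_x, 1))   # a[0]
y = np.array([[1.0]])
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: a[1] = tanh(z[1]), a[2] = sigmoid(z[2]).
z1 = W1 @ x + b1
a1 = np.tanh(z1)
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)

# Backward pass, following eqs. (1.4)-(1.11).
dz2 = a2 - y                       # (1.4), 1x1
dW2 = dz2 @ a1.T                   # (1.5), 1x4
db2 = dz2                          # (1.6), 1x1
dz1 = (W2.T @ dz2) * (1 - a1**2)   # (1.9), 4x1; tanh'(z[1]) = 1 - (a[1])^2
dW1 = dz1 @ x.T                    # (1.10), 4x3
db1 = dz1                          # (1.11), 4x1

# Shapes agree with the dimension subscripts in the equations.
assert dW2.shape == (1, 4) and dW1.shape == (4, 3)
assert dz1.shape == (4, 1) and db1.shape == (4, 1)
```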

The vectorized backpropagation formulas (over all m training examples at once) are given below:

(2.1) L(A^{[2]}, Y) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log A^{[2](i)} - (1 - y^{(i)}) \log(1 - A^{[2](i)}) \right]

(2.2) dA^{[2]}_{[1 \times m]} = \left[ -\frac{Y^{(1)}}{A^{[2](1)}} + \frac{1 - Y^{(1)}}{1 - A^{[2](1)}}, \; \ldots, \; -\frac{Y^{(m)}}{A^{[2](m)}} + \frac{1 - Y^{(m)}}{1 - A^{[2](m)}} \right]

(2.3) dZ^{[2]}_{[1 \times m]} = \left[ -\frac{Y^{(1)}}{A^{[2](1)}} + \frac{1 - Y^{(1)}}{1 - A^{[2](1)}}, \; \ldots \right] * \left[ A^{[2](1)}(1 - A^{[2](1)}), \; \ldots, \; A^{[2](m)}(1 - A^{[2](m)}) \right] = \left[ A^{[2](1)} - Y^{(1)}, \; \ldots, \; A^{[2](m)} - Y^{(m)} \right] = A^{[2]} - Y

(2.4) dW^{[2]}_{[1 \times 4]} = \frac{1}{m} dZ^{[2]}_{[1 \times m]} (A^{[1]}_{[4 \times m]})^T

(2.5) db^{[2]}_{[1 \times 1]} = \frac{1}{m} \, \mathrm{np.sum}(dZ^{[2]}, \mathrm{axis}=1, \mathrm{keepdims=True})

(2.6) dZ^{[1]}_{[4 \times m]} = (W^{[2]}_{[1 \times 4]})^T dZ^{[2]}_{[1 \times m]} * g^{[1]\prime}(Z^{[1]})_{[4 \times m]}

(2.7) dW^{[1]}_{[4 \times 3]} = \frac{1}{m} dZ^{[1]}_{[4 \times m]} (A^{[0]}_{[3 \times m]})^T

(2.8) db^{[1]}_{[4 \times 1]} = \frac{1}{m} \, \mathrm{np.sum}(dZ^{[1]}_{[4 \times m]}, \mathrm{axis}=1, \mathrm{keepdims=True})
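The vectorized equations stack the m examples as columns, so each gradient is one matrix product plus a row-wise sum. A sketch in NumPy, where the batch size m = 5 and the randomly generated inputs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Shapes from the text: A0 is 3xm, hidden layer 4 units, output 1 unit.
m = 5
A0 = rng.standard_normal((3, m))                   # input batch, one column per example
Y = rng.integers(0, 2, size=(1, m)).astype(float)  # binary labels
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

# Vectorized forward pass (b1, b2 broadcast across the m columns).
Z1 = W1 @ A0 + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1.0 / (1.0 + np.exp(-Z2))

# Vectorized backward pass.
dZ2 = A2 - Y                                   # dZ[2] = A[2] - Y
dW2 = (dZ2 @ A1.T) / m                         # eq. (2.4)
db2 = np.sum(dZ2, axis=1, keepdims=True) / m   # eq. (2.5)
dZ1 = (W2.T @ dZ2) * (1 - A1**2)               # element-wise product with tanh'
dW1 = (dZ1 @ A0.T) / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m

# Parameter gradients have the same shapes as the parameters themselves.
assert dW2.shape == (1, 4) and db2.shape == (1, 1)
assert dW1.shape == (4, 3) and db1.shape == (4, 1)
```

Note that keepdims=True keeps db1 and db2 as column vectors, so they can be subtracted from b1 and b2 directly in the gradient-descent update.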

Summary

[figure: summary of the backpropagation equations]


Reposted from blog.csdn.net/zfcjhdq/article/details/80728341