From Logistic Regression to Neural Network

  • Logistic Regression can be viewed as a Neural Network with no hidden layer
  • A Neural Network can be viewed as several Logistic Regression units stacked together
    (figure: 1. Logistic Regression; 2. Neural Network with one hidden layer)

1. Logistic Regression

(figure: structure of a Logistic Regression model)
The figure above shows a Logistic Regression model with the following components:

  • $x$ : the model input, one sample's feature vector, shape = (3, 1)
  • $w$ : the weight matrix connecting the input and the output, shape = (3, 1)
  • $b$ : the bias, shape = (1,)
  • $a$ : the final output of the model, obtained by 'activating' $z$ with the nonlinear function $\sigma$
  • $\sigma$ : $\sigma(z) = \frac{1}{1 + e^{-z}}$, with $z = w^T x + b$ (a small numpy sketch follows this list)
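
As a concrete illustration, here is a minimal numpy sketch of this activation; the helper name `sigmoid` and the sample inputs are only illustrative:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                                # 0.5
print(sigmoid(np.array([[-2.0], [0.0], [2.0]])))   # applied elementwise to a column vector
```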

1.1 Forward propagation

(1) $z = w^T x + b$

(2) $\hat{y} = a = \sigma(z)$

1.2 Objective function (cost function)

(3) $J = -\left(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right)$
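
A minimal numpy sketch of equations (1)–(3) for a single training sample; the toy values of $x$, $w$, $b$ and the label $y$ are assumptions, not from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.2], [2.0]])   # one sample, shape (3, 1)   (toy values)
w = np.array([[0.1], [0.3], [-0.2]])   # weights, shape (3, 1)      (toy values)
b = 0.05                               # scalar bias                (toy value)
y = 1.0                                # true label                 (toy value)

z = np.dot(w.T, x) + b                 # eq. (1): z = w^T x + b, shape (1, 1)
a = sigmoid(z)                         # eq. (2): y_hat = a = sigma(z)
J = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # eq. (3): cross-entropy cost
print(a, J)
```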

1.3 Backward propagation

1. $dw = \frac{\partial J}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w} = ?$

2. $db = \frac{\partial J}{\partial a} \cdot \frac{\partial a}{\partial z} = ?$

  • $\frac{\partial J}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} = \frac{a-y}{a(1-a)}$
  • $\frac{\partial a}{\partial z} = a(1-a)$
  • $\frac{\partial z}{\partial w} = x$

    (4) $dw = (a-y)\,x$

    (5) $db = a-y$

    (6) $w = w - \alpha\, dw$

    (7) $b = b - \alpha\, db$
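
Putting equations (4)–(7) together, here is a minimal gradient-descent step in numpy; the toy values and the learning rate `alpha` are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.2], [2.0]])   # sample, shape (3, 1)  (toy values)
w = np.zeros((3, 1))                   # weights, shape (3, 1)
b = 0.0                                # bias
y = 1.0                                # label                 (toy value)
alpha = 0.1                            # learning rate         (assumed)

a = sigmoid(np.dot(w.T, x) + b)        # forward pass, prediction of shape (1, 1)

dw = (a - y) * x                       # eq. (4): dw = (a - y) x, shape (3, 1)
db = a - y                             # eq. (5): db = a - y

w = w - alpha * dw                     # eq. (6)
b = b - alpha * db                     # eq. (7)
```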

1.4 Explanation of the Cost Function

For the cost function we adopt the convention: $\hat{y} = p(y=1 \mid x)$

  • $\hat{y}$ : the probability that y = 1 given the training sample $x$; $1-\hat{y}$ : the probability that y = 0
  • $y = 1$ : $p(y \mid x) = \hat{y}$
  • $y = 0$ : $p(y \mid x) = 1-\hat{y}$

In a binary classification problem $p(y \mid x)$ covers the two cases $y = 0$ and $y = 1$, so the two conditional probabilities can be merged as follows:

(8) $p(y \mid x) = \hat{y}^{\,y}(1-\hat{y})^{(1-y)}$

  • $y = 1$ : $\hat{y}^{\,y} = \hat{y}$ and $(1-\hat{y})^{(1-y)} = 1$, so $p(y \mid x) = \hat{y}$
  • $y = 0$ : $\hat{y}^{\,y} = 1$ and $(1-\hat{y})^{(1-y)} = 1-\hat{y}$, so $p(y \mid x) = 1-\hat{y}$
  • $\log(\cdot)$ is strictly monotonically increasing, so maximizing $\log(p(y \mid x))$ is equivalent to maximizing $p(y \mid x)$.
    (9) $\log(p(y \mid x)) = \log\left(\hat{y}^{\,y}(1-\hat{y})^{(1-y)}\right)$
  • Simplified: $y\log(\hat{y}) + (1-y)\log(1-\hat{y})$ (a quick numerical check follows at the end of this subsection)

$J = L(\hat{y}, y) = -\left(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right)$

  • Why the negative sign:

    • when training the model we want the predicted probability, i.e. $\log(p(y \mid x))$, to be as large as possible
    • logistic regression minimizes a loss function, so we flip the sign and minimize $-\log(p(y \mid x))$ instead
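
A quick numerical check of equations (8) and (9), with an assumed value of $\hat{y}$, showing that $-\log p(y \mid x)$ reproduces the cross-entropy loss for both labels:

```python
import numpy as np

y_hat = 0.8                            # assumed p(y = 1 | x)
for y in (0, 1):
    p = y_hat**y * (1 - y_hat)**(1 - y)                        # eq. (8)
    log_p = y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)    # eq. (9), simplified
    print(y, p, log_p, -log_p)         # -log_p is the loss L(y_hat, y)
```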

2. Neural Network

(figure: a three-layer network with one hidden layer)
The figure above shows a three-layer network structure: the input layer has $n^{[0]}$ neurons, the hidden layer $n^{[1]}$, and the output layer $n^{[2]}$.

  • $n^{[0]}$ : 3
  • $n^{[1]}$ : 3
  • $n^{[2]}$ : 1
  • $x = a^{[0]}$ : the feature vector of the input sample, shape = ($n^{[0]}$, 1)
  • $w^{[1]}$ : the weight matrix connecting the input layer and the hidden layer, shape = ($n^{[0]}$, $n^{[1]}$)
  • $b^{[1]}$ : bias; it can be a scalar, since numpy's broadcasting will apply it to every element of the matrix or vector, shape = (1,)
  • $a^{[1]}$ : the output of the hidden layer, shape = ($n^{[1]}$, 1)
  • $w^{[2]}$ : the weight matrix connecting the hidden layer and the output layer, shape = ($n^{[1]}$, $n^{[2]}$)
  • $b^{[2]}$ : bias
  • $a^{[2]}$ : the output of the output layer
  • $\sigma$ : $\sigma(z) = \frac{1}{1 + e^{-z}}$ (a parameter-initialization sketch follows this list)
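
A hedged sketch of how these parameters could be set up in numpy; the small random scale of 0.01 and the per-unit column shapes for the biases are common-practice assumptions, not something the post prescribes:

```python
import numpy as np

n0, n1, n2 = 3, 3, 1                   # layer sizes n[0], n[1], n[2]

np.random.seed(1)
w1 = np.random.randn(n0, n1) * 0.01    # shape (n[0], n[1]) = (3, 3)
b1 = np.zeros((n1, 1))                 # hidden-layer bias (one entry per hidden unit)
w2 = np.random.randn(n1, n2) * 0.01    # shape (n[1], n[2]) = (3, 1)
b2 = np.zeros((n2, 1))                 # output-layer bias

for name, p in [("w1", w1), ("b1", b1), ("w2", w2), ("b2", b2)]:
    print(name, p.shape)
```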

2.1 Forward propagation

$x$ is one sample's feature vector, shape = (3, 1)
(1) $z^{[1]} = w^{[1]T} x + b^{[1]}$
(2) $a^{[1]} = \sigma(z^{[1]})$
(3) $z^{[2]} = w^{[2]T} a^{[1]} + b^{[2]}$
(4) $a^{[2]} = \hat{y} = \sigma(z^{[2]})$
(5) $J = -\left(y\log(a^{[2]}) + (1-y)\log(1-a^{[2]})\right)$
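
A minimal numpy sketch of this forward pass, equations (1)–(5), for one sample; the randomly generated input and parameters are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
x  = np.random.randn(3, 1)             # a[0], shape (n[0], 1)   (toy sample)
y  = 1.0                               # label                   (toy value)
w1 = np.random.randn(3, 3) * 0.01      # (n[0], n[1])
b1 = np.zeros((3, 1))
w2 = np.random.randn(3, 1) * 0.01      # (n[1], n[2])
b2 = np.zeros((1, 1))

z1 = np.dot(w1.T, x) + b1              # eq. (1), shape (n[1], 1)
a1 = sigmoid(z1)                       # eq. (2)
z2 = np.dot(w2.T, a1) + b2             # eq. (3), shape (n[2], 1)
a2 = sigmoid(z2)                       # eq. (4), y_hat
J  = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))   # eq. (5)
print(a2, J)
```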

2.2 Backward propagation

  • Compute the gradients

$dw^{[2]} = \frac{\partial J}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial w^{[2]}}$

$db^{[2]} = \frac{\partial J}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}}$

$\frac{\partial J}{\partial a^{[2]}} = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}} = \frac{a^{[2]}-y}{a^{[2]}(1-a^{[2]})}$

$\frac{\partial a^{[2]}}{\partial z^{[2]}} = a^{[2]}(1-a^{[2]})$

$\frac{\partial z^{[2]}}{\partial w^{[2]}} = a^{[1]}$

(6) $dw^{[2]} = \frac{a^{[2]}-y}{a^{[2]}(1-a^{[2]})} \cdot a^{[2]}(1-a^{[2]}) \cdot a^{[1]} = (a^{[2]}-y)\,a^{[1]}$

(7) $db^{[2]} = \frac{a^{[2]}-y}{a^{[2]}(1-a^{[2]})} \cdot a^{[2]}(1-a^{[2]}) = a^{[2]}-y$

$dw^{[1]} = \frac{\partial J}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}} \cdot \frac{\partial z^{[1]}}{\partial w^{[1]}}$

$db^{[1]} = \frac{\partial J}{\partial a^{[2]}} \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}} \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}} \cdot \frac{\partial a^{[1]}}{\partial z^{[1]}}$

$\frac{\partial z^{[2]}}{\partial a^{[1]}} = w^{[2]}$

$\frac{\partial a^{[1]}}{\partial z^{[1]}} = a^{[1]}(1-a^{[1]})$

$\frac{\partial z^{[1]}}{\partial w^{[1]}} = x$

(8) $dw^{[1]} = x\left(\left((a^{[2]}-y)\,w^{[2]}\right) \odot \left(a^{[1]}(1-a^{[1]})\right)\right)^T$

(9) $db^{[1]} = \frac{a^{[2]}-y}{a^{[2]}(1-a^{[2]})} \cdot a^{[2]}(1-a^{[2]}) \cdot w^{[2]} \odot a^{[1]}(1-a^{[1]}) = \left((a^{[2]}-y)\,w^{[2]}\right) \odot \left(a^{[1]}(1-a^{[1]})\right)$
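
The same chain rule in numpy, following equations (6)–(9); `*` plays the role of the element-wise product $\odot$, and the toy forward-pass values are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
x  = np.random.randn(3, 1)             # sample, shape (n[0], 1)  (toy values)
y  = 1.0
w1 = np.random.randn(3, 3) * 0.01
b1 = np.zeros((3, 1))
w2 = np.random.randn(3, 1) * 0.01
b2 = np.zeros((1, 1))

# forward pass (as in the sketch above)
a1 = sigmoid(np.dot(w1.T, x) + b1)     # (n[1], 1)
a2 = sigmoid(np.dot(w2.T, a1) + b2)    # (n[2], 1)

# backward pass
dw2 = (a2 - y) * a1                    # eq. (6), shape (n[1], n[2]) = (3, 1)
db2 = a2 - y                           # eq. (7), shape (n[2], 1)
dz1 = ((a2 - y) * w2) * a1 * (1 - a1)  # ((a[2]-y) w[2]) elementwise with a[1](1-a[1])
dw1 = np.dot(x, dz1.T)                 # eq. (8), shape (n[0], n[1]) = (3, 3)
db1 = dz1                              # eq. (9)
```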

  • Update the weights

(10) $w^{[1]} = w^{[1]} - \alpha\, dw^{[1]}$
(11) $b^{[1]} = b^{[1]} - \alpha\, db^{[1]}$
(12) $w^{[2]} = w^{[2]} - \alpha\, dw^{[2]}$
(13) $b^{[2]} = b^{[2]} - \alpha\, db^{[2]}$
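
Combining the forward pass, the gradients, and the updates (10)–(13) into a tiny single-sample training loop; the learning rate and iteration count are assumptions chosen only to show that the prediction moves toward the label:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
x  = np.random.randn(3, 1)             # toy sample
y  = 1.0                               # toy label
w1 = np.random.randn(3, 3) * 0.01
b1 = np.zeros((3, 1))
w2 = np.random.randn(3, 1) * 0.01
b2 = np.zeros((1, 1))
alpha = 0.5                            # learning rate (assumed)

for i in range(200):
    # forward
    a1 = sigmoid(np.dot(w1.T, x) + b1)
    a2 = sigmoid(np.dot(w2.T, a1) + b2)
    # backward
    dw2 = (a2 - y) * a1
    db2 = a2 - y
    dz1 = ((a2 - y) * w2) * a1 * (1 - a1)
    dw1 = np.dot(x, dz1.T)
    db1 = dz1
    # update, eq. (10)-(13)
    w1 -= alpha * dw1
    b1 -= alpha * db1
    w2 -= alpha * dw2
    b2 -= alpha * db2

print(a2.item())                       # the prediction approaches y = 1 as training proceeds
```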


Reposted from blog.csdn.net/u014281392/article/details/80296520