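This post continues from the previous one on scalar, vector, and matrix derivatives: https://blog.csdn.net/weixin_42764932/article/details/113107265
The formulas below are worth memorizing; they come up repeatedly when deriving machine learning algorithms.
Important formulas for vector and matrix derivatives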
- 1. $\frac{\partial A \overrightarrow{x}}{\partial \overrightarrow{x}}=A^{T}$
- 2. $\frac{\partial A \overrightarrow{x}}{\partial \overrightarrow{x}^{T}}=A$
- 3. $\frac{\partial(\overrightarrow{x}^{T}A)}{\partial \overrightarrow{x}}=A$
- 4. $\frac{\partial(\overrightarrow{x}^{T}\cdot A \cdot \overrightarrow{x})}{\partial \overrightarrow{x}}=(A^{T}+A)\cdot \overrightarrow{x}$
- 5. $\frac{\partial tr(AB)}{\partial A}=B^{T}$
- 6. $\frac{\partial \boldsymbol{a}^{T}X\boldsymbol{b}}{\partial X}=\boldsymbol{a}\boldsymbol{b}^{T}$
- Others
- Matrix derivative techniques (矩阵求导术)
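1. $\frac{\partial A \overrightarrow{x}}{\partial \overrightarrow{x}}=A^{T}$
This is a vector-by-vector derivative. Denominator layout is the usual choice here, and the result is the gradient matrix.
That is, with $A$ of size $m \times n$ and $\overrightarrow{x}$ of length $n$: first expand along the $n$ rows of the denominator to get $n$ rows, then expand along the $m$ rows of the numerator to get $m$ columns, so the result is $n \times m$.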
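The layout rule above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `matvec` and `denominator_layout_jacobian`, the matrix `A`, and the vector `x` are all hypothetical choices made for illustration. It approximates each $\partial (A\overrightarrow{x})_i / \partial x_j$ by central differences, stores it at row $j$, column $i$ (denominator layout), and compares the result with $A^{T}$.

```python
def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def denominator_layout_jacobian(f, x, eps=1e-6):
    """Approximate d f(x) / d x in denominator layout by central differences.

    Entry (j, i) holds d f_i / d x_j, so the result has len(x) rows
    and len(f(x)) columns.
    """
    rows = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - q) / (2 * eps) for p, q in zip(f_plus, f_minus)])
    return rows


# Hypothetical example data: A is m x n with m = 2 and n = 3.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

J = denominator_layout_jacobian(lambda v: matvec(A, v), x)
A_T = [list(col) for col in zip(*A)]  # transpose of A

print(len(J), len(J[0]))  # prints "3 2": n rows, m columns
max_err = max(abs(J[j][i] - A_T[j][i]) for j in range(3) for i in range(2))
print(max_err < 1e-6)     # prints "True"
```

Because $A\overrightarrow{x}$ is linear in $\overrightarrow{x}$, the central difference is exact up to floating-point rounding, so the tolerance only has to absorb rounding error.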
2. $\frac{\partial A \overrightarrow{x}}{\partial \overrightarrow{x}^{T}}=A$
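A short component-wise check of $\frac{\partial A \overrightarrow{x}}{\partial \overrightarrow{x}^{T}}=A$, as a sketch assuming $A$ is $m \times n$ and $\overrightarrow{x}$ has length $n$ (these sizes are an assumption made here). Differentiating with respect to $\overrightarrow{x}^{T}$ puts the numerator index along the rows, so entry $(i,j)$ of the result is $\partial (A\overrightarrow{x})_i / \partial x_j$:

$$
(A\overrightarrow{x})_i=\sum_{k=1}^{n}A_{ik}x_k
\quad\Longrightarrow\quad
\left[\frac{\partial A\overrightarrow{x}}{\partial \overrightarrow{x}^{T}}\right]_{ij}
=\frac{\partial (A\overrightarrow{x})_i}{\partial x_j}=A_{ij}
$$

The result is $m \times n$, the transpose of what denominator layout gives for the same derivative.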
3. $\frac{\partial(\overrightarrow{x}^{T}A)}{\partial \overrightarrow{x}}=A$
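The same component-wise argument seems to work for $\frac{\partial(\overrightarrow{x}^{T}A)}{\partial \overrightarrow{x}}=A$. This sketch assumes $A$ is $n \times m$ and $\overrightarrow{x}$ has length $n$, so that $\overrightarrow{x}^{T}A$ is a row vector with $m$ entries (the sizes are an assumption made here). In denominator layout, entry $(i,j)$ of the result is $\partial (\overrightarrow{x}^{T}A)_j / \partial x_i$:

$$
(\overrightarrow{x}^{T}A)_j=\sum_{k=1}^{n}x_kA_{kj}
\quad\Longrightarrow\quad
\left[\frac{\partial (\overrightarrow{x}^{T}A)}{\partial \overrightarrow{x}}\right]_{ij}
=\frac{\partial (\overrightarrow{x}^{T}A)_j}{\partial x_i}=A_{ij}
$$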
4. $\frac{\partial(\overrightarrow{x}^{T}\cdot A \cdot \overrightarrow{x})}{\partial \overrightarrow{x}}=(A^{T}+A)\cdot \overrightarrow{x}$
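One way to see where $(A^{T}+A)\cdot \overrightarrow{x}$ comes from is to expand the quadratic form entry by entry. This is a sketch assuming $A$ is a square $n \times n$ matrix that does not depend on $\overrightarrow{x}$. Each $x_k$ appears twice in the double sum, once as the left factor and once as the right factor, which produces the two terms:

$$
\overrightarrow{x}^{T}A\overrightarrow{x}=\sum_{i=1}^{n}\sum_{j=1}^{n}x_iA_{ij}x_j
\quad\Longrightarrow\quad
\frac{\partial (\overrightarrow{x}^{T}A\overrightarrow{x})}{\partial x_k}
=\sum_{j=1}^{n}A_{kj}x_j+\sum_{i=1}^{n}x_iA_{ik}
=(A\overrightarrow{x})_k+(A^{T}\overrightarrow{x})_k
$$

Stacking the $n$ partial derivatives into a column gives $(A^{T}+A)\overrightarrow{x}$. When $A$ is symmetric this reduces to $2A\overrightarrow{x}$.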
5. $\frac{\partial tr(AB)}{\partial A}=B^{T}$
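A component-wise sketch of $\frac{\partial tr(AB)}{\partial A}=B^{T}$, assuming $A$ is $m \times n$, $B$ is $n \times m$, and $B$ does not depend on $A$ (the sizes are an assumption made here). The derivative of a scalar with respect to a matrix has the same shape as that matrix, with entry $(i,j)$ equal to the partial derivative with respect to $A_{ij}$:

$$
tr(AB)=\sum_{i=1}^{m}\sum_{j=1}^{n}A_{ij}B_{ji}
\quad\Longrightarrow\quad
\left[\frac{\partial\, tr(AB)}{\partial A}\right]_{ij}
=\frac{\partial\, tr(AB)}{\partial A_{ij}}=B_{ji}=(B^{T})_{ij}
$$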
6. $\frac{\partial \boldsymbol{a}^{T}X\boldsymbol{b}}{\partial X}=\boldsymbol{a}\boldsymbol{b}^{T}$
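Let $f=\boldsymbol{a}^{T}X\boldsymbol{b}$.
- Step 1. For a scalar result $f$ (for example a loss, which is usually the scalar sum of all individual losses), multiplying each partial derivative by the differential of the corresponding entry and summing gives the total differential: $df=\sum_{i,j}\frac{\partial f}{\partial X_{ij}}dX_{ij}$.
- Step 2. In other words, $df$ is the inner product of $\frac{\partial f}{\partial X}$ and $dX$.
- Step 3. The inner product of two matrices $A$ and $B$ of the same size can be written as $tr(A^{T}B)$.
- Step 4. Steps 2 and 3 together give $df = tr\left(\left(\frac{\partial f}{\partial X}\right)^{T}dX\right)$.
- Step 5. Taking the differential of both sides of $f=\boldsymbol{a}^{T}X\boldsymbol{b}$ gives $df=\boldsymbol{a}^{T}(dX)\boldsymbol{b}$.
- Step 6. Since $df$ is a scalar, it equals its own trace, so $df=tr(\boldsymbol{a}^{T}(dX)\boldsymbol{b})$.
- Step 7. Because $tr(AB)=tr(BA)$ whenever both products are defined, $\boldsymbol{b}$ can be moved to the front: $df=tr(\boldsymbol{b}(\boldsymbol{a}^{T}(dX)))$.
- Step 8. Dropping the redundant parentheses by associativity and using $\boldsymbol{b}\boldsymbol{a}^{T}=(\boldsymbol{a}\boldsymbol{b}^{T})^{T}$ gives $df=tr((\boldsymbol{a}\boldsymbol{b}^{T})^{T}dX)$.
- Step 9. Matching step 4 against step 8 gives $\frac{\partial f}{\partial X}=\boldsymbol{a}\boldsymbol{b}^{T}$.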
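The result $\frac{\partial f}{\partial X}=\boldsymbol{a}\boldsymbol{b}^{T}$ can also be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `bilinear` and `numeric_gradient` and the values of `a`, `b`, and `X` are hypothetical choices made for illustration. It perturbs one entry of $X$ at a time, estimates $\partial f / \partial X_{ij}$ by central differences, and compares the result with the outer product $\boldsymbol{a}\boldsymbol{b}^{T}$.

```python
def bilinear(a, X, b):
    """Return the scalar a^T X b for vectors a, b and list-of-rows matrix X."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def numeric_gradient(f, X, eps=1e-6):
    """Approximate d f / d X entry by entry with central differences."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            grad[i][j] = (f(X_plus) - f(X_minus)) / (2 * eps)
    return grad


# Hypothetical example data: a has length 2, b has length 3, X is 2 x 3.
a = [1.0, -2.0]
b = [0.5, 3.0, -1.0]
X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]

grad = numeric_gradient(lambda M: bilinear(a, M, b), X)
outer = [[a_i * b_j for b_j in b] for a_i in a]  # the claimed result a b^T

max_err = max(abs(grad[i][j] - outer[i][j])
              for i in range(2) for j in range(3))
print(max_err < 1e-6)  # prints "True"
```

Since $f$ is linear in $X$, the gradient does not depend on the particular value of `X`; any other matrix of the same shape should give the same `outer`.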
Others
Matrix derivative techniques (矩阵求导术)
Part 1: https://zhuanlan.zhihu.com/p/24709748
Part 2: https://zhuanlan.zhihu.com/p/24863977