A Summary of Common Matrix Differentiation Formulas

1. Derivative of a matrix

If every entry $a_{ij}(t)$ of the matrix $\boldsymbol A(t)=[a_{ij}(t)]_{m\times n}$ is a differentiable function of the variable $t$, then $\boldsymbol A(t)$ is said to be differentiable, and its derivative is defined entrywise:

$$
\dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{ij}(t)}{\mathrm{d}t}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\mathrm{d}a_{11}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{12}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{1n}(t)}{\mathrm{d}t} \\[2ex]
\dfrac{\mathrm{d}a_{21}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{22}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{2n}(t)}{\mathrm{d}t} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\dfrac{\mathrm{d}a_{m1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{m2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{mn}(t)}{\mathrm{d}t}
\end{matrix}\right]
$$

  • When $m=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_n(t)]$ is a (row) vector-valued function, and

    $$
    \dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{j}(t)}{\mathrm{d}t}\right]_{1\times n}=\left[\begin{matrix} \dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{n}(t)}{\mathrm{d}t} \end{matrix}\right]_{1\times n}
    $$

  • When $n=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_m(t)]^T$ is a (column) vector-valued function, and

    $$
    \dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{i}(t)}{\mathrm{d}t}\right]_{m\times 1}=\left[\begin{matrix}
    \dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t} \\[2ex]
    \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t} \\[2ex]
    \vdots \\[2ex]
    \dfrac{\mathrm{d}a_{m}(t)}{\mathrm{d}t}
    \end{matrix}\right]_{m\times 1}
    $$
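As a quick sanity check of the entrywise definition, the following Python sketch compares hand-computed entry derivatives with a central finite difference. The matrix function `A`, the step size `h`, and all helper names are illustrative assumptions, not part of the original text.

```python
import math

def A(t):
    # Hypothetical 2x2 matrix function, chosen only for illustration.
    return [[math.sin(t), t ** 2],
            [math.exp(t), 3.0 * t]]

def dA_exact(t):
    # Derivative of each entry a_ij(t), computed by hand.
    return [[math.cos(t), 2.0 * t],
            [math.exp(t), 3.0]]

def dA_numeric(t, h=1e-6):
    # Central difference applied to every entry a_ij(t).
    plus, minus = A(t + h), A(t - h)
    return [[(p - m) / (2.0 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(plus, minus)]

def max_abs_diff(P, Q):
    # Largest entrywise deviation between two matrices of equal shape.
    return max(abs(p - q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q))

error = max_abs_diff(dA_exact(0.5), dA_numeric(0.5))
print(error)
```

The printed deviation should be tiny compared with the entries themselves, which is what the entrywise definition predicts.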

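2. Derivative of a multivariate function with respect to a matrix

Let $\boldsymbol X=[x_{ij}]_{m\times n}$ be a matrix, and consider the function of its $mn$ entries $f(\boldsymbol X)=f(x_{11},x_{12},\cdots,x_{1n},\cdots,x_{m1},x_{m2},\cdots,x_{mn})$. The derivative of $f(\boldsymbol X)$ with respect to $\boldsymbol X$ is defined as the matrix of partial derivatives, which has the same shape as $\boldsymbol X$: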

$$
\dfrac{\mathrm{d}f(\boldsymbol X)}{\mathrm{d}\boldsymbol X}=\left[\dfrac{\partial f}{\partial x_{ij}}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_{11}} & \dfrac{\partial f}{\partial x_{12}} & \cdots & \dfrac{\partial f}{\partial x_{1n}} \\[2ex]
\dfrac{\partial f}{\partial x_{21}} & \dfrac{\partial f}{\partial x_{22}} & \cdots & \dfrac{\partial f}{\partial x_{2n}} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\dfrac{\partial f}{\partial x_{m1}} & \dfrac{\partial f}{\partial x_{m2}} & \cdots & \dfrac{\partial f}{\partial x_{mn}}
\end{matrix}\right]
$$
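To make the definition concrete, here is a minimal Python sketch that builds the $m\times n$ matrix of partial derivatives numerically. The test function `f` (the sum of squared entries, whose derivative is $2\boldsymbol X$), the sample matrix, and the helper name `df_dX_numeric` are assumptions made for illustration.

```python
def f(X):
    # Hypothetical scalar function of a matrix: the sum of squared entries.
    return sum(x * x for row in X for x in row)

def df_dX_numeric(func, X, h=1e-6):
    # Perturb one entry x_ij at a time and apply a central difference,
    # so the result has the same m x n shape as X.
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (func(X_plus) - func(X_minus)) / (2.0 * h)
    return G

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]
G = df_dX_numeric(f, X)
expected = [[2.0 * x for x in row] for row in X]  # analytic result 2X
```

Comparing `G` with `expected` entry by entry confirms that the derivative is laid out exactly like $\boldsymbol X$ itself.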

3. Derivative of a multivariate function with respect to a (column) vector

Let $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$ be an $n$-dimensional (column) vector, and consider the function of $n$ variables $f(\boldsymbol x)=f(x_{1},x_{2},\cdots,x_{n})$. Then:

$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right]^T=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_1} \\[2ex]
\dfrac{\partial f}{\partial x_2} \\[2ex]
\vdots \\[2ex]
\dfrac{\partial f}{\partial x_n}
\end{matrix}\right]
$$

That is, this derivative is the gradient of $f(\boldsymbol x)$: $\nabla f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}$.

$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right]
$$

That is, the derivative with respect to the row vector $\boldsymbol x^T$ is the transpose of the gradient of $f(\boldsymbol x)$: $\nabla^T f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}$.

Therefore

$$
\nabla f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right]^T
$$

Common formulas

(1) The Hessian matrix:

$$
\nabla^T \{\nabla f(\boldsymbol x)\}=\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}\right)
\quad\text{or}\quad
\nabla \{\nabla^T f(\boldsymbol x)\}=\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x}\left(\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right)
$$

The two expressions are transposes of each other, and they agree whenever the second partial derivatives of $f$ are continuous, since the mixed partials then commute.

$$
\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right)=\left[\begin{matrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n} \\[2ex]
\dfrac{\partial^2 f}{\partial x_2\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\partial x_n} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[2ex]
\dfrac{\partial^2 f}{\partial x_n\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{matrix}\right]
$$
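As a small worked example (the function below is chosen purely for illustration and does not come from the original text), take $f(\boldsymbol x)=x_1^2x_2+x_2^3$ with $\boldsymbol x=[x_1,x_2]^T$. Differentiating once gives the gradient, and differentiating each gradient component again gives the Hessian:

$$
\nabla f(\boldsymbol x)=\left[\begin{matrix} 2x_1x_2 \\[1ex] x_1^2+3x_2^2 \end{matrix}\right],
\qquad
\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right)=\left[\begin{matrix} 2x_2 & 2x_1 \\[1ex] 2x_1 & 6x_2 \end{matrix}\right]
$$

The off-diagonal entries coincide, so the matrix is symmetric, as expected for a function with continuous second partial derivatives.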

(2) The derivative of the quadratic function $f(\boldsymbol x)=\boldsymbol x^T \boldsymbol A \boldsymbol x$ is $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=(\boldsymbol A+\boldsymbol A^T )\boldsymbol x$.

If $\boldsymbol A=[a_{ij}]_{n\times n}$ is a symmetric matrix, this reduces to $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=2\boldsymbol A \boldsymbol x$.

Proof:

$$
\begin{aligned}
f(\boldsymbol x)&=\boldsymbol x^T \boldsymbol A \boldsymbol x=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j \\
&=x_1\sum_{j=1}^{n}a_{1j}x_j +x_2\sum_{j=1}^{n}a_{2j}x_j+\cdots +x_k\sum_{j=1}^{n}a_{kj}x_j+\cdots+x_n\sum_{j=1}^{n}a_{nj}x_j
\end{aligned}
$$

Differentiating with respect to $x_k$, each term $x_i\sum_j a_{ij}x_j$ with $i\neq k$ contributes $x_ia_{ik}$, while the $k$-th term contributes $\sum_j a_{kj}x_j+x_ka_{kk}$ by the product rule:

$$
\begin{aligned}
\dfrac{\partial f}{\partial x_k}&=x_1a_{1k}+x_2a_{2k}+\cdots+\left(\sum_{j=1}^{n}a_{kj}x_j+x_ka_{kk}\right)+\cdots+x_na_{nk}\\
&=(x_1a_{1k}+x_2a_{2k}+\cdots+x_ka_{kk}+\cdots+x_na_{nk}) +\sum_{j=1}^{n}a_{kj}x_j \\
&=\sum_{i=1}^{n}a_{ik}x_i +\sum_{j=1}^{n}a_{kj}x_j
\end{aligned}
$$

Stacking these partial derivatives into a column vector, the sums over $i$ form $\boldsymbol A^T\boldsymbol x$ and the sums over $j$ form $\boldsymbol A\boldsymbol x$:

$$
\begin{aligned}
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}&=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_1} \\[2ex] \vdots \\[2ex] \dfrac{\partial f}{\partial x_k} \\[2ex] \vdots \\[2ex] \dfrac{\partial f}{\partial x_n}
\end{matrix}\right]
=\left[\begin{matrix}
\displaystyle\sum_{i=1}^{n}a_{i1}x_i +\sum_{j=1}^{n}a_{1j}x_j \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{i=1}^{n}a_{ik}x_i +\sum_{j=1}^{n}a_{kj}x_j \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{i=1}^{n}a_{in}x_i +\sum_{j=1}^{n}a_{nj}x_j
\end{matrix}\right]
=\left[\begin{matrix}
\displaystyle\sum_{i=1}^{n}a_{i1}x_i \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{i=1}^{n}a_{ik}x_i \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{i=1}^{n}a_{in}x_i
\end{matrix}\right]
+\left[\begin{matrix}
\displaystyle\sum_{j=1}^{n}a_{1j}x_j \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{j=1}^{n}a_{kj}x_j \\[2ex] \vdots \\[2ex]
\displaystyle\sum_{j=1}^{n}a_{nj}x_j
\end{matrix}\right] \\
&=\boldsymbol A^T\boldsymbol x+\boldsymbol A\boldsymbol x \\
&=(\boldsymbol A +\boldsymbol A^T)\boldsymbol x
\end{aligned}
$$
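The formula can be checked numerically. The sketch below uses a deliberately non-symmetric matrix so that $(\boldsymbol A+\boldsymbol A^T)\boldsymbol x$ and $2\boldsymbol A\boldsymbol x$ differ; the matrix, the point `x`, and the function names are illustrative assumptions.

```python
def quad(A, x):
    # f(x) = x^T A x written as the double sum over a_ij * x_i * x_j.
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def grad_formula(A, x):
    # k-th component of (A + A^T) x is sum_j (a_kj + a_jk) * x_j.
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n))
            for k in range(n)]

def grad_numeric(func, x, h=1e-6):
    # Central difference in each coordinate direction.
    g = []
    for k in range(len(x)):
        x_plus, x_minus = x[:], x[:]
        x_plus[k] += h
        x_minus[k] -= h
        g.append((func(x_plus) - func(x_minus)) / (2.0 * h))
    return g

# Hypothetical non-symmetric matrix and evaluation point.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [5.0, 0.5, -2.0]]
x = [0.3, -1.2, 2.0]

g_formula = grad_formula(A, x)
g_numeric = grad_numeric(lambda v: quad(A, v), x)
print(g_formula)
print(g_numeric)
```

The two printed vectors should agree to within the finite-difference error, whereas using $2\boldsymbol A\boldsymbol x$ here would not match because `A` is not symmetric.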

(3) The derivative of the linear function $f(\boldsymbol x)=\boldsymbol b^T \boldsymbol x$ is $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\boldsymbol b$.

Since $\boldsymbol b^T \boldsymbol x= \boldsymbol x^T \boldsymbol b$, the same expression regarded as a function of $\boldsymbol b$ satisfies $\dfrac{\mathrm{d}(\boldsymbol b^T \boldsymbol x)}{\mathrm{d}\boldsymbol b}=\boldsymbol x$.

Proof: $f(\boldsymbol x) =\boldsymbol b^T \boldsymbol x=\displaystyle\sum_{i=1}^{n}b_ix_i$, so $\dfrac{\partial f}{\partial x_k}=b_k$ for each $k$.

$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_1} \\[2ex] \vdots \\[2ex] \dfrac{\partial f}{\partial x_k} \\[2ex] \vdots \\[2ex] \dfrac{\partial f}{\partial x_n}
\end{matrix}\right]
=\left[\begin{matrix}
b_1 \\[1ex] \vdots \\[1ex] b_k \\[1ex] \vdots \\[1ex] b_n
\end{matrix}\right]=\boldsymbol b
$$
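Formulas (2) and (3) combine term by term. As an illustration with made-up numbers (not taken from the original text), let

$$
\boldsymbol A=\left[\begin{matrix} 1 & 2 \\ 0 & 3 \end{matrix}\right],\qquad
\boldsymbol b=\left[\begin{matrix} 4 \\ 5 \end{matrix}\right],\qquad
f(\boldsymbol x)=\boldsymbol x^T\boldsymbol A\boldsymbol x+\boldsymbol b^T\boldsymbol x=x_1^2+2x_1x_2+3x_2^2+4x_1+5x_2
$$

Differentiating the expanded polynomial directly and applying the two formulas give the same result:

$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}
=\left[\begin{matrix} 2x_1+2x_2+4 \\[1ex] 2x_1+6x_2+5 \end{matrix}\right]
=\left[\begin{matrix} 2 & 2 \\ 2 & 6 \end{matrix}\right]\boldsymbol x+\left[\begin{matrix} 4 \\ 5 \end{matrix}\right]
=(\boldsymbol A+\boldsymbol A^T)\boldsymbol x+\boldsymbol b
$$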

4. Chain rule for a function composed with a vector-valued function of one variable

Let $\boldsymbol x(t)=[x_1(t),x_2(t),\cdots,x_n(t)]^T$ be a vector-valued function, and consider the function of the single variable $t$ obtained by composition, $f(\boldsymbol x(t))=f(x_1(t),x_2(t),\cdots,x_n(t))$. Then:

$$
\dfrac{\mathrm{d}f}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}
$$

Proof:

$$
\begin{aligned}
\dfrac{\mathrm{d}f}{\mathrm{d}t}&=\dfrac{\partial f}{\partial x_1}\dfrac{\mathrm{d}x_1}{\mathrm{d}t}+\dfrac{\partial f}{\partial x_2}\dfrac{\mathrm{d}x_2}{\mathrm{d}t}+\cdots+\dfrac{\partial f}{\partial x_n}\dfrac{\mathrm{d}x_n}{\mathrm{d}t}\\
&=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right]
\left[\begin{matrix}
\dfrac{\mathrm{d} x_1}{\mathrm{d} t} \\[2ex] \dfrac{\mathrm{d} x_2}{\mathrm{d} t} \\[2ex] \vdots \\[2ex] \dfrac{\mathrm{d} x_n}{\mathrm{d} t}
\end{matrix}\right]
=\left[\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}
\end{aligned}
$$
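The chain rule above can be illustrated with a short Python sketch. The curve `x_of_t`, the function `f`, and the evaluation point `t0` are hypothetical choices; the gradient and the curve's derivative are written out by hand and their inner product is compared with a direct numerical derivative of $t\mapsto f(\boldsymbol x(t))$.

```python
import math

def x_of_t(t):
    # Hypothetical curve x(t) = [cos t, sin t]^T.
    return [math.cos(t), math.sin(t)]

def dx_dt(t):
    # Componentwise derivative of the curve.
    return [-math.sin(t), math.cos(t)]

def f(x):
    # Hypothetical scalar function f(x1, x2) = x1^2 + x1 * x2.
    return x[0] ** 2 + x[0] * x[1]

def grad_f(x):
    # Gradient of f, computed by hand.
    return [2.0 * x[0] + x[1], x[0]]

def df_dt_chain(t):
    # [df/dx]^T dx/dt: inner product of the gradient and the curve's derivative.
    g = grad_f(x_of_t(t))
    v = dx_dt(t)
    return sum(gi * vi for gi, vi in zip(g, v))

def df_dt_numeric(t, h=1e-6):
    # Central difference of the composite function t -> f(x(t)).
    return (f(x_of_t(t + h)) - f(x_of_t(t - h))) / (2.0 * h)

t0 = 0.7
chain_value = df_dt_chain(t0)
numeric_value = df_dt_numeric(t0)
print(chain_value, numeric_value)
```

The two printed values should agree up to the finite-difference error.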

5. Taylor series

First consider the two-dimensional case, $\boldsymbol x=[x_1,x_2]^T$. Then

$$
\begin{aligned}
f(x_1+\delta_1,x_2+\delta_2)&=f(x_1,x_2)+\dfrac{\partial f}{\partial x_1}\delta_1+\dfrac{\partial f}{\partial x_2}\delta_2\\
&\quad+\dfrac{1}{2}\left( \dfrac{\partial^2 f}{\partial x_1^2}\delta_1^2+2\dfrac{\partial^2 f}{\partial x_1\partial x_2}\delta_1\delta_2+\dfrac{\partial^2 f}{\partial x_2^2}\delta_2^2 \right) \\
&\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right)
\end{aligned}
$$

The factor $2$ on the cross term comes from merging the two equal mixed partials $\dfrac{\partial^2 f}{\partial x_1\partial x_2}$ and $\dfrac{\partial^2 f}{\partial x_2\partial x_1}$.

Extending to the $n$-dimensional case, $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$:

$$
\begin{aligned}
f(x_1+\delta_1,x_2+\delta_2,\cdots,x_n+\delta_n)&=f(x_1,x_2,\cdots,x_n)+\sum_{i=1}^n\dfrac{\partial f}{\partial x_i}\delta_i \\
&\quad+\dfrac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\dfrac{\partial^2 f}{\partial x_i\partial x_j}\delta_i\delta_j\\
&\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right)
\end{aligned}
$$

In matrix form:

$$
f(\boldsymbol x+\boldsymbol\delta)=f(\boldsymbol x)+\nabla f(\boldsymbol x)^T\boldsymbol\delta+\dfrac{1}{2}\boldsymbol\delta^T\nabla^2 f(\boldsymbol x)\boldsymbol\delta+o\left(\Vert\boldsymbol\delta\Vert^2\right)
$$

where $\boldsymbol\delta=[\delta_1,\delta_2,\cdots,\delta_n]^T$.

Equivalently, as the expansion of the function $f(\boldsymbol x)$ about the point $\bar{\boldsymbol x}$:

$$
f(\boldsymbol x)=f(\bar{\boldsymbol x})+\nabla f(\bar{\boldsymbol x})^T(\boldsymbol x-\bar{\boldsymbol x})+\dfrac{1}{2}(\boldsymbol x-\bar{\boldsymbol x})^T\nabla^2 f(\bar{\boldsymbol x})(\boldsymbol x-\bar{\boldsymbol x})+o\left(\Vert\boldsymbol x-\bar{\boldsymbol x}\Vert^2\right)
$$

[Note] Here $\nabla f(\boldsymbol x)$ denotes the gradient and $\nabla^2 f(\boldsymbol x)$ denotes the Hessian matrix (not the Laplacian operator used in PDEs).
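A minimal Python sketch of the second-order expansion in matrix form follows. The test function, its hand-computed gradient and Hessian, the base point `x0`, and the two displacements are all illustrative assumptions. Because the remainder is $o(\Vert\boldsymbol\delta\Vert^2)$, halving the displacement should shrink the approximation error by clearly more than a factor of four.

```python
import math

def f(x):
    # Hypothetical smooth test function f(x1, x2) = exp(x1) * sin(x2).
    return math.exp(x[0]) * math.sin(x[1])

def grad_f(x):
    # Gradient of f, computed by hand.
    e = math.exp(x[0])
    return [e * math.sin(x[1]), e * math.cos(x[1])]

def hess_f(x):
    # Hessian of f, computed by hand (symmetric).
    e = math.exp(x[0])
    s, c = math.sin(x[1]), math.cos(x[1])
    return [[e * s, e * c],
            [e * c, -e * s]]

def taylor2(x, d):
    # f(x) + grad(x)^T d + 0.5 * d^T H(x) d
    g, H = grad_f(x), hess_f(x)
    n = len(x)
    linear = sum(g[i] * d[i] for i in range(n))
    quadratic = sum(H[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return f(x) + linear + 0.5 * quadratic

def taylor_error(x, d):
    # Absolute gap between the true value f(x + d) and the expansion.
    shifted = [xi + di for xi, di in zip(x, d)]
    return abs(f(shifted) - taylor2(x, d))

x0 = [0.2, 0.5]
err_large = taylor_error(x0, [0.1, -0.1])
err_small = taylor_error(x0, [0.05, -0.05])
print(err_large, err_small)
```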


Reposted from blog.csdn.net/xfijun/article/details/104168293