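Derivative of a matrix inner product / matrix derivatives involving the Hadamard root / matrix element-wise square root / derivative of the matrix element-wise square root / derivative of the Frobenius norm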

Worked examples of matrix derivatives involving the Hadamard root are relatively scarce; this one is offered for reference only:


1 Problem

Given $\mathbf{X} \in \mathbb{R}^{n}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, and $f(\mathbf{X})=\sum_{i=1}^{n} \sqrt{|\mathbf{A} \mathbf{X}|_{i}^{2}+\delta^{2}}$, where $\sqrt{(\cdot)}$ denotes the Hadamard root (element-wise square root), i.e. the square root taken entry by entry. Find $f^{\prime}(\mathbf{X})$, i.e. $\frac{\partial f}{\partial \mathbf{X}}$.
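To make the objective concrete, here is a minimal NumPy sketch that evaluates $f$ for a small hand-picked case. The function name `f` and the sample values of `A`, `x`, and `delta` are illustrative assumptions, not part of the original problem.

```python
import numpy as np

def f(A, x, delta):
    """Sum over i of sqrt((A x)_i^2 + delta^2), with an element-wise square root."""
    Ax = A @ x
    return np.sum(np.sqrt(Ax**2 + delta**2))

# Illustrative values (assumed): A x = [3, 3], so each term is sqrt(9 + 16) = 5.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
x = np.array([1.0, 1.0])
delta = 4.0

value = f(A, x, delta)
print(value)  # 10.0
```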

2 Solution

2.1 First eliminate the square root with the Hadamard product

$$\mathbf{v}=\sqrt{|\mathbf{A} \mathbf{X}|^{2}+\delta^{2} \mathbf{1}}$$

$$
\begin{aligned}
\therefore \quad \mathbf{v} \odot \mathbf{v} &=|\mathbf{A} \mathbf{X}|^{2}+\delta^{2} \mathbf{1} \\
&=\mathbf{A} \mathbf{X} \odot \mathbf{A} \mathbf{X}+\delta^{2} \mathbf{1}
\end{aligned}
$$

By the differential rule for the Hadamard product, $d(\mathbf{X} \odot \mathbf{Y})=\mathbf{X} \odot d \mathbf{Y}+d \mathbf{X} \odot \mathbf{Y}$, we have:

$$
\begin{aligned}
d(\mathbf{v} \odot \mathbf{v}) &=\mathbf{v} \odot d \mathbf{v}+d \mathbf{v} \odot \mathbf{v} \\
&=\mathbf{v} \odot d \mathbf{v}+\mathbf{v} \odot d \mathbf{v} \\
&=2 \mathbf{v} \odot d \mathbf{v}
\end{aligned}
$$

That is:

$$
\begin{aligned}
2 \mathbf{v} \odot d \mathbf{v} &=d\left(\mathbf{A} \mathbf{X} \odot \mathbf{A} \mathbf{X}+\delta^{2} \mathbf{1}\right) \\
&=d(\mathbf{A} \mathbf{X} \odot \mathbf{A} \mathbf{X})+d(\delta^{2} \mathbf{1}) \\
&=2 \mathbf{A} \mathbf{X} \odot d(\mathbf{A} \mathbf{X}) \\
&=2 \mathbf{A} \mathbf{X} \odot((d \mathbf{A}) \mathbf{X}+\mathbf{A} d \mathbf{X}) \\
&=2 \mathbf{A} \mathbf{X} \odot \mathbf{A} d \mathbf{X}
\end{aligned}
$$

Here $d(\delta^{2} \mathbf{1})=\mathbf{0}$ and $(d \mathbf{A}) \mathbf{X}=\mathbf{0}$ because $\delta$ and $\mathbf{A}$ are constants.

$$\therefore \quad d \mathbf{v}=\mathbf{A} \mathbf{X} \odot \mathbf{A} d \mathbf{X} \oslash \mathbf{v}$$

Here $\oslash$ is Hadamard division (element-wise division), i.e. entry-by-entry division of matrices, which has properties similar to those of $\odot$. Alternatively, let $\mathbf{p} \odot \mathbf{v}=\mathbf{1}$, so that $\mathbf{p}$ is the Hadamard inverse (element-wise inverse) of $\mathbf{v}$; then $d \mathbf{v}=\mathbf{A} \mathbf{X} \odot \mathbf{A} d \mathbf{X} \odot \mathbf{p}$.
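The expression for $d\mathbf{v}$ can be sanity-checked numerically. The sketch below compares it with a central finite difference of $\mathbf{v}$ along a random direction. The random test data, the step size `eps`, and the helper name `v_of` are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # constant matrix A (random test data)
x = rng.standard_normal(n)        # point X at which we differentiate
dx = rng.standard_normal(n)       # perturbation direction dX
delta = 0.5

def v_of(x):
    """v = element-wise sqrt((A x)^2 + delta^2)."""
    return np.sqrt((A @ x) ** 2 + delta**2)

# Central finite difference of v along dx.
eps = 1e-6
dv_numeric = (v_of(x + eps * dx) - v_of(x - eps * dx)) / (2 * eps)

# Closed form: dv = (A x) ⊙ (A dx) ⊘ v.
v = v_of(x)
dv_formula = (A @ x) * (A @ dx) / v

# Equivalent form with the Hadamard inverse p, where p ⊙ v = 1.
p = 1.0 / v
dv_inverse_form = (A @ x) * (A @ dx) * p

print(np.max(np.abs(dv_numeric - dv_formula)))
```

The printed maximum deviation should be tiny (limited by finite-difference and rounding error), and the two closed forms agree because dividing by $\mathbf{v}$ and multiplying by $\mathbf{p}$ are the same element-wise operation.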

2.2 Obtain the result using the Frobenius inner product (matrix inner product) and properties of the trace

Definition of the Frobenius inner product: $\mathbf{A}:\mathbf{B} = \operatorname{tr}(\mathbf{A}^{T}\mathbf{B})$. It follows that:

$$
\begin{aligned}
f &=\mathbf{1}: \mathbf{v} \\
\therefore \quad d f &= d(\mathbf{1}: \mathbf{v}) \\
&= d\mathbf{1}:\mathbf{v} + \mathbf{1}:d\mathbf{v} \quad (\text{property: } d(\mathbf{A}: \mathbf{B})=d \mathbf{A}: \mathbf{B}+\mathbf{A}: d \mathbf{B}) \\
&=\mathbf{1}: d \mathbf{v} \quad (\text{since } d\mathbf{1}=\mathbf{0}) \\
&=\mathbf{1}:(\mathbf{A} \mathbf{X} \odot \mathbf{A} d \mathbf{X} \oslash \mathbf{v}) \\
&=\mathbf{1}:(\mathbf{A} \mathbf{X} \oslash \mathbf{v} \odot \mathbf{A} d \mathbf{X} ) \quad (\text{property: } \mathbf{X} \odot \mathbf{Y} = \mathbf{Y} \odot \mathbf{X}) \\
&=(\mathbf{1}\odot(\mathbf{A} \mathbf{X} \oslash \mathbf{v})) : (\mathbf{A} d \mathbf{X} ) \quad (\text{property: } \mathbf{C}:(\mathbf{A} \odot \mathbf{B}) = (\mathbf{C} \odot \mathbf{A}):\mathbf{B})\\
&=(\mathbf{A} \mathbf{X} \oslash \mathbf{v}):(\mathbf{A} d \mathbf{X}) \\
&=\mathbf{A}^{T}(\mathbf{A} \mathbf{X} \oslash \mathbf{v}): d \mathbf{X} \quad (\text{property: } \mathbf{C} \mathbf{A} : \mathbf{B} = \mathbf{A} : \mathbf{C}^{T}\mathbf{B} = \mathbf{C} : \mathbf{B} \mathbf{A}^{T}) \\
&=\operatorname{tr}\left(\left(\mathbf{A}^{T} (\mathbf{A} \mathbf{X} \oslash \mathbf{v} )\right)^{T} d \mathbf{X}\right) \quad (\text{definition of the matrix inner product})\\
\text{i.e.} \quad \frac{\partial f}{\partial \mathbf{X}} &=\mathbf{A}^{T}(\mathbf{A} \mathbf{X} \oslash \mathbf{v}) \quad \left(\text{property: } d f=\operatorname{tr}\left(\left(\frac{\partial f}{\partial \mathbf{X}}\right)^{T} d \mathbf{X}\right)\right)
\end{aligned}
$$
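The inner-product identities used in this derivation are easy to spot-check numerically. The following sketch tests them on random matrices; the helper name `frob` and the random $3 \times 3$ test matrices are assumptions made for illustration.

```python
import numpy as np

def frob(P, Q):
    """Frobenius inner product P : Q = tr(P^T Q) = sum of element-wise products."""
    return np.sum(P * Q)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# Definition: A : B = tr(A^T B).
trace_form = np.trace(A.T @ B)

# C : (A ⊙ B) = (C ⊙ A) : B
lhs_hadamard = frob(C, A * B)
rhs_hadamard = frob(C * A, B)

# (C A) : B = A : (C^T B) = C : (B A^T)
lhs_product = frob(C @ A, B)
mid_product = frob(A, C.T @ B)
rhs_product = frob(C, B @ A.T)

print(np.isclose(frob(A, B), trace_form),
      np.isclose(lhs_hadamard, rhs_hadamard),
      np.isclose(lhs_product, mid_product),
      np.isclose(lhs_product, rhs_product))
```

All four comparisons should print `True`; a random check like this does not prove the identities, but it catches transposition mistakes quickly.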
If the Hadamard inverse $\mathbf{p}$ defined above is used, the result can also be written as:

$$\frac{\partial f}{\partial \mathbf{X}} =\mathbf{A}^{T}(\mathbf{A} \mathbf{X} \odot \mathbf{p})$$
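As a final check, the closed-form gradient $\mathbf{A}^{T}(\mathbf{A}\mathbf{X} \oslash \mathbf{v})$ can be compared against a coordinate-wise finite-difference gradient. This is a minimal sketch under assumed random test data; the function names `f`, `grad_f`, and `numeric_grad` are hypothetical helpers, not taken from the original post.

```python
import numpy as np

def f(A, x, delta):
    """f(X) = sum_i sqrt((A X)_i^2 + delta^2)."""
    return np.sum(np.sqrt((A @ x) ** 2 + delta**2))

def grad_f(A, x, delta):
    """Closed-form gradient A^T (A X ⊘ v), with v = sqrt((A X)^2 + delta^2)."""
    Ax = A @ x
    v = np.sqrt(Ax**2 + delta**2)
    return A.T @ (Ax / v)

def numeric_grad(A, x, delta, eps=1e-6):
    """Central finite-difference gradient, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(A, x + e, delta) - f(A, x - e, delta)) / (2 * eps)
    return g

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
delta = 0.3

g_analytic = grad_f(A, x, delta)
g_numeric = numeric_grad(A, x, delta)
print(np.max(np.abs(g_analytic - g_numeric)))
```

The printed maximum difference should be close to zero. Note that with $\delta \neq 0$ every entry of $\mathbf{v}$ is strictly positive, so the element-wise division is always well defined.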

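Summary:
First use the Hadamard product to eliminate the square root, then apply the properties above to obtain $d\mathbf{v}$, and finally use the Frobenius inner product and the properties of the trace to obtain the result.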

Reposted from blog.csdn.net/lyh458/article/details/121868963