UA MATH567 High-Dimensional Statistics IV: Lipschitz Combinations 10: Bernstein's Inequality for Random Matrices

Bernstein's inequality for random matrices
Let $X_1,\cdots,X_N$ be independent, mean-zero, $n \times n$ real symmetric random matrices with $\sup_i \left\| X_i \right\| \le K$ almost surely. Then for all $t > 0$,
$$P\left(\left\| \sum_{i=1}^N X_i \right\| \ge t\right) \le 2n \exp\left(-\frac{t^2/2}{\sigma^2+Kt/3}\right)$$

where $\sigma^2 = \left\| \sum_{i=1}^N EX_i^2 \right\|$.
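To get a feel for the statement, here is a small Monte Carlo sketch. It assumes a particular family that satisfies the hypotheses, namely $X_i = \varepsilon_i A_i$ with independent random signs $\varepsilon_i$ and fixed symmetric matrices $A_i$ of norm 1; the sizes, the seed and the helper names (`bernstein_bound`, `op_norm_of_sum`) are illustrative choices, not part of the original notes.

```python
import numpy as np

def bernstein_bound(t, sigma2, K, n):
    """Right-hand side of the inequality: 2n * exp(-(t^2/2) / (sigma^2 + K t / 3))."""
    return 2 * n * np.exp(-(t**2 / 2) / (sigma2 + K * t / 3))

rng = np.random.default_rng(0)
n, N, trials = 5, 50, 2000

# Fixed symmetric matrices A_i, rescaled so that ||A_i|| = 1 (hence K = 1).
A = rng.standard_normal((N, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
A /= np.abs(np.linalg.eigvalsh(A)).max(axis=1)[:, None, None]
K = 1.0

# X_i = eps_i * A_i with random signs, so E X_i = 0 and E X_i^2 = A_i^2.
sigma2 = np.linalg.norm((A @ A).sum(axis=0), ord=2)

def op_norm_of_sum(rng):
    """Draw the signs once and return || sum_i eps_i A_i ||."""
    eps = rng.choice([-1.0, 1.0], size=N)
    S = np.tensordot(eps, A, axes=1)
    return np.abs(np.linalg.eigvalsh(S)).max()

norms = np.array([op_norm_of_sum(rng) for _ in range(trials)])
t = 3 * np.sqrt(sigma2)
empirical = np.mean(norms >= t)
bound = bernstein_bound(t, sigma2, K, n)
print(f"sigma^2 = {sigma2:.3f}, t = {t:.3f}")
print(f"empirical tail = {empirical:.4f}, Bernstein bound = {bound:.4f}")
```

The empirical frequency is only an estimate of the true tail probability, so the comparison is a sanity check rather than a proof.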

Lemma (upper bound on the moment generating function in the semidefinite order). Let $X$ be an $n \times n$ real symmetric, mean-zero random matrix with $\left\| X \right\| \le K$ almost surely. Then for $|\lambda|<3/K$,
$$e^{g(\lambda)EX^2} \succcurlyeq Ee^{\lambda X}$$

where
$$g(\lambda) = \frac{\lambda^2/2}{1-|\lambda|K/3}$$

Proof
The main idea is a Taylor expansion. For $|z|<3$,
$$e^z = 1+z+\frac{z^2}{2}+\cdots = 1+z+z^2 \sum_{p=2}^{\infty}\frac{z^{p-2}}{p!}$$

One can check that for $p \ge 2$,
$$p! \ge 2 \times 3^{p-2}$$
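The factorial bound follows by induction (each step multiplies the left side by $p+1 \ge 3$ and the right side by 3). As a quick sanity check, the following sketch verifies it for a finite range of $p$; the upper limit 50 is an arbitrary choice.

```python
import math

# Check p! >= 2 * 3^(p-2) for p = 2, ..., 50 using exact integer arithmetic.
ok = all(math.factorial(p) >= 2 * 3 ** (p - 2) for p in range(2, 51))
print(ok)
```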

Hence, bounding $z^{p-2} \le |z|^{p-2}$ and summing the geometric series,
$$e^z \le 1+z+z^2 \sum_{p=2}^{\infty} \frac{|z|^{p-2}}{2 \times 3^{p-2}} = 1+z+\frac{z^2}{2}\cdot\frac{1}{1-|z|/3}$$

Next substitute $z = \lambda x$ with $|x| \le K$ and $|\lambda|<3/K$. Since $|z| \le |\lambda| K < 3$, we get
$$e^{\lambda x} \le 1+\lambda x+g(\lambda )x^2$$
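The scalar inequality can be checked numerically on a grid. This is only a sketch: the value $K = 2$, the grid resolution and the cut-off at 99% of $3/K$ are arbitrary illustrative choices.

```python
import numpy as np

def g(lam, K):
    """g(lambda) = (lambda^2 / 2) / (1 - |lambda| K / 3), defined for |lambda| < 3/K."""
    return (lam**2 / 2) / (1 - np.abs(lam) * K / 3)

K = 2.0
lams = np.linspace(-0.99 * 3 / K, 0.99 * 3 / K, 401)  # stay strictly inside |lambda| < 3/K
xs = np.linspace(-K, K, 401)                           # |x| <= K
L, X = np.meshgrid(lams, xs, indexing="ij")

# gap = (1 + lambda x + g(lambda) x^2) - e^{lambda x}; the inequality says gap >= 0.
gap = 1 + L * X + g(L, K) * X**2 - np.exp(L * X)
min_gap = gap.min()
print(min_gap)
```

The smallest gap occurs where $\lambda x = 0$, where both sides equal 1, so the printed value should be zero up to rounding.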

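We need the matrix version. The previous lecture introduced matrix functions; for the exponential and for polynomials the matrix notation can be used directly. Since the eigenvalues of $X$ lie in $[-K, K]$, the scalar inequality holds at every eigenvalue, so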

$$I+\lambda X+g(\lambda)X^2 \succcurlyeq e^{\lambda X}$$

Taking expectations preserves the semidefinite order, and $EX = 0$, so
$$I+g(\lambda)EX^2 \succcurlyeq Ee^{\lambda X}$$

By the elementary inequality $1+x \le e^x$ (a limiting form of Bernoulli's inequality), applied to the eigenvalues of $g(\lambda)EX^2$,
$$e^{g(\lambda )EX^2} \succcurlyeq I+g(\lambda)EX^2$$

This proves the lemma.
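The lemma can be illustrated numerically for a simple non-commuting example. The sketch below assumes a hypothetical three-point distribution, $X \in \{A, B, -(A+B)\}$ with equal probabilities (so $EX = 0$), and computes matrix functions through the eigendecomposition; `sym_fun` and `rand_sym` are helper names introduced here.

```python
import numpy as np

def sym_fun(M, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

def g(lam, K):
    return (lam**2 / 2) / (1 - abs(lam) * K / 3)

rng = np.random.default_rng(1)
n = 4

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = rand_sym(n), rand_sym(n)
values = [A, B, -(A + B)]                            # equally likely values of X
K = max(np.linalg.norm(M, ord=2) for M in values)    # almost-sure bound on ||X||
EX2 = sum(M @ M for M in values) / 3                 # E X^2

min_eigs = []
for lam in np.linspace(-0.5 * 3 / K, 0.5 * 3 / K, 19):
    mgf = sum(sym_fun(lam * M, np.exp) for M in values) / 3   # E e^{lambda X}
    upper = sym_fun(g(lam, K) * EX2, np.exp)                  # e^{g(lambda) E X^2}
    # The lemma says upper - mgf is positive semidefinite.
    min_eigs.append(np.linalg.eigvalsh(upper - mgf).min())

min_gap = min(min_eigs)
print(min_gap)
```

The range of $\lambda$ is restricted to half of $3/K$ only to keep the matrix exponentials moderate in size, so that rounding errors do not mask the sign of the smallest eigenvalue.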


Proof of Bernstein's inequality
Let $\lambda_1(S) \ge \cdots \ge \lambda_n(S)$ denote the ordered eigenvalues of
$$S=\sum_{i=1}^N X_i, \qquad \left\| S \right\|=\max_i | \lambda_i(S)|=\max(|\lambda_1(S)|,|\lambda_n(S)|)$$

We apply Markov's inequality: for any $\eta > 0$,
$$\begin{aligned} P(\lambda_1(S) \ge t) &= P(e^{\eta \lambda_1(S)} \ge e^{\eta t}) \le e^{-\eta t}Ee^{\eta \lambda_1(S)} \\ &= e^{-\eta t}E\lambda_1(e^{\eta S}) \le e^{-\eta t}E\,\mathrm{tr}(e^{\eta S}) \end{aligned}$$

The last inequality holds because $e^{\eta S}$ is positive definite, and for any real symmetric positive semidefinite matrix $A$,
$$\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i(A) \ge \lambda_1(A)$$
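The two facts used in this step, $\lambda_1(e^{\eta S}) = e^{\eta\lambda_1(S)}$ and $\lambda_1 \le \mathrm{tr}$ for matrices with nonnegative eigenvalues, can be checked on a random example. The sketch below also includes a small indefinite matrix for which the trace bound fails, to show that nonnegativity of the eigenvalues is what makes it work; all sizes and the value of `eta` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eta = 6, 0.7
M = rng.standard_normal((n, n))
S = (M + M.T) / 2

# Matrix exponential of eta * S through the eigendecomposition of S.
w, V = np.linalg.eigh(S)
exp_S = (V * np.exp(eta * w)) @ V.T

lam1_exp = np.linalg.eigvalsh(exp_S).max()   # lambda_1(e^{eta S})
trace_exp = np.trace(exp_S)                  # tr(e^{eta S})
print(lam1_exp, np.exp(eta * w.max()), trace_exp)

# An indefinite matrix where tr(D) < lambda_1(D).
D = np.diag([1.0, -2.0])
print(np.trace(D), np.linalg.eigvalsh(D).max())
```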

Let $\mathcal{F}_N$ denote $\sigma(X_1,\cdots,X_N)$, i.e. the natural filtration of this sequence of random matrices. Then
$$E\,\mathrm{tr}(e^{\eta S}) = E_{\mathcal{F}_{N-1}}E_{X_{N}}\mathrm{tr}\left(e^{\sum_{i=1}^{N-1}\eta X_i+\eta X_N}\right)$$

By Lieb's inequality (combined with Jensen's inequality, conditionally on $\mathcal{F}_{N-1}$),
$$E_{\mathcal{F}_{N-1}}E_{X_{N}}\mathrm{tr}\left(e^{\sum_{i=1}^{N-1}\eta X_i+\eta X_N}\right) \le E_{\mathcal{F}_{N-1}}\mathrm{tr}\left(e^{\sum_{i=1}^{N-1}\eta X_i+\log E_{X_N} e^{\eta X_N}}\right)$$
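A single Lieb step can be illustrated numerically. In the sketch below, a fixed symmetric matrix `H` stands in for $\sum_{i=1}^{N-1}\eta X_i$, and the last term $\eta X_N$ is assumed to be $\pm A$ with probability $1/2$ each, so that $E e^{\eta X_N} = \cosh(A)$; these choices and the helper `sym_fun` are illustrative assumptions.

```python
import numpy as np

def sym_fun(M, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

rng = np.random.default_rng(3)
n = 4

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

H = rand_sym(n)   # plays the role of the already-fixed part of the sum
A = rand_sym(n)   # the last term takes the values +A and -A with probability 1/2

# Left side: E tr exp(H + X), averaging over the two values of X.
lhs = 0.5 * (np.trace(sym_fun(H + A, np.exp)) + np.trace(sym_fun(H - A, np.exp)))

# Right side: tr exp(H + log E e^X), with E e^X = cosh(A), which is positive definite.
mgf = 0.5 * (sym_fun(A, np.exp) + sym_fun(-A, np.exp))
rhs = np.trace(sym_fun(H + sym_fun(mgf, np.log), np.exp))

print(lhs, rhs)
```

When `H` and `A` commute the two sides coincide, so the gap seen here reflects the non-commutativity of the random example.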

This gives a recursion over the sequence of matrices; iterating it yields
$$E\,\mathrm{tr}(e^{\eta S}) \le \mathrm{tr}\left(e^{\sum_{i=1}^N\log Ee^{\eta X_i}}\right)$$

By the lemma, for $0 < \eta < 3/K$,
$$Ee^{\eta X_i} \preccurlyeq e^{g(\eta)EX_i^2}$$

Since the matrix logarithm is operator monotone and $A \mapsto \mathrm{tr}(e^{A})$ is monotone in the semidefinite order, it follows that
$$\mathrm{tr}\left(e^{\sum_{i=1}^N\log Ee^{\eta X_i}}\right) \le \mathrm{tr}\left(e^{g(\eta)\sum_{i=1}^N EX_i^2}\right) = \mathrm{tr}(e^{g(\eta) z})$$

where $z=\sum_{i=1}^N EX_i^2$ is an $n \times n$ real symmetric positive semidefinite matrix, so $\lambda_1(z) = \left\| z \right\|$ and
$$\mathrm{tr}(e^{g(\eta)z}) \le n\lambda_1(e^{g(\eta)z}) = ne^{g(\eta)\lambda_1(z)}=ne^{g(\eta)\left\| z \right\|}=ne^{g(\eta)\sigma^2}$$

The first inequality holds because for any $n \times n$ real symmetric matrix $A$ with largest eigenvalue $\lambda_1(A)$,
$$\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i(A) \le n\lambda_1(A)$$

Putting everything together,
$$P(\lambda_1(S) \ge t) \le ne^{-\eta t}e^{g(\eta)\sigma^2}$$

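It remains to choose $\eta \in (0, 3/K)$ to make this bound small. A convenient choice is $\eta=t/(\sigma^2+Kt/3)$, which satisfies $\eta < 3/K$. Then $1-\eta K/3 = \sigma^2/(\sigma^2+Kt/3)$, so $g(\eta)\sigma^2 = \frac{t^2/2}{\sigma^2+Kt/3}$ and $-\eta t + g(\eta)\sigma^2 = -\frac{t^2/2}{\sigma^2+Kt/3}$, which gives
$$P(\lambda_1(S) \ge t) \le n\exp\left(-\frac{t^2/2}{\sigma^2+\frac{Kt}{3}}\right)$$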
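The choice of $\eta$ can be checked numerically. The sketch below uses arbitrary illustrative values of $t$, $\sigma^2$ and $K$, confirms that the closed-form choice reproduces the exponent $-\frac{t^2/2}{\sigma^2+Kt/3}$, and compares it with a brute-force minimum of $-\eta t + g(\eta)\sigma^2$ over a grid in $(0, 3/K)$.

```python
import numpy as np

def g(eta, K):
    return (eta**2 / 2) / (1 - eta * K / 3)

def exponent(eta, t, sigma2, K):
    """Exponent of the bound n * exp(-eta t + g(eta) sigma^2)."""
    return -eta * t + g(eta, K) * sigma2

t, sigma2, K = 4.0, 10.0, 1.5          # illustrative values only

eta_star = t / (sigma2 + K * t / 3)    # closed-form choice
target = -(t**2 / 2) / (sigma2 + K * t / 3)
at_star = exponent(eta_star, t, sigma2, K)

# Brute-force minimum over a grid inside (0, 3/K); the right endpoint is excluded.
etas = np.linspace(1e-6, 3 / K, 100000, endpoint=False)
grid_min = exponent(etas, t, sigma2, K).min()

print(eta_star, at_star, target, grid_min)
```

The grid minimum comes out slightly below the closed-form value: the convenient choice is not the exact minimizer, but it is what produces the clean Bernstein exponent.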

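Repeating the argument for $-S$ (that is, analysing $\lambda_n(S)$) gives the same bound for $P(\lambda_n(S) \le -t)$. A union bound over the two events then yields Bernstein's inequality with the factor $2n$.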

Reposted from blog.csdn.net/weixin_44207974/article/details/112211517