Statistical Inference (1) Hypothesis Test

My personal blog: Glooow, welcome to visit!

1. Binary Bayesian hypothesis testing

1.0 Problem Setting

  • Hypothesis
    • Hypothesis space $\mathcal{H}=\{H_0, H_1\}$
    • Bayesian approach: model the valid hypothesis as a random variable $\mathsf{H}$
    • Prior $P_0 = p_\mathsf{H}(H_0)$, $P_1 = p_\mathsf{H}(H_1) = 1 - P_0$
  • Observation
    • Observation space $\mathcal{Y}$
    • Observation models $p_\mathsf{y|H}(\cdot|H_0)$, $p_\mathsf{y|H}(\cdot|H_1)$
  • Decision rule $f:\mathcal{Y}\to\mathcal{H}$
  • Cost function $C:\mathcal{H}\times\mathcal{H}\to\mathbb{R}$
    • Let $C_{ij} = C(H_j, H_i)$, where $H_j$ is the correct hypothesis
    • $C$ is valid if $C_{jj} < C_{ij}$ for $i \ne j$
  • Optimum decision rule $\hat{H}(\cdot) = \arg\min\limits_{f(\cdot)} \mathbb{E}[C(\mathsf{H}, f(\mathsf{y}))]$
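As a sanity check on the setup above, the following sketch builds a toy instance (all numbers are illustrative, not from the notes): a binary observation $y\in\{0,1\}$, a valid 0/1 cost matrix, and a brute-force search for the rule minimizing the expected cost.

```python
import itertools

# Toy instance of the setup above (all numbers illustrative): binary
# hypothesis, binary observation y in {0, 1}, valid cost matrix
# C[i][j] = C_ij (cost of deciding H_i when H_j is correct).
P0, P1 = 0.7, 0.3
p_y_given_H = {0: (0.8, 0.1), 1: (0.2, 0.9)}   # y -> (p(y|H0), p(y|H1))
C = [[0.0, 1.0],
     [1.0, 0.0]]

def expected_cost(f):
    """Bayes risk E[C(H, f(y))] of a deterministic rule f: y -> {0, 1}."""
    risk = 0.0
    for y, liks in p_y_given_H.items():
        for j, prior in enumerate((P0, P1)):   # H_j is the true hypothesis
            risk += prior * liks[j] * C[f[y]][j]
    return risk

# The optimum decision rule minimizes the expected cost over all 4 rules.
rules = [dict(zip((0, 1), bits)) for bits in itertools.product((0, 1), repeat=2)]
best = min(rules, key=expected_cost)
```

With these numbers the minimizer is the rule that reports the hypothesis favored by the posterior, consistent with the LRT theorem that follows.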

1.1 Binary Bayesian hypothesis testing

Theorem: The optimal Bayes’ decision takes the form
$$L(y) \triangleq \frac{p_\mathsf{y|H}(y|H_1)}{p_\mathsf{y|H}(y|H_0)} \overset{H_1}{\underset{H_0}{\gtrless}} \frac{P_0}{P_1}\cdot\frac{C_{10}-C_{00}}{C_{01}-C_{11}} \triangleq \eta$$
Proof: Since the rule is deterministic, the Bayes risk can be minimized pointwise:
$$\varphi(f) = \mathbb{E}[C(\mathsf{H}, f(\mathsf{y}))] = \int_\mathcal{Y} \mathbb{E}\big[C(\mathsf{H}, f(y)) \mid \mathsf{y}=y\big]\, p_\mathsf{y}(y)\, \mathrm{d}y$$
so it suffices to minimize the conditional expected cost separately for each $y$.
Given $y^*$:

  • if $f(y^*) = H_0$, $\mathbb{E} = C_{00}\,p_\mathsf{H|y}(H_0|y^*) + C_{01}\,p_\mathsf{H|y}(H_1|y^*)$
  • if $f(y^*) = H_1$, $\mathbb{E} = C_{10}\,p_\mathsf{H|y}(H_0|y^*) + C_{11}\,p_\mathsf{H|y}(H_1|y^*)$

So
$$\frac{p_\mathsf{H|y}(H_1|y^*)}{p_\mathsf{H|y}(H_0|y^*)} \overset{H_1}{\underset{H_0}{\gtrless}} \frac{C_{10}-C_{00}}{C_{01}-C_{11}}$$
Remark: note that in the proof the Bayes test is deterministic, so for a fixed $y$ the probability that $f(y)=H_1$ is either 0 or 1. When taking the expectation of the cost, treat $\mathsf{H}$ as a random variable but $f(y)$ as a fixed value, and compare the two cases.
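The posterior-ratio test in the proof and the likelihood-ratio form of the theorem are algebraically the same rule; a small numeric check on a toy model (all numbers illustrative) makes this concrete.

```python
# Consistency check between the pointwise proof and the LRT form of the
# theorem (all numbers illustrative): comparing the posterior ratio to
# (C10-C00)/(C01-C11) gives the same decision as comparing L(y) to eta.
P0, P1 = 0.6, 0.4
C00, C01, C10, C11 = 0.0, 2.0, 1.0, 0.0
p0 = {0: 0.7, 1: 0.3}   # p(y|H0)
p1 = {0: 0.4, 1: 0.6}   # p(y|H1)

eta = (P0 / P1) * (C10 - C00) / (C01 - C11)
posterior_threshold = (C10 - C00) / (C01 - C11)

for y in (0, 1):
    L = p1[y] / p0[y]
    posterior_ratio = (P1 * p1[y]) / (P0 * p0[y])
    # Multiplying both sides of the posterior test by P0/P1 gives the LRT.
    assert (L >= eta) == (posterior_ratio >= posterior_threshold)
```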

Special cases

  • Maximum a posteriori (MAP)
    • $C_{00}=C_{11}=0$, $C_{01}=C_{10}=1$
    • $\hat{H}(y) = \arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{H|y}(H|y)$
  • Maximum likelihood (ML)
    • $C_{00}=C_{11}=0$, $C_{01}=C_{10}=1$, $P_0=P_1=0.5$
    • $\hat{H}(y) = \arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{y|H}(y|H)$
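The two special cases can disagree when the prior is skewed; a minimal sketch on a toy model (numbers illustrative) shows one such observation.

```python
# MAP vs ML on a toy model (all numbers illustrative): with a strongly
# skewed prior the two special cases above can disagree.
P0, P1 = 0.9, 0.1
p_y_H0 = {0: 0.6, 1: 0.4}
p_y_H1 = {0: 0.2, 1: 0.8}

def map_rule(y):
    # MAP: maximize the posterior, i.e. prior times likelihood
    return 1 if P1 * p_y_H1[y] > P0 * p_y_H0[y] else 0

def ml_rule(y):
    # ML: maximize the likelihood alone (MAP with P0 = P1 = 0.5)
    return 1 if p_y_H1[y] > p_y_H0[y] else 0

# For y = 1 the likelihood favours H1 (0.8 > 0.4), but the prior on H0 wins.
assert ml_rule(1) == 1 and map_rule(1) == 0
```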

1.2 Likelihood Ratio Test

Generally, the LRT takes the form
$$L(y) \triangleq \frac{p_\mathsf{y|H}(y|H_1)}{p_\mathsf{y|H}(y|H_0)} \overset{H_1}{\underset{H_0}{\gtrless}} \eta$$

  • The Bayesian formulation gives a method of calculating $\eta$
  • $L(y)$ is a sufficient statistic for the decision problem
  • Any invertible function of $L(y)$ is also a sufficient statistic

(Figure: sufficient statistic)
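One common instance of the invertible-function remark: for a scalar Gaussian model $y \sim \mathcal{N}(\mu_i, \sigma^2)$ under $H_i$ (numbers below are illustrative), $\log L(y)$ is affine and increasing in $y$, so the LRT reduces to thresholding $y$ itself.

```python
import math

# Sketch for a scalar Gaussian model y ~ N(mu_i, sigma^2) under H_i
# (numbers illustrative): log L(y) is an invertible, affine function of y,
# so thresholding L(y) at eta is equivalent to thresholding y itself.
mu0, mu1, sigma = 0.0, 2.0, 1.0

def L(y):
    """Likelihood ratio p(y|H1)/p(y|H0); normalizing constants cancel."""
    return math.exp((-(y - mu1) ** 2 + (y - mu0) ** 2) / (2 * sigma ** 2))

eta = 1.5
# Solving log L(y) >= log eta for y gives the equivalent scalar threshold:
gamma = (sigma ** 2 * math.log(eta) + (mu1 ** 2 - mu0 ** 2) / 2) / (mu1 - mu0)

for y in (-1.0, 0.5, 1.5, 3.0):
    assert (L(y) >= eta) == (y >= gamma)   # identical decisions
```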

1.3 ROC

  • Detection probability $P_D = P(\hat{H}=H_1 \mid \mathsf{H}=H_1)$
  • False-alarm probability $P_F = P(\hat{H}=H_1 \mid \mathsf{H}=H_0)$

Properties (important!)

  • The ROC curve of the LRT is monotonically non-decreasing

(Figure: ROC curve)
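The monotonicity property can be seen numerically by sweeping the threshold of an LRT for a scalar Gaussian model $y \sim \mathcal{N}(\mu_i, \sigma^2)$ under $H_i$ (all numbers illustrative).

```python
import math

# ROC sweep for a scalar Gaussian model y ~ N(mu_i, sigma^2) under H_i
# (numbers illustrative): each threshold gamma gives one operating point
# (P_F, P_D); sweeping gamma traces the ROC.
mu0, mu1, sigma = 0.0, 2.0, 1.0
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian tail P(N(0,1) >= x)

gammas = [g / 10 for g in range(-30, 51)]
P_F = [Q((g - mu0) / sigma) for g in gammas]
P_D = [Q((g - mu1) / sigma) for g in gammas]

# Raising gamma lowers P_F and P_D together: P_D never increases while P_F
# decreases, i.e. the ROC is monotonically non-decreasing in P_F.
assert all(a >= b for a, b in zip(P_F, P_F[1:]))
assert all(a >= b for a, b in zip(P_D, P_D[1:]))
assert all(d >= f for f, d in zip(P_F, P_D))      # P_D >= P_F on the ROC
```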

2. Non-Bayesian hypothesis testing

  • The non-Bayesian approach requires neither a prior nor a cost function

Neyman-Pearson criterion

$$\max_{\hat{H}(\cdot)} P_D \quad \text{s.t.} \quad P_F \le \alpha$$

Theorem (Neyman-Pearson Lemma): the optimal solution under the NP criterion is given by the LRT, where $\eta$ is obtained from
$$P_F = P(L(y) \ge \eta \mid \mathsf{H}=H_0) = \alpha$$
Proof
(Figure: proof)

Physical intuition: for the same $P_F$, the LRT achieves the largest $P_D$. Intuitively, in the region where the LRT decides $H_1$, the ratio $\frac{p(y|H_1)}{p(y|H_0)}$ is as large as possible, so for a fixed $P_F$ the $P_D$ is maximized.

Remark: the optimal solution of the NP criterion is the LRT because

  • for the same $P_F$, the LRT achieves the largest $P_D$;
  • as the LRT threshold $\eta$ varies, a larger $P_F$ yields a larger $P_D$, i.e. the ROC curve is monotonically non-decreasing.
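A minimal NP design sketch for a scalar Gaussian model $y \sim \mathcal{N}(\mu_i, \sigma^2)$ under $H_i$ (all numbers illustrative): pick the threshold so that $P_F = \alpha$ exactly, then read off $P_D$.

```python
import math

# Neyman-Pearson design for a scalar Gaussian model (numbers illustrative):
# choose the scalar threshold gamma so that P_F = alpha, then read off P_D.
mu0, mu1, sigma, alpha = 0.0, 2.0, 1.0, 0.05
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian tail probability

# P_F(gamma) = Q((gamma - mu0)/sigma) is strictly decreasing in gamma,
# so solve P_F(gamma) = alpha by bisection.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if Q((mid - mu0) / sigma) > alpha else (lo, mid)
gamma = (lo + hi) / 2

P_F = Q((gamma - mu0) / sigma)
P_D = Q((gamma - mu1) / sigma)
assert abs(P_F - alpha) < 1e-9 and P_D > P_F
```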

3. Randomized test

3.1 Decision rule

  • Two deterministic decision rules $\hat{H}'(\cdot)$, $\hat{H}''(\cdot)$

  • Randomized decision rule $\hat{H}(\cdot)$ by time-sharing:
    $$\hat{H}(\cdot)=\begin{cases}\hat{H}'(\cdot), & \text{with probability } p \\ \hat{H}''(\cdot), & \text{with probability } 1-p\end{cases}$$

    • Detection probability $P_D = pP_D' + (1-p)P_D''$
    • False-alarm probability $P_F = pP_F' + (1-p)P_F''$
  • A randomized decision rule is fully described by $p_{\hat{\mathsf{H}}|\mathsf{y}}(H_m|y)$ for $m = 0, 1$

3.2 Proposition

  1. Bayesian case: a randomized test cannot achieve a lower Bayes risk than the optimum LRT.

     Proof: The risk for each $y$ is linear in $p_{\hat{\mathsf{H}}|\mathsf{y}}(H_0|y)$, so the minimum is achieved at 0 or 1, which degenerates to a deterministic decision:
     $$\varphi = \int_\mathcal{Y} \Big[ \sum_{m=0}^{1} p_{\hat{\mathsf{H}}|\mathsf{y}}(H_m|y) \sum_{j=0}^{1} C_{mj}\, p_{\mathsf{H}|\mathsf{y}}(H_j|y) \Big] p_\mathsf{y}(y)\, \mathrm{d}y$$

  2. Neyman-Pearson case:

    1. Continuous-valued observations: for a given $P_F$ constraint, a randomized test cannot achieve a larger $P_D$ than the optimum LRT.
    2. Discrete-valued observations: for a given $P_F$ constraint, a randomized test can achieve a larger $P_D$ than the optimum LRT. Furthermore, the optimum randomized test corresponds to simple time-sharing between the two nearest LRT operating points.
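The discrete-valued case can be made concrete with a toy model (all numbers illustrative): when the deterministic LRT operating points are isolated, time-sharing between the two points bracketing the constraint strictly improves $P_D$.

```python
# Discrete-observation sketch (numbers illustrative): y in {0, 1} with
# p(y=1|H0) = 0.2 and p(y=1|H1) = 0.8. The only deterministic LRT operating
# points are (P_F, P_D) in {(0,0), (0.2,0.8), (1,1)}; for alpha = 0.1 the
# best deterministic test is stuck at P_D = 0, while time-sharing between
# the two neighbouring LRT points meets P_F = alpha exactly and does better.
alpha = 0.1
lrt_points = [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]     # (P_F, P_D)

best_deterministic = max(pd for pf, pd in lrt_points if pf <= alpha)

(f1, d1), (f2, d2) = lrt_points[0], lrt_points[1]     # points bracketing alpha
p = (alpha - f1) / (f2 - f1)                          # time-sharing weight
P_F = (1 - p) * f1 + p * f2                           # equals alpha
P_D = (1 - p) * d1 + p * d2                           # 0.4 here, vs 0 deterministic
assert P_F == alpha and P_D > best_deterministic
```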

3.3 Efficient frontier

Boundary of the region of achievable $(P_D, P_F)$ operating points:

  • continuous-valued: the ROC of the LRT
  • discrete-valued: the LRT operating points and the straight-line segments connecting them

Facts

  • $P_D \ge P_F$
  • The efficient frontier is a concave function
  • $\frac{\mathrm{d}P_D}{\mathrm{d}P_F} = \eta$

(Figure: efficient frontier)
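The slope fact $\frac{\mathrm{d}P_D}{\mathrm{d}P_F} = \eta$ can be checked by finite differences on a scalar Gaussian model $y \sim \mathcal{N}(\mu_i, \sigma^2)$ under $H_i$ (all numbers illustrative).

```python
import math

# Finite-difference check of dP_D/dP_F = eta for a scalar Gaussian model
# (numbers illustrative): the ROC slope at the operating point of threshold
# gamma equals the likelihood ratio L(gamma), i.e. the LRT threshold in use.
mu0, mu1, sigma = 0.0, 2.0, 1.0
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))
L = lambda y: math.exp((-(y - mu1) ** 2 + (y - mu0) ** 2) / (2 * sigma ** 2))

gamma, h = 0.5, 1e-6
dP_D = Q((gamma + h - mu1) / sigma) - Q((gamma - h - mu1) / sigma)
dP_F = Q((gamma + h - mu0) / sigma) - Q((gamma - h - mu0) / sigma)
assert abs(dP_D / dP_F - L(gamma)) < 1e-4
```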

4. Minimax hypothesis testing

Prior: unknown; cost function: known.

4.1 Decision rule

  • Minimax approach:
    $$\hat H(\cdot) = \arg\min_{f(\cdot)} \max_{p\in[0,1]} \varphi(f, p)$$

  • Optimal decision rule:
    $$\hat H(\cdot) = \hat{H}_{p_*}(\cdot), \qquad p_* = \arg\max_{p\in[0,1]} \varphi(\hat H_p, p)$$

    To prove this optimal decision rule, first introduce the mismatched Bayes decision
    $$\hat{H}_q(y)=\begin{cases}H_1, & L(y) \ge \frac{1-q}{q}\cdot\frac{C_{10}-C_{00}}{C_{01}-C_{11}} \\ H_0, & \text{otherwise}\end{cases}$$
    Its cost is as follows; note that $\varphi(\hat H_q, p)$ is linear in the prior $p$:
    $$\varphi(\hat H_q, p) = (1-p)\big[C_{00}(1-P_F(q)) + C_{10}P_F(q)\big] + p\big[C_{01}(1-P_D(q)) + C_{11}P_D(q)\big]$$
    Lemma (max-min inequality):
    $$\max_x \min_y g(x,y) \le \min_y \max_x g(x,y)$$
    Theorem:
    $$\min_{f(\cdot)} \max_{p\in[0,1]} \varphi(f,p) = \max_{p\in[0,1]} \min_{f(\cdot)} \varphi(f,p)$$
    Proof of Lemma: let $h(x) = \min_y g(x,y)$. Then
    $$\begin{aligned} h(x) &\le g(x,y), && \forall x, \forall y \\ \Rightarrow \max_x h(x) &\le \max_x g(x,y), && \forall y \\ \Rightarrow \max_x \min_y g(x,y) &\le \min_y \max_x g(x,y) \end{aligned}$$
    Proof of Theorem: first, for any $p_1, p_2 \in [0,1]$,
    $$\varphi(\hat H_{p_1}, p_1) = \min_f \varphi(f, p_1) \le \max_p \min_f \varphi(f, p) \le \min_f \max_p \varphi(f, p) \le \max_p \varphi(\hat H_{p_2}, p)$$
    Since this holds for any choice of $p_1, p_2$, we may take $p_1 = p_2 = p_* = \arg\max_p \varphi(\hat H_p, p)$.

    It then suffices to show that $\varphi(\hat H_{p_*}, p_*) = \max_p \varphi(\hat H_{p_*}, p)$.

    Since, as shown above, $\varphi(\hat H_q, p)$ is linear in $p$, proving this splits into cases:

    • If $p_* \in (0,1)$, we only need $\frac{\partial \varphi(\hat H_{p_*}, p)}{\partial p} = 0$ (for any $p$, by linearity), and the equality holds automatically.
    • If $p_* = 1$, we only need $\frac{\partial \varphi(\hat H_{p_*}, p)}{\partial p} > 0$, so the maximum is attained at $p = 1$; the case $p_* = 0$ is analogous.

    By the lemma below, the optimal decision is the Bayes decision with $p_* = \arg\max_p \varphi(\hat H_p, p)$, where $p_*$ satisfies
    $$0 = \frac{\partial \varphi(\hat H_{p_*}, p)}{\partial p} = (C_{01}-C_{00}) - (C_{01}-C_{11})\, P_D(p_*) - (C_{10}-C_{00})\, P_F(p_*)$$
    Lemma:
    $$\left.\frac{\mathrm{d}\varphi(\hat H_p, p)}{\mathrm{d}p}\right|_{p=q} = \left.\frac{\partial \varphi(\hat H_q, p)}{\partial p}\right|_{p=q} = \left.\frac{\partial \varphi(\hat H_q, p)}{\partial p}\right|_{\text{for any } p}
    $$
    (Figure: Bayes risk)
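The stationarity condition above can be solved numerically. A sketch for a scalar Gaussian model $y \sim \mathcal{N}(\mu_i, \sigma^2)$ under $H_i$ with 0/1 costs (all numbers illustrative): with $C_{00}=C_{11}=0$ and $C_{01}=C_{10}=1$ the condition reduces to the equalizer rule $P_F(p_*) = 1 - P_D(p_*)$.

```python
import math

# Minimax operating point for a scalar Gaussian model with 0/1 costs
# (numbers illustrative): solve the equalizer condition P_F + P_D = 1.
mu0, mu1, sigma = 0.0, 2.0, 1.0
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))

def operating_point(p):
    """(P_F, P_D) of the Bayes rule designed for prior P(H1) = p."""
    eta = (1 - p) / p
    gamma = (sigma ** 2 * math.log(eta) + (mu1 ** 2 - mu0 ** 2) / 2) / (mu1 - mu0)
    return Q((gamma - mu0) / sigma), Q((gamma - mu1) / sigma)

# P_F + P_D increases with p (larger p lowers the threshold), so bisect on
# g(p) = P_F(p) + P_D(p) - 1 to find the least favorable prior p*.
lo, hi = 1e-6, 1 - 1e-6
for _ in range(200):
    mid = (lo + hi) / 2
    pf, pd = operating_point(mid)
    lo, hi = (mid, hi) if pf + pd < 1 else (lo, mid)
p_star = (lo + hi) / 2

# By the symmetry of this example the least favorable prior is p* = 1/2.
assert abs(p_star - 0.5) < 1e-6
```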

Other posts in this series:
Statistical Inference (1) Hypothesis Test
Statistical Inference (2) Estimation Problem
Statistical Inference (3) Exponential Family
Statistical Inference (4) Information Geometry
Statistical Inference (5) EM algorithm
Statistical Inference (6) Modeling
Statistical Inference (7) Typical Sequence
Statistical Inference (8) Model Selection
Statistical Inference (9) Graphical models
Statistical Inference (10) Elimination algorithm
Statistical Inference (11) Sum-product algorithm

Reposted from blog.csdn.net/weixin_41024483/article/details/104165225