Chapter 4 (Further Topics on Random Variables): Conditional Expectation and Variance Revisited

This post contains my reading notes on *Introduction to Probability* by Bertsekas and Tsitsiklis.

Law of Iterated Expectations


  • In this section, we revisit the conditional expectation of a random variable $X$ given another random variable $Y$, and view it as a random variable determined by $Y$.

  • We introduce a random variable, denoted by $E[X|Y]$, that takes the value $E[X|Y = y]$ when $Y$ takes the value $y$. Since $E[X|Y = y]$ is a function of $y$, $E[X|Y]$ is a function of $Y$, and its distribution is determined by the distribution of $Y$.

Example 4.16.

  • We are given a biased coin, and we are told that because of manufacturing defects the probability of heads, denoted by $Y$, is itself random, with a known distribution over the interval $[0, 1]$. We toss the coin a fixed number $n$ of times, and we let $X$ be the number of heads obtained. Then, for any $y\in [0, 1]$, we have $E[X|Y = y] = ny$, so $E[X|Y]$ is the random variable $nY$.

Law of Iterated Expectations

  • Since $E[X|Y]$ is a random variable, it has an expectation $E[E[X|Y]]$ of its own, which can be calculated using the expected value rule:
    $$E\bigl[E[X|Y]\bigr]=\begin{cases}\sum_y E[X|Y=y]\,p_Y(y), & Y \text{ discrete},\\ \int_{-\infty}^{\infty} E[X|Y=y]\,f_Y(y)\,dy, & Y \text{ continuous}.\end{cases}$$
  • By the corresponding versions of the total expectation theorem, both expressions are equal to $E[X]$. This yields the law of iterated expectations:
    $$E\bigl[E[X|Y]\bigr]=E[X]$$
  • We finally note an important property: for any function $g$, we have
    $$E[Xg(Y)|Y]=g(Y)E[X|Y]$$
    This is because, given the value of $Y$, $g(Y)$ is a constant and can be pulled outside the expectation.

Example 4.16 (continued).

  • Suppose that $Y$, the probability of heads for our coin, is uniformly distributed over the interval $[0, 1]$. Since $E[X|Y] = nY$ and $E[Y] = 1/2$, the law of iterated expectations gives
    $$E[X]=E\bigl[E[X|Y]\bigr]=E[nY]=nE[Y]=\frac{n}{2}$$
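  • A quick Monte Carlo sketch (my own addition, not from the book) to sanity-check this result; `n` and `n_trials` are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 10, 200_000                   # illustrative values

y = rng.uniform(0.0, 1.0, size=n_trials)    # random bias Y ~ Uniform[0, 1]
x = rng.binomial(n, y)                      # X | Y = y  ~  Binomial(n, y)

print(x.mean())   # ~ n/2 = 5.0, matching E[X] = E[E[X|Y]] = n E[Y]
```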

Problem 22.
Consider a gambler who at each gamble either wins or loses his bet with probabilities $p$ and $1 - p$, independently of earlier gambles. When $p > 1/2$, a popular gambling system, known as the Kelly strategy, is to always bet the fraction $2p - 1$ of the current fortune. Compute the expected fortune after $n$ gambles, starting with $x$ units and employing the Kelly strategy.

SOLUTION

  • If the gambler's fortune at the beginning of a round is $a$, then his expected fortune at the end of the round is
    $$a\bigl(1+p(2p-1)-(1-p)(2p-1)\bigr)=a\bigl(1+(2p-1)^2\bigr)$$
  • Let $X_k$ be the fortune after the $k$th round. Using the preceding calculation, we have
    $$E[X_{k+1}|X_k] =\bigl(1+(2p-1)^2\bigr)X_k$$
    Using the law of iterated expectations, we obtain
    $$E[X_{k+1}] =\bigl(1+(2p-1)^2\bigr)E[X_k]$$
    and
    $$E[X_{1}] =\bigl(1+(2p-1)^2\bigr)x$$
    We conclude that
    $$E[X_n] =\bigl(1+(2p-1)^2\bigr)^n x$$
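  • A minimal simulation sketch (not from the book) of the Kelly strategy, checking the closed form; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, x0, n_trials = 0.6, 20, 1.0, 100_000   # illustrative values
f = 2 * p - 1                                # Kelly fraction of current fortune

fortunes = np.full(n_trials, x0)
for _ in range(n):
    wins = rng.random(n_trials) < p
    fortunes *= np.where(wins, 1 + f, 1 - f)  # win or lose the bet f * fortune

print(fortunes.mean())        # simulated E[X_n]
print((1 + f**2) ** n * x0)   # closed form (1 + (2p-1)^2)^n * x
```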

Problem 23.
Pat and Nat are dating, and all of their dates are scheduled to start at 9 p.m. Nat always arrives promptly at 9 p.m. Pat is highly disorganized and arrives at a time that is uniformly distributed between 8 p.m. and 10 p.m. Let $X$ be the time in hours between 8 p.m. and the time when Pat arrives. If Pat arrives before 9 p.m., their date will last exactly 3 hours. If Pat arrives after 9 p.m., their date will last for a time that is uniformly distributed between $0$ and $3 - X$ hours. The date starts at the time they meet. Nat gets irritated when Pat is late and will end the relationship after the second date on which Pat is late by more than 45 minutes. All dates are independent of one another.

  • $(a)$ What is the expected number of hours Nat waits for Pat to arrive?
  • $(b)$ What is the expected duration of any particular date?
  • $(c)$ What is the expected number of dates they will have before breaking up?

SOLUTION

  • $(a)$ Let $W$ be the number of hours that Nat waits. We have
    $$\begin{aligned}E[W] &= P(0 \leq X \leq 1)E[W | 0 \leq X \leq 1] + P(X > 1)E[W |X > 1] \\&=P(X > 1)E[W |X > 1]=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}\end{aligned}$$
  • $(b)$ Let $D$ be the duration of a date. We have $E[D| 0 \leq X \leq 1] = 3$. Furthermore, when $X > 1$, the conditional expectation of $D$ given $X$ is $(3 -X)/2$. Hence, using the law of iterated expectations,
    $$\begin{aligned}E[D|X>1]&=E\bigl[E[D|X]\,\big|\,X>1\bigr]=E\Bigl[\frac{3-X}{2}\,\Big|\,X>1\Bigr] \\&=\frac{3}{2}-\frac{E[X|X>1]}{2} = \frac{3}{2}-\frac{3/2}{2} =\frac{3}{4}\end{aligned}$$
    Therefore,
    $$E[D] = P(0 \leq X \leq 1)E[D|0 \leq X \leq 1] + P(X > 1)E[D|X > 1] =\frac{1}{2}\cdot3+\frac{1}{2}\cdot\frac{3}{4}=\frac{15}{8}$$
  • $(c)$ Pat is late by more than 45 minutes exactly when $X > 7/4$, which has probability $1/8$. The number of dates before breaking up is the sum of two geometrically distributed random variables with parameter $1/8$, and its expected value is $2 \cdot 8 = 16$.
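  • All three answers can be checked with a short simulation (my own sketch, not part of the solution):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500_000
x = rng.uniform(0.0, 2.0, n_trials)   # Pat's arrival, hours after 8 p.m.

w = np.maximum(x - 1.0, 0.0)          # (a) Nat's waiting time
d = np.where(x <= 1.0, 3.0,           # (b) on time: date lasts exactly 3 hours
             rng.uniform(0.0, 1.0, n_trials) * (3.0 - x))  # late: U(0, 3 - X)

print(w.mean())                # ~ 1/4
print(d.mean())                # ~ 15/8
print(2 / (x > 1.75).mean())   # (c) 2 / P(late > 45 min) ~ 16 expected dates
```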

Problem 24.
A retired professor comes to the office at a time which is uniformly distributed between 9 a.m. and 1 p.m., performs a single task, and leaves when the task is completed. The duration of the task is exponentially distributed with parameter $\lambda(y) = 1/(5 - y)$, where $y$ is the length of the time interval between 9 a.m. and the time of his arrival. The professor has a Ph.D. student who on a given day comes to see him at a time that is uniformly distributed between 9 a.m. and 5 p.m. If the student does not find the professor, he leaves and does not return. If he finds the professor, he spends an amount of time that is uniformly distributed between 0 and 1 hour. The professor will spend the same total amount of time on his task regardless of whether he is interrupted by the student. What is the expected amount of time that the professor will spend with the student, and what is the expected time at which he will leave his office?

SOLUTION

  • We define the following random variables:
    • $X$ = amount of time the professor devotes to his task [exponentially distributed with parameter $\lambda(y) = 1/(5 - y)$];
    • $Y$ = length of time between 9 a.m. and his arrival (uniformly distributed between 0 and 4);
    • $W$ = length of time between 9 a.m. and the arrival of the Ph.D. student (uniformly distributed between 0 and 8);
    • $R$ = amount of time the student will spend with the professor, if he finds the professor (uniformly distributed between 0 and 1 hour);
    • $T$ = amount of time the professor will spend with the student.
  • Let also $F$ be the event that the student finds the professor. Then
    $$E[T] = P(F)E[T| F] + P(F^C)E[T | F^C]= P(F)E[T| F]=P(F)E[R]=\frac{1}{2}P(F)$$
  • So we need to find $P(F)$:
    $$P(F)=P(Y\leq W\leq X+Y)=1-\bigl(P(W<Y)+P(W>X+Y)\bigr)$$
    We have
    $$P(W<Y)=\int_0^4\frac{1}{4}\int_0^y\frac{1}{8}\,dw\,dy=\frac{1}{4}$$
    and
    $$\begin{aligned}P(W>X+Y)&=\int_0^4P(W>X+Y|Y=y)f_Y(y)\,dy \\&=\int_0^4\int_y^8F_{X|Y}(w-y|y)\,f_W(w)f_Y(y)\,dw\,dy \\&=\int_0^4\frac{1}{4}\int_y^8\frac{1}{8}\int_0^{w-y}\frac{1}{5-y}e^{-\frac{x}{5-y}}\,dx\,dw\,dy \\&=\frac{12}{32}+\frac{1}{32}\int_0^4(5-y)e^{-\frac{8-y}{5-y}}\,dy \end{aligned}$$
    Integrating numerically, we have
    $$\int_0^4(5-y)e^{-\frac{8-y}{5-y}}\,dy=1.7584$$
    Thus,
    $$P(F)=1-\bigl(P(W<Y)+P(W>X+Y)\bigr)=1-0.68=0.32$$
  • The expected amount of time the professor will spend with the student is then
    $$E[T]=\frac{1}{2}P(F)=0.16\text{ hours}=9.6\text{ min}$$
  • Next, we want to find the expected time at which the professor will leave his office. Let $Z$ be the length of time, measured from 9 a.m., until he leaves his office. Then
    $$E[Z]=P(F)E[Z|F]+P(F^C)E[Z|F^C]=P(F)E[X+Y+R]+P(F^C)E[X+Y]$$
    Note that $E[Y]=2$. We have
    $$E[X|Y=y]=\frac{1}{\lambda(y)}=5-y$$
    which implies that $E[X|Y]=5-Y$ and
    $$E[X] = E\bigl[E[X | Y ]\bigr]= E[5 - Y ] = 5 - E[Y ] = 3$$
    Therefore,
    $$E[Z]=E[X+Y]+P(F)E[R]=3+2+0.16=5.16$$
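  • The numerical integration is easy to reproduce (a sketch using SciPy, not part of the original solution):

```python
import numpy as np
from scipy import integrate

# check the integral that was evaluated numerically above
val, _ = integrate.quad(lambda y: (5 - y) * np.exp(-(8 - y) / (5 - y)), 0, 4)
print(val)                               # ~ 1.7584

p_f = 1 - (1 / 4 + 12 / 32 + val / 32)   # P(F) = 1 - P(W < Y) - P(W > X + Y)
print(p_f)                               # ~ 0.32
print(0.5 * p_f * 60)                    # E[T] ~ 9.6 minutes
print(3 + 2 + 0.5 * p_f)                 # E[Z] = E[X] + E[Y] + P(F) E[R] ~ 5.16
```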

Problem 28. The Bivariate Normal PDF.
The (zero-mean) bivariate normal PDF is of the form
$$f_{X,Y}(x,y)=ce^{-q(x,y)}$$
where the exponent term $q(x, y)$ is a quadratic function of $x$ and $y$,
$$q(x, y) =\frac{\dfrac{x^2}{\sigma_x^2}-2\rho\dfrac{xy}{\sigma_x\sigma_y}+\dfrac{y^2}{\sigma_y^2}}{2(1-\rho^2)}$$
$\sigma_x$ and $\sigma_y$ are positive constants, $\rho$ is a constant that satisfies $-1 < \rho < 1$, and $c$ is a normalizing constant.

  • $(a)$ By completing the square, rewrite $q(x, y)$ in the form $(\alpha x - \beta y)^2+\gamma y^2$, for some constants $\alpha,\beta$, and $\gamma$.
  • $(b)$ Show that $X$ and $Y$ are zero-mean normal random variables with variances $\sigma_x^2$ and $\sigma_y^2$, respectively.
  • $(c)$ Find the normalizing constant $c$.
  • $(d)$ Show that the conditional PDF of $X$ given that $Y = y$ is normal, and identify its conditional mean and variance.
  • $(e)$ Show that the correlation coefficient of $X$ and $Y$ is equal to $\rho$.
  • $(f)$ Show that $X$ and $Y$ are independent if and only if they are uncorrelated.
  • $(g)$ Show that the estimation error $E[X | Y] - X$ is normal with mean zero and variance $(1-\rho^2)\sigma_x^2$, and is independent of $Y$.

SOLUTION

  • $(a)$ We can rewrite $q(x , y)$ in the form
    $$q(x,y)=q_1(x,y)+q_2(y)$$
    where
    $$q_1(x,y)=\frac{1}{2(1-\rho^2)}\Bigl(\frac{x}{\sigma_x}-\rho\frac{y}{\sigma_y}\Bigr)^2,\qquad q_2(y)=\frac{y^2}{2\sigma_y^2}$$
  • $(b)$ We have
    $$f_Y(y)=ce^{-q_2(y)}\int_{-\infty}^\infty e^{-q_1(x,y)}\,dx$$
    Using the change of variables
    $$u=\frac{x/\sigma_x-\rho y/\sigma_y}{\sqrt{1-\rho^2}}$$
    we obtain
    $$\int_{-\infty}^\infty e^{-q_1(x,y)}\,dx=\sigma_x\sqrt{1-\rho^2}\int_{-\infty}^\infty e^{-u^2/2}\,du=\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}$$
    Thus,
    $$f_Y(y)=c\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}\,e^{-y^2/(2\sigma_y^2)}$$
    We recognize this as a normal PDF with mean zero and variance $\sigma_y^2$. The result for the random variable $X$ follows by symmetry.
  • $(c)$ The normalizing constant for the PDF of $Y$ must be equal to $1/(\sqrt{2\pi}\sigma_y)$. It follows that
    $$c\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}=\frac{1}{\sqrt{2\pi}\sigma_y}$$
    which implies that
    $$c=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}$$
  • $(d)$ We have
    $$f_{X|Y}(x|y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}=\frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{(x-\rho\sigma_xy/\sigma_y)^2}{2\sigma_x^2(1-\rho^2)}\Bigr\}$$
    For any fixed $y$, we recognize this as a normal PDF with mean $\rho\sigma_xy/\sigma_y$ and variance $\sigma_x^2(1-\rho^2)$.
  • $(e)$ Since $X$ and $Y$ have zero mean, $cov(X,Y)=E[XY]$, and
    $$\begin{aligned}E[XY]&=E\bigl[E[XY|Y]\bigr]=E\bigl[YE[X|Y]\bigr] \\&=E[Y\rho\sigma_xY/\sigma_y] \\&=(\rho\sigma_x/\sigma_y)E[Y^2] \\&=\rho\sigma_x\sigma_y\end{aligned}$$
    Therefore,
    $$\rho(X,Y)=\frac{cov(X,Y)}{\sigma_x\sigma_y}=\rho$$
  • $(f)$ If $X$ and $Y$ are uncorrelated, then $\rho=0$, and the joint PDF satisfies $f_{X,Y}(x,y)=f_X(x)f_Y(y)$, so that $X$ and $Y$ are independent. Conversely, if $X$ and $Y$ are independent, then they are automatically uncorrelated.
  • $(g)$ From part $(d)$, we know that conditioned on $Y = y$, $X$ is normal with mean $E[X |Y= y]$ and variance $(1-\rho^2)\sigma_x^2$. Therefore, conditioned on $Y = y$, the estimation error $\tilde X = E[X | Y = y ] - X$ is normal with mean zero and variance $(1-\rho^2)\sigma_x^2$, i.e.,
    $$f_{\tilde X|Y}(\tilde x|y)=\frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_x^2}}\exp\Bigl\{-\frac{\tilde x^2}{2(1-\rho^2)\sigma_x^2}\Bigr\}$$
    Since the conditional PDF of $\tilde X$ does not depend on the value $y$ of $Y$, it follows that $\tilde X$ is independent of $Y$, and the above conditional PDF is also the unconditional PDF of $\tilde X$.
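  • These properties are easy to confirm by simulation (my own sketch, using the parametrization $X = \rho(\sigma_x/\sigma_y)Y + \sigma_x\sqrt{1-\rho^2}\,Z$ with $Z$ standard normal independent of $Y$; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sx, sy, rho, n = 2.0, 1.5, 0.7, 1_000_000   # illustrative parameters

y = sy * rng.standard_normal(n)
x = rho * (sx / sy) * y + sx * np.sqrt(1 - rho**2) * rng.standard_normal(n)

err = rho * (sx / sy) * y - x     # estimation error E[X|Y] - X
print(np.corrcoef(x, y)[0, 1])    # ~ rho = 0.7            (part (e))
print(err.var())                  # ~ (1 - rho^2) sx^2 = 2.04  (part (g))
print(np.corrcoef(err, y)[0, 1])  # ~ 0: error uncorrelated with Y (part (g))
```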

The Conditional Expectation as an Estimator


  • If we view $Y$ as an observation that provides information about $X$, it is natural to view the conditional expectation, denoted
    $$\hat X=E[X|Y]$$
    as an estimator of $X$ given $Y$. The estimation error
    $$\tilde X=\hat X-X$$
    is a random variable satisfying
    $$E[\tilde X|Y]=E[\hat X-X|Y]=E[\hat X|Y]-E[X|Y]=\hat X-\hat X=0$$
    Thus, the random variable $E[\tilde X | Y]$ is identically zero: $E[\tilde X|Y=y]=0$ for all values of $y$. By the law of iterated expectations, we also have
    $$E[\tilde X]=E\bigl[E[\tilde X|Y]\bigr]=0$$
    This property is reassuring, as it indicates that the estimation error has no systematic upward or downward bias.

  • We will now show that $\hat X$ is uncorrelated with the estimation error $\tilde X$. Indeed, using the law of iterated expectations, we have
    $$E[\hat X\tilde X]=E\bigl[E[\hat X\tilde X|Y]\bigr]=E\bigl[\hat XE[\tilde X|Y]\bigr]=0$$
    It follows that
    $$cov(\hat X,\tilde X)=E[\hat X\tilde X]-E[\hat X]E[\tilde X]=0-E[X]\cdot0=0$$
    and $\hat X$ and $\tilde X$ are uncorrelated.
  • An important consequence of the fact $cov(\hat X,\tilde X)= 0$, together with $X=\hat X-\tilde X$, is that
    $$var(X)=var(\hat X-\tilde X)=var(\hat X)+var(\tilde X)$$
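  • As an illustration (my own sketch, not from the book), consider the simple model $X = Y + W$ with $W$ independent noise, where $\hat X = E[X|Y] = Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

y = rng.standard_normal(n)
x = y + rng.standard_normal(n)   # X = Y + W, with W independent of Y
x_hat = y                        # E[X|Y] = Y in this model
x_tilde = x_hat - x              # estimation error

print(np.cov(x_hat, x_tilde)[0, 1])           # ~ 0: uncorrelated
print(x.var(), x_hat.var() + x_tilde.var())   # both ~ 2: var(X) = var + var
```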

The Conditional Variance

  • We introduce the random variable
    $$var(X|Y)=E\bigl[(X-E[X|Y])^2\,\big|\,Y\bigr]=E[\tilde X^2|Y]$$
  • Using the fact $E[\tilde X]=0$ and the law of iterated expectations, we can write the variance of the estimation error as
    $$var(\tilde X)=E[\tilde X^2]=E\bigl[E[\tilde X^2|Y]\bigr]=E\bigl[var(X|Y)\bigr]$$
    and rewrite the equation $var(X)=var(\tilde X)+var(\hat X)$ as follows.

Law of Total Variance
$$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)$$

  • The law of total variance is helpful in calculating variances of random variables by using conditioning.

  • Let $X$ and $Y$ be independent random variables. Using the law of total variance with conditioning on $Y$, so that $var(XY|Y)=Y^2var(X)$ and $E[XY|Y]=YE[X]$, we can show that
    $$var(XY)=(E[X])^2var(Y)+(E[Y])^2var(X)+var(X)var(Y)$$
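  • A quick numerical check of this identity (a sketch with arbitrarily chosen distributions for $X$ and $Y$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(2.0, 1.0, n)    # E[X] = 2,  var(X) = 1  (illustrative)
y = rng.normal(-1.0, 3.0, n)   # E[Y] = -1, var(Y) = 9

print((x * y).var())                       # simulated var(XY)
print(2.0**2 * 9 + (-1.0)**2 * 1 + 1 * 9)  # formula: 36 + 1 + 9 = 46
```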

Example 4.16 (continued).

  • We consider $n$ independent tosses of a biased coin whose probability of heads, $Y$, is uniformly distributed over the interval $[0, 1]$. With $X$ being the number of heads obtained, we have $E[X| Y] = nY$ and $var(X | Y) =nY(1 - Y)$. Thus, since $E[Y]=1/2$ and $E[Y^2]=1/3$,
    $$E\bigl[var(X | Y)\bigr]=E[nY(1 - Y)]=n\bigl(E[Y]-E[Y^2]\bigr)=\frac{n}{6}$$
    Furthermore,
    $$var\bigl(E[X|Y]\bigr)=var(nY)=n^2var(Y)=\frac{n^2}{12}$$
  • Therefore, by the law of total variance, we have
    $$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)=\frac{n}{6}+\frac{n^2}{12}$$
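  • A simulation check (not from the book), reusing the coin model from before with an illustrative $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 10, 500_000

y = rng.uniform(0.0, 1.0, n_trials)
x = rng.binomial(n, y)
print(x.var())             # simulated var(X)
print(n / 6 + n**2 / 12)   # n/6 + n^2/12 = 10.0 for n = 10
```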

Example 4.21. Computing Variances by Conditioning.

  • Consider a continuous random variable $X$ with the PDF given in Fig. 4.13, which is constant at two levels:
    $$f_X(x)=\begin{cases}1/2, & 0\le x\le 1,\\ 1/4, & 1< x\le 3,\\ 0, & \text{otherwise}.\end{cases}$$
  • We define an auxiliary random variable $Y$ as follows:
    $$Y=\begin{cases}1, & \text{if } 0\le X\le 1,\\ 2, & \text{if } 1< X\le 3.\end{cases}$$
    Here, $E[X | Y]$ takes the values $1/2$ and $2$, each with probability $1/2$. Thus, the mean of $E[X| Y]$ is $5/4$. It follows that
    $$var\bigl(E[X|Y]\bigr)=\frac{1}{2}\Bigl(\frac{1}{2}-\frac{5}{4}\Bigr)^2+\frac{1}{2}\Bigl(2-\frac{5}{4}\Bigr)^2=\frac{9}{16}$$
    Conditioned on $Y = 1$ or $Y = 2$, $X$ is uniformly distributed on an interval of length $1$ or $2$, respectively. Therefore,
    $$var(X|Y=1)=\frac{1}{12},\qquad var(X|Y=2)=\frac{4}{12}$$
    and
    $$E\bigl[var(X|Y)\bigr]=\frac{1}{2}\cdot\frac{1}{12}+\frac{1}{2}\cdot\frac{4}{12}=\frac{5}{24}$$
  • Putting everything together, we obtain
    $$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)=\frac{5}{24}+\frac{9}{16}=\frac{37}{48}$$
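  • A final simulation check (my own sketch): draw $X$ from the two-piece mixture implied by Fig. 4.13 and compare with $37/48$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

pick = rng.random(n) < 0.5   # each piece of the PDF has probability 1/2
x = np.where(pick, rng.uniform(0.0, 1.0, n), rng.uniform(1.0, 3.0, n))

print(x.var())    # simulated var(X)
print(37 / 48)    # ~ 0.7708
```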


Reposted from blog.csdn.net/weixin_42437114/article/details/113834241