Chapter 4 (Further Topics on Random Variables): Conditional Expectation and Variance Revisited

This post contains my reading notes on *Introduction to Probability* by Bertsekas and Tsitsiklis.

Law of Iterated Expectations


  • In this section, we revisit the conditional expectation of a random variable $X$ given another random variable $Y$, and view it as a random variable determined by $Y$.

  • We introduce a random variable, denoted by $E[X|Y]$, that takes the value $E[X|Y = y]$ when $Y$ takes the value $y$. Since $E[X|Y = y]$ is a function of $y$, $E[X|Y]$ is a function of $Y$, and its distribution is determined by the distribution of $Y$.

Example 4.16.

  • We are given a biased coin, and we are told that because of manufacturing defects the probability of heads, denoted by $Y$, is itself random, with a known distribution over the interval $[0, 1]$. We toss the coin a fixed number $n$ of times, and we let $X$ be the number of heads obtained. Then, for any $y\in [0, 1]$, we have $E[X|Y = y] = ny$, so $E[X|Y]$ is the random variable $nY$.

Law of Iterated Expectations

  • Since $E[X|Y]$ is a random variable, it has an expectation $E[E[X|Y]]$ of its own, which can be calculated using the expected value rule:
    $$E\bigl[E[X|Y]\bigr]=\begin{cases}\sum_y E[X|Y=y]\,p_Y(y), & Y \text{ discrete},\\ \int_{-\infty}^{\infty} E[X|Y=y]\,f_Y(y)\,dy, & Y \text{ continuous}.\end{cases}$$
  • By the corresponding versions of the total expectation theorem, both expressions are equal to $E[X]$. This yields the law of iterated expectations:
    $$E\bigl[E[X|Y]\bigr]=E[X]$$
  • We finally note an important property: for any function $g$, we have
    $$E[Xg(Y)|Y]=g(Y)E[X|Y]$$
    This is because, given the value of $Y$, $g(Y)$ is a constant and can be pulled outside the expectation.

Example 4.16 (continued).

  • Suppose that $Y$, the probability of heads for our coin, is uniformly distributed over the interval $[0, 1]$. Since $E[X|Y] = nY$ and $E[Y] = 1/2$, the law of iterated expectations gives
    $$E[X]=E\bigl[E[X|Y]\bigr]=E[nY]=nE[Y]=\frac{n}{2}$$
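  • A quick Monte Carlo sketch (my own addition, not from the book) to sanity-check this result; `n` and `n_trials` are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 10, 200_000                   # illustrative values

y = rng.uniform(0.0, 1.0, size=n_trials)    # random bias Y ~ Uniform[0, 1]
x = rng.binomial(n, y)                      # X | Y = y  ~  Binomial(n, y)

print(x.mean())   # ~ n/2 = 5.0, matching E[X] = E[E[X|Y]] = n E[Y]
```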

Problem 22.
Consider a gambler who at each gamble either wins or loses his bet with probabilities $p$ and $1 - p$, independently of earlier gambles. When $p > 1/2$, a popular gambling system, known as the Kelly strategy, is to always bet the fraction $2p - 1$ of the current fortune. Compute the expected fortune after $n$ gambles, starting with $x$ units and employing the Kelly strategy.

SOLUTION

  • If the gambler's fortune at the beginning of a round is $a$, then his expected fortune at the end of the round is
    $$a\bigl(1+p(2p-1)-(1-p)(2p-1)\bigr)=a\bigl(1+(2p-1)^2\bigr)$$
  • Let $X_k$ be the fortune after the $k$th round. Using the preceding calculation, we have
    $$E[X_{k+1}|X_k] =\bigl(1+(2p-1)^2\bigr)X_k$$
    Using the law of iterated expectations, we obtain
    $$E[X_{k+1}] =\bigl(1+(2p-1)^2\bigr)E[X_k]$$
    and
    $$E[X_{1}] =\bigl(1+(2p-1)^2\bigr)x$$
    We conclude that
    $$E[X_n] =\bigl(1+(2p-1)^2\bigr)^n x$$
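  • A minimal simulation sketch (not from the book) of the Kelly strategy, checking the closed form; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, x0, n_trials = 0.6, 20, 1.0, 100_000   # illustrative values
f = 2 * p - 1                                # Kelly fraction of current fortune

fortunes = np.full(n_trials, x0)
for _ in range(n):
    wins = rng.random(n_trials) < p
    fortunes *= np.where(wins, 1 + f, 1 - f)  # win or lose the bet f * fortune

print(fortunes.mean())        # simulated E[X_n]
print((1 + f**2) ** n * x0)   # closed form (1 + (2p-1)^2)^n * x
```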

Problem 23.
Pat and Nat are dating, and all of their dates are scheduled to start at 9 p.m. Nat always arrives promptly at 9 p.m. Pat is highly disorganized and arrives at a time that is uniformly distributed between 8 p.m. and 10 p.m. Let $X$ be the time in hours between 8 p.m. and the time when Pat arrives. If Pat arrives before 9 p.m., their date will last exactly 3 hours. If Pat arrives after 9 p.m., their date will last for a time that is uniformly distributed between $0$ and $3 - X$ hours. The date starts at the time they meet. Nat gets irritated when Pat is late and will end the relationship after the second date on which Pat is late by more than 45 minutes. All dates are independent of one another.

  • $(a)$ What is the expected number of hours Nat waits for Pat to arrive?
  • $(b)$ What is the expected duration of any particular date?
  • $(c)$ What is the expected number of dates they will have before breaking up?

SOLUTION

  • $(a)$ Let $W$ be the number of hours that Nat waits. We have
    $$\begin{aligned}E[W] &= P(0 \leq X \leq 1)E[W | 0 \leq X \leq 1] + P(X > 1)E[W |X > 1] \\&=P(X > 1)E[W |X > 1]=\frac{1}{2}\cdot\frac{1}{2}=\frac{1}{4}\end{aligned}$$
  • $(b)$ Let $D$ be the duration of a date. We have $E[D| 0 \leq X \leq 1] = 3$. Furthermore, when $X > 1$, the conditional expectation of $D$ given $X$ is $(3 -X)/2$. Hence, using the law of iterated expectations,
    $$\begin{aligned}E[D|X>1]&=E\bigl[E[D|X]\,\big|\,X>1\bigr]=E\Bigl[\frac{3-X}{2}\,\Big|\,X>1\Bigr] \\&=\frac{3}{2}-\frac{E[X|X>1]}{2} = \frac{3}{2}-\frac{3/2}{2} =\frac{3}{4}\end{aligned}$$
    Therefore,
    $$E[D] = P(0 \leq X \leq 1)E[D|0 \leq X \leq 1] + P(X > 1)E[D|X > 1] =\frac{1}{2}\cdot3+\frac{1}{2}\cdot\frac{3}{4}=\frac{15}{8}$$
  • $(c)$ Pat is late by more than 45 minutes exactly when $X > 7/4$, which has probability $1/8$. The number of dates before breaking up is the sum of two geometrically distributed random variables with parameter $1/8$, and its expected value is $2 \cdot 8 = 16$.
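  • All three answers can be checked with a short simulation (my own sketch, not part of the solution):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500_000
x = rng.uniform(0.0, 2.0, n_trials)   # Pat's arrival, hours after 8 p.m.

w = np.maximum(x - 1.0, 0.0)          # (a) Nat's waiting time
d = np.where(x <= 1.0, 3.0,           # (b) on time: date lasts exactly 3 hours
             rng.uniform(0.0, 1.0, n_trials) * (3.0 - x))  # late: U(0, 3 - X)

print(w.mean())                # ~ 1/4
print(d.mean())                # ~ 15/8
print(2 / (x > 1.75).mean())   # (c) 2 / P(late > 45 min) ~ 16 expected dates
```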

Problem 24.
A retired professor comes to the office at a time which is uniformly distributed between 9 a.m. and 1 p.m., performs a single task, and leaves when the task is completed. The duration of the task is exponentially distributed with parameter $\lambda(y) = 1/(5 - y)$, where $y$ is the length of the time interval between 9 a.m. and the time of his arrival. The professor has a Ph.D. student who on a given day comes to see him at a time that is uniformly distributed between 9 a.m. and 5 p.m. If the student does not find the professor, he leaves and does not return. If he finds the professor, he spends an amount of time that is uniformly distributed between 0 and 1 hour. The professor will spend the same total amount of time on his task regardless of whether he is interrupted by the student. What is the expected amount of time that the professor will spend with the student, and what is the expected time at which he will leave his office?

SOLUTION

  • We define the following random variables:
    • $X$ = amount of time the professor devotes to his task [exponentially distributed with parameter $\lambda(y) = 1/(5 - y)$];
    • $Y$ = length of time between 9 a.m. and his arrival (uniformly distributed between 0 and 4);
    • $W$ = length of time between 9 a.m. and the arrival of the Ph.D. student (uniformly distributed between 0 and 8);
    • $R$ = amount of time the student will spend with the professor, if he finds the professor (uniformly distributed between 0 and 1 hour);
    • $T$ = amount of time the professor will spend with the student.
  • Let also $F$ be the event that the student finds the professor. Then
    $$E[T] = P(F)E[T| F] + P(F^C)E[T | F^C]= P(F)E[T| F]=P(F)E[R]=\frac{1}{2}P(F)$$
  • So we need to find $P(F)$:
    $$P(F)=P(Y\leq W\leq X+Y)=1-\bigl(P(W<Y)+P(W>X+Y)\bigr)$$
    We have
    $$P(W<Y)=\int_0^4\frac{1}{4}\int_0^y\frac{1}{8}\,dw\,dy=\frac{1}{4}$$
    and
    $$\begin{aligned}P(W>X+Y)&=\int_0^4P(W>X+Y|Y=y)f_Y(y)\,dy \\&=\int_0^4\int_y^8F_{X|Y}(w-y|y)\,f_W(w)f_Y(y)\,dw\,dy \\&=\int_0^4\frac{1}{4}\int_y^8\frac{1}{8}\int_0^{w-y}\frac{1}{5-y}e^{-\frac{x}{5-y}}\,dx\,dw\,dy \\&=\frac{12}{32}+\frac{1}{32}\int_0^4(5-y)e^{-\frac{8-y}{5-y}}\,dy \end{aligned}$$
    Integrating numerically, we have
    $$\int_0^4(5-y)e^{-\frac{8-y}{5-y}}\,dy=1.7584$$
    Thus,
    $$P(F)=1-\bigl(P(W<Y)+P(W>X+Y)\bigr)=1-0.68=0.32$$
  • The expected amount of time the professor will spend with the student is then
    $$E[T]=\frac{1}{2}P(F)=0.16\text{ hours}=9.6\text{ min}$$
  • Next, we want to find the expected time at which the professor will leave his office. Let $Z$ be the length of time, measured from 9 a.m., until he leaves his office. Then
    $$E[Z]=P(F)E[Z|F]+P(F^C)E[Z|F^C]=P(F)E[X+Y+R]+P(F^C)E[X+Y]$$
    Note that $E[Y]=2$. We have
    $$E[X|Y=y]=\frac{1}{\lambda(y)}=5-y$$
    which implies that $E[X|Y]=5-Y$ and
    $$E[X] = E\bigl[E[X | Y ]\bigr]= E[5 - Y ] = 5 - E[Y ] = 3$$
    Therefore,
    $$E[Z]=E[X+Y]+P(F)E[R]=3+2+0.16=5.16$$
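  • The numerical integration is easy to reproduce (a sketch using SciPy, not part of the original solution):

```python
import numpy as np
from scipy import integrate

# check the integral that was evaluated numerically above
val, _ = integrate.quad(lambda y: (5 - y) * np.exp(-(8 - y) / (5 - y)), 0, 4)
print(val)                               # ~ 1.7584

p_f = 1 - (1 / 4 + 12 / 32 + val / 32)   # P(F) = 1 - P(W < Y) - P(W > X + Y)
print(p_f)                               # ~ 0.32
print(0.5 * p_f * 60)                    # E[T] ~ 9.6 minutes
print(3 + 2 + 0.5 * p_f)                 # E[Z] = E[X] + E[Y] + P(F) E[R] ~ 5.16
```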

Problem 28. The Bivariate Normal PDF.
The (zero-mean) bivariate normal PDF is of the form
$$f_{X,Y}(x,y)=ce^{-q(x,y)}$$
where the exponent term $q(x, y)$ is a quadratic function of $x$ and $y$,
$$q(x, y) =\frac{\dfrac{x^2}{\sigma_x^2}-2\rho\dfrac{xy}{\sigma_x\sigma_y}+\dfrac{y^2}{\sigma_y^2}}{2(1-\rho^2)}$$
$\sigma_x$ and $\sigma_y$ are positive constants, $\rho$ is a constant that satisfies $-1 < \rho < 1$, and $c$ is a normalizing constant.

  • $(a)$ By completing the square, rewrite $q(x, y)$ in the form $(\alpha x - \beta y)^2+\gamma y^2$, for some constants $\alpha,\beta$, and $\gamma$.
  • $(b)$ Show that $X$ and $Y$ are zero-mean normal random variables with variances $\sigma_x^2$ and $\sigma_y^2$, respectively.
  • $(c)$ Find the normalizing constant $c$.
  • $(d)$ Show that the conditional PDF of $X$ given that $Y = y$ is normal, and identify its conditional mean and variance.
  • $(e)$ Show that the correlation coefficient of $X$ and $Y$ is equal to $\rho$.
  • $(f)$ Show that $X$ and $Y$ are independent if and only if they are uncorrelated.
  • $(g)$ Show that the estimation error $E[X | Y] - X$ is normal with mean zero and variance $(1-\rho^2)\sigma_x^2$, and is independent of $Y$.

SOLUTION

  • $(a)$ We can rewrite $q(x , y)$ in the form
    $$q(x,y)=q_1(x,y)+q_2(y)$$
    where
    $$q_1(x,y)=\frac{1}{2(1-\rho^2)}\Bigl(\frac{x}{\sigma_x}-\rho\frac{y}{\sigma_y}\Bigr)^2,\qquad q_2(y)=\frac{y^2}{2\sigma_y^2}$$
  • $(b)$ We have
    $$f_Y(y)=ce^{-q_2(y)}\int_{-\infty}^\infty e^{-q_1(x,y)}\,dx$$
    Using the change of variables
    $$u=\frac{x/\sigma_x-\rho y/\sigma_y}{\sqrt{1-\rho^2}}$$
    we obtain
    $$\int_{-\infty}^\infty e^{-q_1(x,y)}\,dx=\sigma_x\sqrt{1-\rho^2}\int_{-\infty}^\infty e^{-u^2/2}\,du=\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}$$
    Thus,
    $$f_Y(y)=c\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}\,e^{-y^2/(2\sigma_y^2)}$$
    We recognize this as a normal PDF with mean zero and variance $\sigma_y^2$. The result for the random variable $X$ follows by symmetry.
  • $(c)$ The normalizing constant for the PDF of $Y$ must be equal to $1/(\sqrt{2\pi}\sigma_y)$. It follows that
    $$c\sigma_x\sqrt{1-\rho^2}\sqrt{2\pi}=\frac{1}{\sqrt{2\pi}\sigma_y}$$
    which implies that
    $$c=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}$$
  • $(d)$ We have
    $$f_{X|Y}(x|y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}=\frac{1}{\sqrt{2\pi}\,\sigma_x\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{(x-\rho\sigma_xy/\sigma_y)^2}{2\sigma_x^2(1-\rho^2)}\Bigr\}$$
    For any fixed $y$, we recognize this as a normal PDF with mean $\rho\sigma_xy/\sigma_y$ and variance $\sigma_x^2(1-\rho^2)$.
  • $(e)$ Since $X$ and $Y$ have zero mean, $cov(X,Y)=E[XY]$, and
    $$\begin{aligned}E[XY]&=E\bigl[E[XY|Y]\bigr]=E\bigl[YE[X|Y]\bigr] \\&=E[Y\rho\sigma_xY/\sigma_y] \\&=(\rho\sigma_x/\sigma_y)E[Y^2] \\&=\rho\sigma_x\sigma_y\end{aligned}$$
    Therefore,
    $$\rho(X,Y)=\frac{cov(X,Y)}{\sigma_x\sigma_y}=\rho$$
  • $(f)$ If $X$ and $Y$ are uncorrelated, then $\rho=0$, and the joint PDF satisfies $f_{X,Y}(x,y)=f_X(x)f_Y(y)$, so that $X$ and $Y$ are independent. Conversely, if $X$ and $Y$ are independent, then they are automatically uncorrelated.
  • $(g)$ From part $(d)$, we know that conditioned on $Y = y$, $X$ is normal with mean $E[X |Y= y]$ and variance $(1-\rho^2)\sigma_x^2$. Therefore, conditioned on $Y = y$, the estimation error $\tilde X = E[X | Y = y ] - X$ is normal with mean zero and variance $(1-\rho^2)\sigma_x^2$, i.e.,
    $$f_{\tilde X|Y}(\tilde x|y)=\frac{1}{\sqrt{2\pi(1-\rho^2)\sigma_x^2}}\exp\Bigl\{-\frac{\tilde x^2}{2(1-\rho^2)\sigma_x^2}\Bigr\}$$
    Since the conditional PDF of $\tilde X$ does not depend on the value $y$ of $Y$, it follows that $\tilde X$ is independent of $Y$, and the above conditional PDF is also the unconditional PDF of $\tilde X$.
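  • These properties are easy to confirm by simulation (my own sketch, using the parametrization $X = \rho(\sigma_x/\sigma_y)Y + \sigma_x\sqrt{1-\rho^2}\,Z$ with $Z$ standard normal independent of $Y$; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sx, sy, rho, n = 2.0, 1.5, 0.7, 1_000_000   # illustrative parameters

y = sy * rng.standard_normal(n)
x = rho * (sx / sy) * y + sx * np.sqrt(1 - rho**2) * rng.standard_normal(n)

err = rho * (sx / sy) * y - x     # estimation error E[X|Y] - X
print(np.corrcoef(x, y)[0, 1])    # ~ rho = 0.7            (part (e))
print(err.var())                  # ~ (1 - rho^2) sx^2 = 2.04  (part (g))
print(np.corrcoef(err, y)[0, 1])  # ~ 0: error uncorrelated with Y (part (g))
```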

The Conditional Expectation as an Estimator


  • If we view $Y$ as an observation that provides information about $X$, it is natural to view the conditional expectation, denoted
    $$\hat X=E[X|Y]$$
    as an estimator of $X$ given $Y$. The estimation error
    $$\tilde X=\hat X-X$$
    is a random variable satisfying
    $$E[\tilde X|Y]=E[\hat X-X|Y]=E[\hat X|Y]-E[X|Y]=\hat X-\hat X=0$$
    Thus, the random variable $E[\tilde X | Y]$ is identically zero: $E[\tilde X|Y=y]=0$ for all values of $y$. By the law of iterated expectations, we also have
    $$E[\tilde X]=E\bigl[E[\tilde X|Y]\bigr]=0$$
    This property is reassuring, as it indicates that the estimation error has no systematic upward or downward bias.

  • We will now show that $\hat X$ is uncorrelated with the estimation error $\tilde X$. Indeed, using the law of iterated expectations, we have
    $$E[\hat X\tilde X]=E\bigl[E[\hat X\tilde X|Y]\bigr]=E\bigl[\hat XE[\tilde X|Y]\bigr]=0$$
    It follows that
    $$cov(\hat X,\tilde X)=E[\hat X\tilde X]-E[\hat X]E[\tilde X]=0-E[X]\cdot0=0$$
    and $\hat X$ and $\tilde X$ are uncorrelated.
  • An important consequence of the fact $cov(\hat X,\tilde X)= 0$, together with $X=\hat X-\tilde X$, is that
    $$var(X)=var(\hat X-\tilde X)=var(\hat X)+var(\tilde X)$$
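  • As an illustration (my own sketch, not from the book), consider the simple model $X = Y + W$ with $W$ independent noise, where $\hat X = E[X|Y] = Y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

y = rng.standard_normal(n)
x = y + rng.standard_normal(n)   # X = Y + W, with W independent of Y
x_hat = y                        # E[X|Y] = Y in this model
x_tilde = x_hat - x              # estimation error

print(np.cov(x_hat, x_tilde)[0, 1])           # ~ 0: uncorrelated
print(x.var(), x_hat.var() + x_tilde.var())   # both ~ 2: var(X) = var + var
```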

The Conditional Variance

  • We introduce the random variable
    $$var(X|Y)=E\bigl[(X-E[X|Y])^2\,\big|\,Y\bigr]=E[\tilde X^2|Y]$$
  • Using the fact $E[\tilde X]=0$ and the law of iterated expectations, we can write the variance of the estimation error as
    $$var(\tilde X)=E[\tilde X^2]=E\bigl[E[\tilde X^2|Y]\bigr]=E\bigl[var(X|Y)\bigr]$$
    and rewrite the equation $var(X)=var(\tilde X)+var(\hat X)$ as follows.

Law of Total Variance
$$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)$$

  • The law of total variance is helpful in calculating variances of random variables by using conditioning.

  • Let $X$ and $Y$ be independent random variables. Using the law of total variance with conditioning on $Y$, so that $var(XY|Y)=Y^2var(X)$ and $E[XY|Y]=YE[X]$, we can show that
    $$var(XY)=(E[X])^2var(Y)+(E[Y])^2var(X)+var(X)var(Y)$$
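  • A quick numerical check of this identity (a sketch with arbitrarily chosen distributions for $X$ and $Y$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(2.0, 1.0, n)    # E[X] = 2,  var(X) = 1  (illustrative)
y = rng.normal(-1.0, 3.0, n)   # E[Y] = -1, var(Y) = 9

print((x * y).var())                       # simulated var(XY)
print(2.0**2 * 9 + (-1.0)**2 * 1 + 1 * 9)  # formula: 36 + 1 + 9 = 46
```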

Example 4.16 (continued).

  • We consider $n$ independent tosses of a biased coin whose probability of heads, $Y$, is uniformly distributed over the interval $[0, 1]$. With $X$ being the number of heads obtained, we have $E[X| Y] = nY$ and $var(X | Y) =nY(1 - Y)$. Thus, since $E[Y]=1/2$ and $E[Y^2]=1/3$,
    $$E\bigl[var(X | Y)\bigr]=E[nY(1 - Y)]=n\bigl(E[Y]-E[Y^2]\bigr)=\frac{n}{6}$$
    Furthermore,
    $$var\bigl(E[X|Y]\bigr)=var(nY)=n^2var(Y)=\frac{n^2}{12}$$
  • Therefore, by the law of total variance, we have
    $$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)=\frac{n}{6}+\frac{n^2}{12}$$
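  • A simulation check (not from the book), reusing the coin model from before with an illustrative $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trials = 10, 500_000

y = rng.uniform(0.0, 1.0, n_trials)
x = rng.binomial(n, y)
print(x.var())             # simulated var(X)
print(n / 6 + n**2 / 12)   # n/6 + n^2/12 = 10.0 for n = 10
```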

Example 4.21. Computing Variances by Conditioning.

  • Consider a continuous random variable $X$ with the PDF given in Fig. 4.13, which is constant at two levels:
    $$f_X(x)=\begin{cases}1/2, & 0\le x\le 1,\\ 1/4, & 1< x\le 3,\\ 0, & \text{otherwise}.\end{cases}$$
  • We define an auxiliary random variable $Y$ as follows:
    $$Y=\begin{cases}1, & \text{if } 0\le X\le 1,\\ 2, & \text{if } 1< X\le 3.\end{cases}$$
    Here, $E[X | Y]$ takes the values $1/2$ and $2$, each with probability $1/2$. Thus, the mean of $E[X| Y]$ is $5/4$. It follows that
    $$var\bigl(E[X|Y]\bigr)=\frac{1}{2}\Bigl(\frac{1}{2}-\frac{5}{4}\Bigr)^2+\frac{1}{2}\Bigl(2-\frac{5}{4}\Bigr)^2=\frac{9}{16}$$
    Conditioned on $Y = 1$ or $Y = 2$, $X$ is uniformly distributed on an interval of length $1$ or $2$, respectively. Therefore,
    $$var(X|Y=1)=\frac{1}{12},\qquad var(X|Y=2)=\frac{4}{12}$$
    and
    $$E\bigl[var(X|Y)\bigr]=\frac{1}{2}\cdot\frac{1}{12}+\frac{1}{2}\cdot\frac{4}{12}=\frac{5}{24}$$
  • Putting everything together, we obtain
    $$var(X)=E\bigl[var(X|Y)\bigr]+var\bigl(E[X|Y]\bigr)=\frac{5}{24}+\frac{9}{16}=\frac{37}{48}$$
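  • A final simulation check (my own sketch): draw $X$ from the two-piece mixture implied by Fig. 4.13 and compare with $37/48$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

pick = rng.random(n) < 0.5   # each piece of the PDF has probability 1/2
x = np.where(pick, rng.uniform(0.0, 1.0, n), rng.uniform(1.0, 3.0, n))

print(x.var())    # simulated var(X)
print(37 / 48)    # ~ 0.7708
```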


Reposted from blog.csdn.net/weixin_42437114/article/details/113834241