Chapter 2 (Discrete Random Variables): Conditioning

These are reading notes for *Introduction to Probability*.

Conditioning a Random Variable on an Event

  • The conditional PMF of a random variable $X$, conditioned on a particular event $A$ with $P(A) > 0$, is defined by
    $$p_{X|A}(x) = P(X = x \mid A) = \frac{P(\{X = x\} \cap A)}{P(A)}$$
    Note that the events $\{X = x\} \cap A$ are disjoint for different values of $x$ and their union is $A$; therefore,
    $$P(A) = \sum_x P(\{X = x\} \cap A)$$
    Combining the above two formulas, we see that
    $$\sum_x p_{X|A}(x) = 1,$$
    so $p_{X|A}$ is a legitimate PMF (a small sketch in code follows below).
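To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the book): $X$ is a fair six-sided die roll and $A = \{X \text{ is even}\}$; the sketch builds $p_{X|A}$ and checks that it sums to 1.

```python
from fractions import Fraction

# PMF of a fair six-sided die: p_X(x) = 1/6 for x = 1, ..., 6 (toy example).
p_X = {x: Fraction(1, 6) for x in range(1, 7)}

# Event A = {X is even}; P(A) is the sum of p_X over the outcomes in A.
A = {2, 4, 6}
P_A = sum(p_X[x] for x in A)

# Conditional PMF: p_{X|A}(x) = P({X = x} ∩ A) / P(A).
p_X_given_A = {x: (p_X[x] / P_A if x in A else Fraction(0)) for x in p_X}

print(p_X_given_A)                     # 1/3 on {2, 4, 6}, 0 elsewhere
print(sum(p_X_given_A.values()) == 1)  # True: a legitimate PMF
```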

Conditioning one Random Variable on Another

  • Let $X$ and $Y$ be two random variables associated with the same experiment. If we know that the value of $Y$ is some particular $y$ [with $p_Y(y) > 0$], this provides partial knowledge about the value of $X$. This knowledge is captured by the conditional PMF $p_{X|Y}$ of $X$ given $Y$, which is defined by specializing the definition of $p_{X|A}$ to events $A$ of the form $\{Y = y\}$:
    $$p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X,Y}(x, y)}{p_Y(y)}$$
  • The conditional PMF is often convenient for the calculation of the joint PMF, using a sequential approach and the formula
    $$p_{X,Y}(x, y) = p_Y(y)\, p_{X|Y}(x \mid y)$$
  • The conditional PMF can also be used to calculate the marginal PMFs:
    $$p_X(x) = \sum_y p_{X,Y}(x, y) = \sum_y p_Y(y)\, p_{X|Y}(x \mid y)$$
    This formula provides a divide-and-conquer method for calculating marginal PMFs. It is in essence identical to the total probability theorem; see the sketch below.
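As an illustration of the sequential approach and the divide-and-conquer formula, the following sketch uses a made-up model (chosen only for illustration): $Y$ is 1 or 2 with equal probability and, given $Y = y$, $X$ is uniform on $\{1, \ldots, y+1\}$.

```python
from fractions import Fraction

# Made-up model: Y is 1 or 2 with equal probability, and given Y = y,
# X is uniform on {1, ..., y + 1}.
p_Y = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p_X_given_Y = {y: {x: Fraction(1, y + 1) for x in range(1, y + 2)} for y in p_Y}

# Joint PMF via the sequential formula p_{X,Y}(x, y) = p_Y(y) p_{X|Y}(x|y).
p_XY = {(x, y): p_Y[y] * p_X_given_Y[y][x]
        for y in p_Y for x in p_X_given_Y[y]}

# Marginal PMF of X via the divide-and-conquer / total probability formula.
xs = {x for (x, _) in p_XY}
p_X = {x: sum(p_XY.get((x, y), Fraction(0)) for y in p_Y) for x in xs}

print(p_X)                     # p_X(1) = 5/12, p_X(2) = 5/12, p_X(3) = 1/6
print(sum(p_X.values()) == 1)  # True
```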

Conditional Expectation


  • A conditional expectation is the same as an ordinary expectation, except that it refers to the new universe, and all probabilities and PMFs are replaced by their conditional counterparts. (Conditional variances can be treated similarly.)

  • Let $X$ and $Y$ be random variables associated with the same experiment.
    • The conditional expectation of $X$ given an event $A$ with $P(A) > 0$ is defined by
      $$E[X \mid A] = \sum_x x\, p_{X|A}(x)$$
      For a function $g(X)$, we have
      $$E[g(X) \mid A] = \sum_x g(x)\, p_{X|A}(x)$$
    • The conditional expectation of $X$ given a value $y$ of $Y$ is defined by
      $$E[X \mid Y = y] = \sum_x x\, p_{X|Y}(x \mid y)$$
    • If $A_1, \ldots, A_n$ are disjoint events that form a partition of the sample space, with $P(A_i) > 0$ for all $i$, then
      $$E[X] = \sum_{i=1}^n P(A_i)\, E[X \mid A_i], \qquad E[X] = \sum_y p_Y(y)\, E[X \mid Y = y]$$
      Furthermore, for any event $B$ with $P(A_i \cap B) > 0$ for all $i$, we have
      $$E[X \mid B] = \sum_{i=1}^n P(A_i \mid B)\, E[X \mid A_i \cap B]$$
      • The last three equalities above apply in different situations but are essentially equivalent, and will be referred to collectively as the total expectation theorem. They can be used to calculate the unconditional expectation $E[X]$ from the conditional PMF or expectation; a small numerical sketch follows below.
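Here is a minimal numerical sketch of the total expectation theorem; the two-dice setup (a four-sided or six-sided die chosen by a fair coin) is an assumption made up for this example.

```python
from fractions import Fraction

# Made-up partition: A1 = "a four-sided die is rolled", A2 = "a six-sided die is rolled",
# each chosen with probability 1/2; X is the face shown.
P = {"A1": Fraction(1, 2), "A2": Fraction(1, 2)}
p_X_given = {
    "A1": {x: Fraction(1, 4) for x in range(1, 5)},
    "A2": {x: Fraction(1, 6) for x in range(1, 7)},
}

# Conditional expectations E[X | A_i].
E_given = {a: sum(x * px for x, px in pmf.items()) for a, pmf in p_X_given.items()}

# Total expectation theorem: E[X] = sum_i P(A_i) E[X | A_i].
E_X = sum(P[a] * E_given[a] for a in P)

# Direct check from the unconditional PMF p_X(x) = sum_i P(A_i) p_{X|A_i}(x).
p_X = {}
for a, pmf in p_X_given.items():
    for x, px in pmf.items():
        p_X[x] = p_X.get(x, Fraction(0)) + P[a] * px

assert E_X == sum(x * px for x, px in p_X.items())
print(E_given, E_X)  # E[X | A1] = 5/2, E[X | A2] = 7/2, E[X] = 3
```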

Example 2.17. Mean and Variance of the Geometric.
You write a software program over and over, and each time there is probability $p$ that it works correctly, independently of previous attempts. What are the mean and variance of $X$, the number of tries until the program works correctly?

SOLUTION
$$p_X(x) = (1-p)^{x-1}p, \qquad x = 1, 2, \ldots$$

  • The mean and variance of $X$ are given by
    $$E[X] = \sum_{k=1}^\infty k(1-p)^{k-1}p, \qquad \mathrm{var}(X) = \sum_{k=1}^\infty (k - E[X])^2 (1-p)^{k-1} p,$$
    but evaluating these infinite sums is somewhat tedious.
  • As an alternative, we will apply the total expectation theorem, with $A_1 = \{X = 1\} = \{\text{first try is a success}\}$ and $A_2 = \{X > 1\} = \{\text{first try is a failure}\}$, and end up with a much simpler calculation.
    • If the first try is successful, we have $X = 1$, and
      $$E[X \mid X = 1] = 1$$
    • If the first try fails ($X > 1$), we have wasted one try, and we are back where we started. So the expected number of remaining tries is $E[X]$, and
      $$E[X \mid X > 1] = 1 + E[X]$$
      Thus,
      $$\begin{aligned}E[X] &= P(X = 1)\, E[X \mid X = 1] + P(X > 1)\, E[X \mid X > 1] \\ &= p + (1-p)(1 + E[X]),\end{aligned}$$
      from which we obtain
      $$E[X] = \frac{1}{p}$$
    • With similar reasoning, we also have
      $$E[X^2 \mid X = 1] = 1, \qquad E[X^2 \mid X > 1] = E[(1 + X)^2] = 1 + 2E[X] + E[X^2],$$
      so that
      $$E[X^2] = p + (1-p)(1 + 2E[X] + E[X^2]),$$
      from which we obtain
      $$E[X^2] = \frac{1 + 2(1-p)E[X]}{p} = \frac{2}{p^2} - \frac{1}{p}$$
    • We conclude that
      $$\mathrm{var}(X) = E[X^2] - (E[X])^2 = \frac{1-p}{p^2}$$
      A quick Monte Carlo check of these two formulas is sketched below.
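As a sanity check on $E[X] = 1/p$ and $\mathrm{var}(X) = (1-p)/p^2$, here is a small Monte Carlo sketch; the value $p = 0.3$, the seed, and the sample size are arbitrary choices.

```python
import random

def geometric_trials(p, rng):
    """Number of independent tries until the first success (success probability p)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

p = 0.3
rng = random.Random(0)
samples = [geometric_trials(p, rng) for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(mean, 1 / p)            # both close to 3.33...
print(var, (1 - p) / p ** 2)  # both close to 7.78...
```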

Example 2.18. The Two-Envelopes Paradox.
You are handed two envelopes, and you are told that one of them contains $m$ times as much money as the other, where $m$ is an integer with $m > 1$. You open one of the envelopes and look at the amount inside. You may now keep this amount, or you may switch envelopes and keep the amount in the other envelope. What is the best strategy?

  • Here is a line of reasoning that argues in favor of switching. Let $A$ be the envelope you open and $B$ be the envelope that you may switch to. Let also $x$ and $y$ be the amounts in $A$ and $B$, respectively. Then, as the argument goes, either $y = x/m$ or $y = mx$, with equal probability $1/2$, so given $x$, the expected value of $y$ is
    $$\frac{1}{2} \cdot \frac{x}{m} + \frac{1}{2} \cdot mx = \frac{1}{2}\left(\frac{1}{m} + m\right)x > x$$
    Therefore, you should always switch to envelope $B$! But then, since you should switch regardless of the amount found in $A$, you might as well open $B$ to begin with; but once you do, you should switch again, etc.
  • There are two assumptions, both flawed to some extent, that underlie this paradoxical line of reasoning.
    • (a) You have no a priori knowledge about the amounts in the envelopes, so given $x$, the only thing you know about $y$ is that it is either $1/m$ or $m$ times $x$, and there is no reason to assume that one is more likely than the other.
    • (b) Given two random variables $X$ and $Y$, representing monetary amounts, if $E[Y \mid X = x] > x$ for all possible values $x$ of $X$, then the strategy that always switches to $Y$ yields a higher expected monetary gain.
  • Let us scrutinize these assumptions.
    • Assumption (a) is flawed because it relies on an incompletely specified probabilistic model. Indeed, in any correct model, all events, including the possible values of $X$ and $Y$, must have well-defined probabilities. With such probabilistic knowledge about $X$ and $Y$, the value of $X$ may reveal a great deal of information about $Y$. Roughly speaking, if you have an idea about the range and likelihood of the values of $X$, you can judge whether the amount $X$ found in $A$ is relatively small or relatively large, and accordingly switch or not switch envelopes.
      • For example, assume the following probabilistic model: someone chooses an integer dollar amount $Z$ from a known range $[\underline{z}, \overline{z}]$, according to some distribution, places this amount in a randomly chosen envelope, and places $m$ times this amount in the other envelope. You then choose to open one of the two envelopes (with equal probability), and look at the enclosed amount $X$. If $X$ turns out to be larger than the upper range limit $\overline{z}$, you know that $X$ is the larger of the two amounts, and hence you should not switch. Thus, in this model, the choice to switch or not should depend on the value of $X$.
      • Mathematically, in a correct probabilistic model, we must have a joint PMF for the random variables $X$ and $Y$, the amounts in envelopes $A$ and $B$, respectively. This joint PMF is specified by introducing a PMF $p_Z$ for the random variable $Z$, the minimum of the amounts in the two envelopes. Then, for all $z$,
        $$p_{X,Y}(mz, z) = p_{X,Y}(z, mz) = \frac{1}{2} p_Z(z),$$
        and $p_{X,Y}(x, y) = 0$ for every $(x, y)$ that is not of the form $(mz, z)$ or $(z, mz)$. With this specification of $p_{X,Y}(x, y)$, and given that $X = x$, one can use the rule
        $$\text{switch if and only if } E[Y \mid X = x] > x.$$
    • Here is a devilish example. A fair coin is tossed until it comes up heads. Let $N$ be the number of tosses. Then, $m^N$ dollars are placed in one envelope and $m^{N-1}$ dollars are placed in the other. Let $X$ be the amount in the envelope you open (envelope $A$), and let $Y$ be the amount in the other envelope (envelope $B$). Now, if $A$ contains 1 dollar, clearly $B$ contains $m$ dollars, so you should switch envelopes. If, on the other hand, $A$ contains $m^n$ dollars, where $n > 0$, then $B$ contains either $m^{n-1}$ or $m^{n+1}$ dollars. Since $N$ has a geometric PMF, we have
      $$\frac{P(Y = m^{n+1} \mid X = m^n)}{P(Y = m^{n-1} \mid X = m^n)} = \frac{P(Y = m^{n+1}, X = m^n)}{P(Y = m^{n-1}, X = m^n)} = \frac{P(N = n+1)}{P(N = n)} = \frac{1}{2}$$
      Thus,
      $$P(Y = m^{n-1} \mid X = m^n) = \frac{2}{3}, \qquad P(Y = m^{n+1} \mid X = m^n) = \frac{1}{3},$$
      and
      $$E[Y \mid X = m^n] = \frac{2}{3} \cdot m^{n-1} + \frac{1}{3} \cdot m^{n+1} = \frac{2 + m^2}{3m} \cdot m^n$$
      We have $(2 + m^2)/3m > 1$ if and only if $m^2 - 3m + 2 > 0$, i.e., $(m - 1)(m - 2) > 0$. Thus, if $m > 2$, then
      $$E[Y \mid X = m^n] > m^n,$$
      and to maximize the expected monetary gain you should always switch to $B$!
      A naive application of the total expectation theorem might seem to indicate that $E[Y] > E[X]$. However, this cannot be true, since $X$ and $Y$ have identical PMFs. Instead, we have
      $$E[X] = E[Y] = \infty$$
      The conclusion is that the decision rule that switches if and only if $E[Y \mid X = x] > x$ does not improve the expected monetary gain in the case where $E[Y] = E[X] = \infty$, and the apparent paradox is resolved.
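The following simulation sketch reproduces the conditional probabilities $2/3$ and $1/3$ and the ratio $E[Y \mid X = m^n]/m^n \approx (2 + m^2)/3m$ for the devilish example; the choices $m = 3$, the seed, and conditioning on $X = m^2$ are arbitrary.

```python
import random

def sample_envelopes(m, rng):
    """One round of the 'devilish' model: toss a fair coin until heads (N tosses),
    put m**N dollars in one envelope and m**(N-1) in the other, open one at random."""
    n = 1
    while rng.random() < 0.5:  # tails with probability 1/2, keep tossing
        n += 1
    amounts = [m ** n, m ** (n - 1)]
    rng.shuffle(amounts)
    return amounts[0], amounts[1]  # (X = opened envelope, Y = other envelope)

m, target_n = 3, 2  # arbitrary: condition on X = m**2 = 9
rng = random.Random(0)
ys = []
for _ in range(1_000_000):
    x, y = sample_envelopes(m, rng)
    if x == m ** target_n:
        ys.append(y)

frac_smaller = sum(y == m ** (target_n - 1) for y in ys) / len(ys)
ratio = (sum(ys) / len(ys)) / m ** target_n

print(frac_smaller)                  # close to 2/3
print(ratio, (2 + m * m) / (3 * m))  # both close to 11/9 ≈ 1.22 > 1
```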

Problem 32. D. Bernoulli’s problem of joint lives.
Consider $2m$ persons forming $m$ couples who live together at a given time. Suppose that at some later time, the probability of each person being alive is $p$, independent of other persons. At that later time, let $A$ be the number of persons that are alive and let $S$ be the number of couples in which both partners are alive. For any survivor number $a$, find $E[S \mid A = a]$.

SOLUTION

  • Let $X_i$ be the random variable taking the value 1 or 0 depending on whether the first partner of the $i$th couple has survived or not, and let $Y_i$ be the corresponding random variable for the second partner of the $i$th couple. Then we have $S = \sum_{i=1}^m X_iY_i$, and, by symmetry across couples and the total expectation theorem,
    $$\begin{aligned}E[S \mid A = a] &= \sum_{i=1}^m E[X_iY_i \mid A = a] \\&= m\, E[X_1Y_1 \mid A = a] \\&= m\, E[Y_1 \mid X_1 = 1, A = a]\, P(X_1 = 1 \mid A = a) \\&= m\, P(Y_1 = 1 \mid X_1 = 1, A = a)\, P(X_1 = 1 \mid A = a) \\&= m \cdot \frac{a-1}{2m-1} \cdot \frac{a}{2m} \\&= \frac{a(a-1)}{2(2m-1)}\end{aligned}$$
    In the last step we used the fact that, given $A = a$, each fixed person is among the $a$ survivors with probability $a/2m$, and given in addition that the first partner of a couple is alive, each of the remaining $2m - 1$ persons is among the other $a - 1$ survivors with probability $(a-1)/(2m-1)$.
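Here is a quick simulation check of the formula $E[S \mid A = a] = a(a-1)/(2(2m-1))$; note that the answer does not depend on $p$, so the values $m = 5$ and $p = 0.6$ below are arbitrary choices.

```python
import random

def cond_mean_S_given_A(m, p, trials, rng):
    """Empirical E[S | A = a] for each survivor count a, with independent survival."""
    totals, counts = {}, {}
    for _ in range(trials):
        alive = [rng.random() < p for _ in range(2 * m)]  # survival indicators
        a = sum(alive)
        # Couple i consists of persons 2i and 2i + 1.
        s = sum(alive[2 * i] and alive[2 * i + 1] for i in range(m))
        totals[a] = totals.get(a, 0) + s
        counts[a] = counts.get(a, 0) + 1
    return {a: totals[a] / counts[a] for a in counts}

m, p = 5, 0.6  # arbitrary choices; the formula does not depend on p
est = cond_mean_S_given_A(m, p, 200_000, random.Random(0))
for a in sorted(est):
    print(a, round(est[a], 3), a * (a - 1) / (2 * (2 * m - 1)))  # empirical vs. formula
```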

Problem 33.
A coin that has probability of heads equal to $p$ is tossed successively and independently until a head comes twice in a row or a tail comes twice in a row. Find the expected value of the number of tosses.

SOLUTION

  • One possibility here is to calculate the PMF of $X$, the number of tosses until the game is over, and use it to compute $E[X]$. However, with an unfair coin, this turns out to be cumbersome, so we argue by using the total expectation theorem and a suitable partition of the sample space.
  • Let $H_k$ (or $T_k$) be the event that a head (or a tail, respectively) comes at the $k$th toss, and let $p$ (respectively, $q$) be the probability of $H_k$ (respectively, $T_k$). Since $H_1$ and $T_1$ form a partition of the sample space, and $P(H_1) = p$ and $P(T_1) = q$, we have
    $$E[X] = p\, E[X \mid H_1] + q\, E[X \mid T_1]$$
  • Using again the total expectation theorem, we have
    $$E[X \mid H_1] = p\, E[X \mid H_1 \cap H_2] + q\, E[X \mid H_1 \cap T_2] = 2p + q(1 + E[X \mid T_1]),$$
    since $E[X \mid H_1 \cap H_2] = 2$ (the game ends at the second toss) and $E[X \mid H_1 \cap T_2] = 1 + E[X \mid T_1]$ (one toss has been spent and we are in the situation of having just tossed a tail). Similarly, we obtain
    $$E[X \mid T_1] = 2q + p(1 + E[X \mid H_1])$$
    Combining the above two relations, collecting terms, and using the fact $p + q = 1$, we obtain after some calculation
    $$E[X \mid T_1] = \frac{2 + p^2}{1 - pq}, \qquad E[X \mid H_1] = \frac{2 + q^2}{1 - pq}$$
    Thus,
    $$E[X] = \frac{2 + pq}{1 - pq}$$
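A short Monte Carlo sketch to check the answer $(2 + pq)/(1 - pq)$; the value $p = 0.3$, the seed, and the sample size are arbitrary.

```python
import random

def tosses_until_double(p, rng):
    """Toss until two heads in a row or two tails in a row; return the number of tosses."""
    prev = rng.random() < p  # True = head, probability p
    n = 1
    while True:
        cur = rng.random() < p
        n += 1
        if cur == prev:
            return n
        prev = cur

p = 0.3
q = 1 - p
rng = random.Random(0)
trials = 200_000
est = sum(tosses_until_double(p, rng) for _ in range(trials)) / trials
print(est, (2 + p * q) / (1 - p * q))  # both close to 2.797
```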

Problem 34
A spider and a fly move along a straight line. At each second, the fly moves a unit step to the right or to the left with equal probability $p$, and stays where it is with probability $1 - 2p$. The spider always takes a unit step in the direction of the fly. The spider and the fly start $D$ units apart, where $D$ is a random variable taking positive integer values with a given PMF. If the spider lands on top of the fly, it's the end. What is the expected value of the time it takes for this to happen?

SOLUTION

  • Let $T$ be the time at which the spider lands on top of the fly. We define:
    • $A_d$: the event that initially the spider and the fly are $d$ units apart.
    • $B_d$: the event that after one second the spider and the fly are $d$ units apart.
  • Our approach will be to first apply the (conditional version of the) total expectation theorem to compute $E[T \mid A_1]$, then use the result to compute $E[T \mid A_2]$, and similarly compute sequentially $E[T \mid A_d]$ for all relevant values of $d$. We will then apply the (unconditional version of the) total expectation theorem to compute $E[T]$.
  • We have
    $$A_d = (A_d \cap B_d) \cup (A_d \cap B_{d-1}) \cup (A_d \cap B_{d-2}), \qquad \text{if } d > 1$$
    This is because if the spider and the fly are at a distance $d > 1$ apart, then one second later their distance will be $d$ (if the fly moves away from the spider), $d - 1$ (if the fly does not move), or $d - 2$ (if the fly moves towards the spider). We also have, for the case where the spider and the fly start one unit apart,
    $$A_1 = (A_1 \cap B_1) \cup (A_1 \cap B_0)$$
    Using the total expectation theorem, we obtain
    $$E[T \mid A_d] = P(B_d \mid A_d)\, E[T \mid A_d \cap B_d] + P(B_{d-1} \mid A_d)\, E[T \mid A_d \cap B_{d-1}] + P(B_{d-2} \mid A_d)\, E[T \mid A_d \cap B_{d-2}], \qquad \text{if } d > 1,$$
    and
    $$E[T \mid A_1] = P(B_1 \mid A_1)\, E[T \mid A_1 \cap B_1] + P(B_0 \mid A_1)\, E[T \mid A_1 \cap B_0]$$
    It can be seen based on the problem data that
    $$P(B_1 \mid A_1) = 2p, \qquad P(B_0 \mid A_1) = 1 - 2p,$$
    $$E[T \mid A_1 \cap B_1] = 1 + E[T \mid A_1], \qquad E[T \mid A_1 \cap B_0] = 1,$$
    so by applying the formula for the case $d = 1$, we obtain
    $$E[T \mid A_1] = 2p(1 + E[T \mid A_1]) + (1 - 2p), \qquad \text{and therefore} \qquad E[T \mid A_1] = \frac{1}{1 - 2p}$$
    By applying the formula with $d = 2$, we obtain
    $$\begin{aligned}E[T \mid A_2] &= p\, E[T \mid A_2 \cap B_2] + (1 - 2p)\, E[T \mid A_2 \cap B_1] + p\, E[T \mid A_2 \cap B_0] \\ &= p(1 + E[T \mid A_2]) + (1 - 2p)(1 + E[T \mid A_1]) + p,\end{aligned}$$
    which yields
    $$E[T \mid A_2] = \frac{2}{1 - p}$$
    Generalizing, we obtain for $d > 2$,
    $$E[T \mid A_d] = p(1 + E[T \mid A_d]) + (1 - 2p)(1 + E[T \mid A_{d-1}]) + p(1 + E[T \mid A_{d-2}])$$
    Thus, $E[T \mid A_d]$ can be generated recursively for any initial distance $d$, using as initial conditions the values of $E[T \mid A_1]$ and $E[T \mid A_2]$ obtained earlier (a short implementation of this recursion is sketched below).
  • Finally, the expected value of $T$ can be obtained using the given PMF for the initial distance $D$ and the total expectation theorem:
    $$E[T] = \sum_d p_D(d)\, E[T \mid A_d]$$
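A minimal sketch of the recursion and the final total-expectation step; the value $p = 1/4$ and the PMF `p_D` for the initial distance are placeholder assumptions, since the problem leaves the PMF of $D$ unspecified.

```python
from fractions import Fraction

def expected_capture_times(p, d_max):
    """E[T | A_d] for d = 1, ..., d_max, via the recursion derived above."""
    p = Fraction(p)
    E = {1: 1 / (1 - 2 * p), 2: 2 / (1 - p)}
    for d in range(3, d_max + 1):
        # Rearranging E_d = p(1 + E_d) + (1 - 2p)(1 + E_{d-1}) + p(1 + E_{d-2}):
        E[d] = (1 + (1 - 2 * p) * E[d - 1] + p * E[d - 2]) / (1 - p)
    return E

# Placeholder inputs: p = 1/4 and a made-up PMF for the initial distance D.
p = Fraction(1, 4)
p_D = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

E = expected_capture_times(p, max(p_D))
E_T = sum(p_D[d] * E[d] for d in p_D)  # total expectation theorem
print(E)    # E[T|A_1] = 2, E[T|A_2] = 8/3, E[T|A_3] = 34/9
print(E_T)  # 47/18
```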

Problem 35
Verify the expected value rule
$$E[g(X, Y)] = \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y),$$
using the expected value rule for a function of a single random variable.

SOLUTION
$$\begin{aligned}E[g(X,Y)] &= \sum_y p_Y(y)\, E[g(X,Y) \mid Y = y] \\&= \sum_y p_Y(y)\, E[g(X, y) \mid Y = y] \\&= \sum_y p_Y(y) \sum_x g(x, y)\, p_{X|Y}(x \mid y) \\&= \sum_x \sum_y g(x, y)\, p_{X,Y}(x, y)\end{aligned}$$
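As a small numerical illustration of the identity just derived, the sketch below evaluates both sides on an arbitrary joint PMF and an arbitrary function $g$ (both made up for this example).

```python
from fractions import Fraction

# Arbitrary small joint PMF p_{X,Y} (values sum to 1) and an arbitrary function g.
p_XY = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
        (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def g(x, y):
    return (x + 1) * (y + 2)

# Marginal p_Y and the conditional decomposition (first line of the derivation above).
p_Y = {}
for (x, y), pxy in p_XY.items():
    p_Y[y] = p_Y.get(y, Fraction(0)) + pxy
xs = {x for (x, _) in p_XY}
lhs = sum(p_Y[y] * sum(g(x, y) * p_XY.get((x, y), Fraction(0)) / p_Y[y] for x in xs)
          for y in p_Y)

# Direct double sum over the joint PMF (right-hand side of the rule).
rhs = sum(g(x, y) * pxy for (x, y), pxy in p_XY.items())

print(lhs, rhs, lhs == rhs)  # equal, as the derivation shows
```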


Problem 37. Splitting a Poisson random variable
A transmitter sends out either a $1$ with probability $p$ or a $0$ with probability $1 - p$, independently of earlier transmissions. If the number of transmissions within a given time interval has a Poisson PMF with parameter $\lambda$, show that the number of $1$s transmitted in that same time interval has a Poisson PMF with parameter $p\lambda$.

SOLUTION

  • Let $X$ and $Y$ be the numbers of $1$s and $0$s transmitted, respectively, and let $Z = X + Y$ be the total number of symbols transmitted. We have
    $$\begin{aligned}p_{X,Y}(n, m) &= p_{X,Y|Z}(n, m \mid n + m)\, p_Z(n + m) \\&= \binom{n+m}{n} p^n (1-p)^m \cdot \frac{e^{-\lambda}\lambda^{n+m}}{(n+m)!} \\&= \frac{e^{-\lambda p}(\lambda p)^n}{n!} \cdot \frac{e^{-\lambda(1-p)}(\lambda(1-p))^m}{m!}\end{aligned}$$
    Thus,
    $$\begin{aligned}p_X(n) &= \sum_{m=0}^\infty p_{X,Y}(n, m) \\&= \frac{e^{-\lambda p}(\lambda p)^n}{n!}\, e^{-\lambda(1-p)} \sum_{m=0}^\infty \frac{(\lambda(1-p))^m}{m!} \\&= \frac{e^{-\lambda p}(\lambda p)^n}{n!}\, e^{-\lambda(1-p)}\, e^{\lambda(1-p)} \\&= \frac{e^{-\lambda p}(\lambda p)^n}{n!}\end{aligned}$$
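The following simulation sketch illustrates the Poisson splitting result: it samples a Poisson($\lambda$) number of transmissions, keeps each with probability $p$, and compares the empirical PMF of the number of $1$s with the Poisson($p\lambda$) PMF. The values $\lambda = 4$ and $p = 0.3$, and the use of Knuth's sampling method, are my own choices for the illustration.

```python
import math
import random

def poisson_sample(lam, rng):
    """Sample a Poisson(lam) random variable (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

lam, p = 4.0, 0.3  # arbitrary choices
rng = random.Random(0)
trials = 200_000
counts = {}
for _ in range(trials):
    z = poisson_sample(lam, rng)                 # total number of transmissions
    x = sum(rng.random() < p for _ in range(z))  # number of 1s among them
    counts[x] = counts.get(x, 0) + 1

for n in range(6):
    empirical = counts.get(n, 0) / trials
    theoretical = math.exp(-lam * p) * (lam * p) ** n / math.factorial(n)
    print(n, round(empirical, 4), round(theoretical, 4))  # columns should be close
```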


Reposted from blog.csdn.net/weixin_42437114/article/details/113480986