This article contains reading notes for *Introduction to Probability*.
The Continuous Bayes’ Rule
$$f_{X|Y}(x|y)=\frac{f_X(x)f_{Y|X}(y|x)}{f_Y(y)}=\frac{f_X(x)f_{Y|X}(y|x)}{\int_{-\infty}^{\infty} f_X(t)f_{Y|X}(y|t)\,dt}$$
Example 3.19.
A light bulb is known to have an exponentially distributed lifetime $Y$. On any given day, the parameter $\lambda$ of the PDF of $Y$ is actually a random variable, uniformly distributed in the interval $[1, 3/2]$. We test a light bulb and record its lifetime. What can we say about the underlying parameter $\lambda$?
SOLUTION
- We model the parameter $\lambda$ in terms of a uniform random variable $\Lambda$ with PDF
$$f_\Lambda(\lambda)=2,\qquad \text{for } 1\leq\lambda\leq\frac{3}{2}$$
$$f_{\Lambda|Y}(\lambda|y)=\frac{f_\Lambda(\lambda)f_{Y|\Lambda}(y|\lambda)}{\int_{-\infty}^{\infty} f_\Lambda(t)f_{Y|\Lambda}(y|t)\,dt}=\frac{2\lambda e^{-\lambda y}}{\int_1^{3/2}2te^{-ty}\,dt},\qquad \text{for } 1\leq\lambda\leq\frac{3}{2}$$
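As a check on Example 3.19, the posterior can be evaluated numerically. This is a sketch, not part of the book: the function name is ours, and the closed form for the normalizing integral $\int_1^{3/2}2te^{-ty}\,dt$ comes from integration by parts.

```python
import math

def posterior_lambda(lam, y):
    """Posterior density f_{Lambda|Y}(lam | y) for Example 3.19.

    Prior: Lambda uniform on [1, 3/2] (density 2); likelihood: given
    Lambda = lam, Y is exponential with rate lam.  The normalizing
    integral of 2*t*exp(-t*y) over [1, 3/2] is computed in closed form
    via integration by parts.
    """
    if not 1.0 <= lam <= 1.5:
        return 0.0
    numerator = 2.0 * lam * math.exp(-lam * y)
    denominator = (2.0 / y**2) * ((y + 1.0) * math.exp(-y)
                                  - (1.5 * y + 1.0) * math.exp(-1.5 * y))
    return numerator / denominator

# Sanity check: the posterior should integrate to 1 over [1, 3/2]
# (trapezoidal rule on a fine grid).
n = 10000
h = 0.5 / n
pts = [posterior_lambda(1.0 + 0.5 * i / n, y=2.0) for i in range(n + 1)]
total = h * (sum(pts) - 0.5 * (pts[0] + pts[-1]))
```

Observing a short lifetime $y$ tilts the posterior toward larger $\lambda$ (faster burnout), as one would expect.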
Inference about a Discrete Random Variable
- We first consider the case where the unobserved phenomenon is described in terms of an event $A$ whose occurrence is unknown. Let $P(A)$ be the probability of event $A$. Let $Y$ be a continuous random variable, and assume that the conditional PDFs $f_{Y|A}(y)$ and $f_{Y|A^C}(y)$ are known. We are interested in the conditional probability $P(A|Y=y)$ of the event $A$, given the value $y$ of $Y$.
- Instead of working with the conditioning event $\{Y=y\}$, which has zero probability, let us condition on the event $\{y\leq Y\leq y+\delta\}$, where $\delta$ is a small positive number, and then take the limit as $\delta$ tends to zero. We have, using Bayes' rule, and assuming that $f_Y(y)>0$,
$$\begin{aligned}P(A|Y=y)&\approx P(A|y\leq Y\leq y+\delta)\\&=\frac{P(A)P(y\leq Y\leq y+\delta|A)}{P(y\leq Y\leq y+\delta)}\\&\approx\frac{P(A)f_{Y|A}(y)\delta}{f_Y(y)\delta}\\&=\frac{P(A)f_{Y|A}(y)}{f_Y(y)}\\&=\frac{P(A)f_{Y|A}(y)}{P(A)f_{Y|A}(y)+P(A^C)f_{Y|A^C}(y)}\end{aligned}$$
- In a variant of this formula, we consider an event $A$ of the form $\{N=n\}$, where $N$ is a discrete random variable that represents the different discrete possibilities for the unobserved phenomenon of interest. Let $p_N$ be the PMF of $N$. Let also $Y$ be a continuous random variable which, for any given value $n$ of $N$, is described by a conditional PDF $f_{Y|N}(y|n)$. The above formula becomes
$$P(N=n|Y=y)=\frac{p_N(n)f_{Y|N}(y|n)}{f_Y(y)}=\frac{p_N(n)f_{Y|N}(y|n)}{\sum_i p_N(i)f_{Y|N}(y|i)}$$
Example 3.20. Signal Detection.
A binary signal $S$ is transmitted, and we are given that $P(S=1)=p$ and $P(S=-1)=1-p$. The received signal is $Y=N+S$, where $N$ is normal noise, with zero mean and unit variance, independent of $S$. What is the probability that $S=1$, as a function of the observed value $y$ of $Y$?
SOLUTION
- Conditioned on $S=s$, the random variable $Y$ has a normal distribution with mean $s$ and unit variance. Applying the formulas given above, we obtain
$$P(S=1|Y=y)=\frac{p_S(1)f_{Y|S}(y|1)}{f_Y(y)}=\frac{\frac{p}{\sqrt{2\pi}}e^{-(y-1)^2/2}}{\frac{p}{\sqrt{2\pi}}e^{-(y-1)^2/2}+\frac{1-p}{\sqrt{2\pi}}e^{-(y+1)^2/2}}=\frac{pe^y}{pe^y+(1-p)e^{-y}}$$
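The algebraic simplification in Example 3.20 (cancelling the common factor $e^{-(y^2+1)/2}/\sqrt{2\pi}$) can be verified numerically. A minimal sketch, with both function names ours: the unsimplified ratio of normal densities and the simplified expression should agree for every $y$.

```python
import math

def p_s1_given_y(y, p):
    """P(S = 1 | Y = y), simplified form from Example 3.20."""
    return p * math.exp(y) / (p * math.exp(y) + (1.0 - p) * math.exp(-y))

def p_s1_given_y_direct(y, p):
    """Same posterior, written directly with the N(s, 1) densities."""
    phi = lambda z: math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return p * phi(y - 1.0) / (p * phi(y - 1.0) + (1.0 - p) * phi(y + 1.0))
```

At $y=0$ the observation carries no information and the posterior equals the prior $p$; as $y\to\infty$ the posterior tends to 1.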
Inference Based on Discrete Observations
- Our earlier formula expressing $P(A|Y=y)$ in terms of $f_{Y|A}(y)$ can be turned around to yield
$$\begin{aligned}f_{Y|A}(y)&=\frac{f_Y(y)P(A|Y=y)}{P(A)}\\&=\frac{f_Y(y)P(A|Y=y)}{\int_{-\infty}^{\infty} f_Y(t)P(A|Y=t)\,dt}\end{aligned}$$
This formula can be used to make an inference about a random variable $Y$ when an event $A$ is observed.
- There is a similar formula for the case where the event $A$ is of the form $\{N=n\}$, where $N$ is an observed discrete random variable that depends on $Y$ in a manner described by a conditional PMF $p_{N|Y}(n|y)$:
$$f_{Y|N}(y|n)=\frac{f_Y(y)p_{N|Y}(n|y)}{p_N(n)}=\frac{f_Y(y)p_{N|Y}(n|y)}{\int_{-\infty}^{\infty} f_Y(t)p_{N|Y}(n|t)\,dt}$$
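The turned-around formula can be illustrated with a small sketch. The specific prior $f_Y(y)=e^{-y}$ for $y\geq 0$ and the likelihood $P(A|Y=y)=e^{-y}$ are hypothetical choices, picked only so that the normalizing integral has the closed form $P(A)=\int_0^\infty e^{-2t}\,dt=1/2$ and the posterior is exactly $f_{Y|A}(y)=2e^{-2y}$.

```python
import math

def f_Y(y):
    """Assumed prior: exponential with rate 1 (hypothetical choice)."""
    return math.exp(-y) if y >= 0 else 0.0

def p_A_given_y(y):
    """Assumed likelihood of the observed event A (hypothetical choice)."""
    return math.exp(-y)

P_A = 0.5  # closed form of the normalizing integral for these choices

def f_Y_given_A(y):
    """Posterior via f_{Y|A}(y) = f_Y(y) P(A|Y=y) / P(A)."""
    return f_Y(y) * p_A_given_y(y) / P_A
```

For these choices observing $A$ favors small values of $Y$, so the posterior decays twice as fast as the prior.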
Problem 35.
Let $X$ and $Y$ be independent continuous random variables with PDFs $f_X$ and $f_Y$, respectively, and let $Z=X+Y$.
- (a) Show that $f_{Z|X}(z|x)=f_Y(z-x)$. [Hint: Write an expression for the conditional CDF of $Z$ given $X$, and differentiate.]
- (b) Assume that $X$ and $Y$ are exponentially distributed with parameter $\lambda$. Find the conditional PDF of $X$, given that $Z=z$.
SOLUTION
- (a) We have
$$\begin{aligned}P(Z\leq z|X=x)&=P(X+Y\leq z|X=x)\\&=P(x+Y\leq z|X=x)\\&=P(x+Y\leq z)\\&=P(Y\leq z-x)\end{aligned}$$
where the third equality follows from the independence of $X$ and $Y$. By differentiating both sides with respect to $z$, the result follows.
- (b) We have, for $0\leq x\leq z$,
$$f_{X|Z}(x|z)=\frac{f_{Z|X}(z|x)f_X(x)}{f_Z(z)}=\frac{f_Y(z-x)f_X(x)}{f_Z(z)}=\frac{\lambda^2e^{-\lambda z}}{f_Z(z)}$$
Since this is the same for all $x$, it follows that the conditional distribution of $X$ is uniform on the interval $[0,z]$, with PDF $f_{X|Z}(x|z)=1/z$.
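The uniformity in part (b) can be checked by simulation: draw independent $\mathrm{Exp}(\lambda)$ pairs, keep those whose sum lands in a narrow window around $z$, and compare the retained $X$ values against the Uniform$[0,z]$ moments. A Monte Carlo sketch; the rate, window width $\delta$, and sample size are arbitrary choices.

```python
import random

# Condition on Z = X + Y falling in [z, z + delta]; for small delta this
# approximates conditioning on Z = z.
random.seed(0)
lam, z, delta = 2.0, 1.0, 0.02
accepted = []
for _ in range(200_000):
    x = random.expovariate(lam)
    y = random.expovariate(lam)
    if z <= x + y <= z + delta:
        accepted.append(x)

# Uniform[0, z] has mean z/2 and variance z^2/12; the accepted X values
# should match both.
mean_x = sum(accepted) / len(accepted)
var_x = sum((v - mean_x) ** 2 for v in accepted) / len(accepted)
```

This memoryless-style result is specific to the exponential distribution; for other $f_X$, $f_Y$ the factor $f_Y(z-x)f_X(x)$ generally depends on $x$.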