Chapter 4 (Further Topics on Random Variables): Covariance and Correlation

This post is a set of reading notes for *Introduction to Probability*.

Covariance

  • The covariance of two random variables $X$ and $Y$, denoted by $cov(X, Y)$, is defined by
    $$\begin{aligned}cov(X,Y)&=E\big[(X-E[X])(Y-E[Y])\big]\\&=E[XY]-E[X]E[Y]\end{aligned}$$
  • When $cov(X, Y) = 0$, we say that $X$ and $Y$ are uncorrelated. Roughly speaking, a positive or negative covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment “tend” to have the same or the opposite sign, respectively. Thus the sign of the covariance provides an important qualitative indicator of the relationship between $X$ and $Y$.

  • We record a few properties of covariances that are easily derived from the definition (see the numerical check after this list): for any random variables $X$, $Y$, and $Z$, and any scalars $a$ and $b$, we have
    $$cov(X,X)=var(X)$$
    $$cov(X,aY+b)=a\cdot cov(X,Y)$$
    $$cov(X,Y+Z)=cov(X,Y)+cov(X,Z)$$
  • Assume that $X$ and $Y$ satisfy
    $$E[X\mid Y=y]=E[X],\qquad \text{for all } y$$
    Then, assuming $X$ and $Y$ are discrete, the total expectation theorem implies that
    $$\begin{aligned}E[XY]&=\sum_y p_Y(y)E[XY\mid Y=y]=\sum_y y\,p_Y(y)E[X\mid Y=y]\\&=\sum_y y\,p_Y(y)E[X]=E[X]E[Y]\end{aligned}$$
    so $X$ and $Y$ are uncorrelated. The argument for the continuous case is similar.
  • Note that if $X$ and $Y$ are independent, then $E[XY]=E[X]E[Y]$, so $cov(X, Y) = E[XY]-E[X]E[Y]=0$. Thus, if $X$ and $Y$ are independent, they are also uncorrelated. However, the converse is generally not true, as illustrated by the following example.
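The covariance identity and the three properties above are easy to sanity-check by simulation. Below is a minimal sketch in Python (using NumPy; the particular distributions, constants, and sample size are arbitrary choices of mine, not from the book) that estimates each covariance from a large sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary dependent variables: Y is a linear function of X plus noise.
n = 1_000_000
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = 2.0 * X + Z

def cov(u, v):
    """Sample covariance via the identity cov(U, V) = E[UV] - E[U]E[V]."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

a, b = 3.0, 5.0
print(cov(X, X), np.var(X))                  # cov(X, X) = var(X)
print(cov(X, a * Y + b), a * cov(X, Y))      # cov(X, aY + b) = a cov(X, Y)
print(cov(X, Y + Z), cov(X, Y) + cov(X, Z))  # additivity in the second argument
```

Each pair of printed numbers agrees up to Monte Carlo error, which shrinks as the sample size grows.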

Example 4.13.

  • The pair of random variables $(X, Y)$ takes the values $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$, each with probability $1/4$. Since $XY=0$ for every one of these values, and $E[X]=E[Y]=0$ by symmetry, we have
    $$cov(X,Y)=E[XY]-E[X]E[Y]=0-0=0$$
    and $X$ and $Y$ are uncorrelated. However, $X$ and $Y$ are not independent since, for example, a nonzero value of $X$ fixes the value of $Y$ to zero.
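A short simulation makes both claims concrete (a sketch; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# The four equally likely values of (X, Y) from Example 4.13.
support = np.array([(1, 0), (0, 1), (-1, 0), (0, -1)])
idx = rng.integers(0, 4, size=500_000)
X, Y = support[idx, 0], support[idx, 1]

# Uncorrelated: the sample covariance is near zero.
print(np.mean(X * Y) - np.mean(X) * np.mean(Y))

# Not independent: conditioning on X changes the distribution of Y.
print(np.mean(Y == 0))          # P(Y = 0) = 1/2
print(np.mean(Y[X == 0] == 0))  # P(Y = 0 | X = 0) = 0
print(np.all(Y[X != 0] == 0))   # a nonzero X forces Y = 0: True
```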

Correlation

  • The correlation coefficient $\rho(X, Y)$ of two random variables $X$ and $Y$ that have nonzero variances is defined as (a numerical illustration follows this list)
    $$\rho(X, Y) =\frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}$$
    (The simpler notation $\rho$ will also be used when $X$ and $Y$ are clear from the context.)
  • It may be viewed as a normalized version of the covariance $cov(X, Y)$, and in fact, it can be shown that $\rho$ ranges from $-1$ to $1$.
  • If $\rho>0$ (or $\rho < 0$), then the values of $X - E[X]$ and $Y-E[Y]$ “tend” to have the same (or opposite, respectively) sign. The size of $|\rho|$ provides a normalized measure of the extent to which this is true. In fact, always assuming that $X$ and $Y$ have positive variances, it can be shown that $\rho = 1$ (or $\rho = -1$) if and only if there exists a positive (or negative, respectively) constant $c$ such that
    $$Y-E[Y]=c(X-E[X])$$
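As a quick illustration, here is a sketch in Python (the variables and constants are arbitrary choices of mine) that computes $\rho$ for exactly linear relations, where it hits $\pm 1$, and for a noisy one, where it lands strictly inside $(-1, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def rho(x, y):
    """Correlation coefficient: cov(X, Y) / sqrt(var(X) var(Y))."""
    xt, yt = x - x.mean(), y - y.mean()
    return np.mean(xt * yt) / np.sqrt(np.mean(xt**2) * np.mean(yt**2))

X = rng.standard_normal(100_000)
noise = rng.standard_normal(100_000)

print(rho(X, 4 * X + 7))    # exact positive linear relation -> 1.0
print(rho(X, -2 * X + 1))   # exact negative linear relation -> -1.0
print(rho(X, X + noise))    # partial relation -> about 1/sqrt(2) = 0.707
print(np.corrcoef(X, X + noise)[0, 1])  # NumPy's built-in gives the same value
```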

Problem 20. Schwarz inequality.
Show that for any random variables $X$ and $Y$, we have
$$(E[XY])^2\leq E[X^2]E[Y^2]$$

SOLUTION

  • We may assume that $E[Y^2]\neq 0$; otherwise, we have $Y = 0$ with probability 1, and hence $E[XY] = 0$, so the inequality holds. We have
    $$\begin{aligned}0&\leq E\left[\left(X-\frac{E[XY]}{E[Y^2]}Y\right)^2\right]\\&=E\left[X^2-2\frac{E[XY]}{E[Y^2]}XY+\frac{(E[XY])^2}{(E[Y^2])^2}Y^2\right]\\&=E[X^2]-2\frac{E[XY]}{E[Y^2]}E[XY]+\frac{(E[XY])^2}{(E[Y^2])^2}E[Y^2]\\&=E[X^2]-\frac{(E[XY])^2}{E[Y^2]}\end{aligned}$$
    i.e., $(E[XY])^2\leq E[X^2]E[Y^2]$.
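The inequality can also be spot-checked empirically; the following sketch (the dependent pair of variables is an arbitrary choice of mine) verifies it on several random samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Check (E[XY])^2 <= E[X^2] E[Y^2] on several random dependent samples.
for _ in range(5):
    X = rng.standard_normal(10_000)
    Y = 0.5 * X + rng.exponential(size=10_000)  # arbitrary dependence
    lhs = np.mean(X * Y) ** 2
    rhs = np.mean(X**2) * np.mean(Y**2)
    print(lhs <= rhs, round(lhs, 3), round(rhs, 3))  # always True
```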

Problem 21. Correlation coefficient.
Consider the correlation coefficient
$$\rho(X, Y) =\frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}$$
of two random variables $X$ and $Y$ that have positive variances. Show that:

  • (a) $|\rho(X, Y)|\leq 1$.
    [Hint: Use the Schwarz inequality from the preceding problem.]
  • (b) If $Y - E[Y]$ is a positive (or negative) multiple of $X - E[X]$, then $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$, respectively].
  • (c) If $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$], then, with probability 1, $Y - E[Y]$ is a positive (or negative, respectively) multiple of $X - E[X]$.

SOLUTION

  • (a) Let $\tilde X = X - E[X]$ and $\tilde Y = Y - E[Y]$. Using the Schwarz inequality, we get
    $$\rho(X, Y)^2 =\frac{(E[\tilde X\tilde Y])^2}{E[\tilde X^2]E[\tilde Y^2]}\leq 1$$
    and hence $|\rho(X, Y)|\leq 1$.
  • (b) If $\tilde Y = a\tilde X$, then
    $$\rho(X, Y)=\frac{E[\tilde X\cdot a\tilde X]}{\sqrt{E[\tilde X^2]\,E[(a\tilde X)^2]}}=\frac{a}{|a|}$$
  • (c) If $|\rho(X, Y)| = 1$, the calculation in the solution of Problem 20 yields
    $$\begin{aligned}E\left[\left(\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y\right)^2\right]&=E[\tilde X^2]-\frac{(E[\tilde X\tilde Y])^2}{E[\tilde Y^2]}\\&=E[\tilde X^2]\left(1-(\rho(X,Y))^2\right)\\&=0\end{aligned}$$
    Thus, with probability 1, the random variable
    $$\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y$$
    is equal to zero. It follows that, with probability 1,
    $$\tilde X=\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y=\sqrt{\frac{E[\tilde X^2]}{E[\tilde Y^2]}}\,\rho(X,Y)\,\tilde Y$$
    i.e., the sign of the constant ratio of $\tilde X$ and $\tilde Y$ is determined by the sign of $\rho(X, Y)$.

Variance of the Sum of Random Variables

  • If $X_1, X_2, \dots, X_n$ are random variables with finite variance, we have
    $$var(X_1+X_2)=var(X_1)+var(X_2)+2\,cov(X_1,X_2)$$
    and, more generally,
    $$var\left(\sum_{i=1}^nX_i\right)=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)$$

PROOF

  • For brevity, write $\tilde X_i=X_i-E[X_i]$. Then
    $$\begin{aligned}var\left(\sum_{i=1}^nX_i\right)&=E\left[\left(\sum_{i=1}^n\tilde X_i\right)^2\right]\\&=E\left[\sum_{i=1}^n\sum_{j=1}^n\tilde X_i\tilde X_j\right]\\&=\sum_{i=1}^n\sum_{j=1}^nE[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^nE[\tilde X_i^2]+\sum_{\{(i,j)\,\mid\, i\neq j\}}E[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)\end{aligned}$$
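Equivalently, $var\left(\sum_i X_i\right)$ is the sum of all entries of the covariance matrix of $(X_1,\dots,X_n)$. A minimal Python sketch checks this (the mixing matrix $A$ below is an arbitrary choice used only to make the $X_i$ correlated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build three correlated variables as linear mixes of independent noise.
n = 1_000_000
G = rng.standard_normal((n, 3))
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]])
Xs = G @ A.T                  # each row is a sample of (X1, X2, X3)

lhs = Xs.sum(axis=1).var()    # var(X1 + X2 + X3) estimated directly

C = np.cov(Xs, rowvar=False)  # sample covariance matrix
rhs = np.trace(C) + (C.sum() - np.trace(C))  # sum of variances + cross-covariances
print(lhs, rhs)               # agree up to Monte Carlo error; both equal C.sum()
```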

Example 4.15.
$n$ people throw their hats in a box and then pick a hat at random. Let us find the variance of $X$, the number of people who pick their own hat.

SOLUTION

  • We have
    $$X = X_1 +\cdots+ X_n$$
    where $X_i$ is the random variable that takes the value $1$ if the $i$th person selects his/her own hat, and takes the value $0$ otherwise. Noting that $X_i$ is Bernoulli with parameter $p = P(X_i = 1) = 1/n$, we obtain
    $$E[X_i]=\frac{1}{n},\qquad var(X_i)=\frac{1}{n}\left(1-\frac{1}{n}\right)$$
    For $i \neq j$, we have
    $$\begin{aligned}cov(X_i, X_j) &= E[X_iX_j] - E[X_i]E[X_j]\\&= P(X_i = 1\text{ and } X_j = 1)-\frac{1}{n^2}\\&=P(X_i=1)\,P(X_j=1\mid X_i=1)-\frac{1}{n^2}\\&=\frac{1}{n}\cdot\frac{1}{n-1}-\frac{1}{n^2}\\&=\frac{1}{n^2(n-1)}\end{aligned}$$
    Therefore,
    $$\begin{aligned}var(X)&=var\left(\sum_{i=1}^nX_i\right)\\&=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)\\&=n\cdot \frac{1}{n}\left(1-\frac{1}{n}\right)+n(n-1)\cdot \frac{1}{n^2(n-1)}\\&=1\end{aligned}$$
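The answer $var(X) = 1$ for every $n \ge 2$ is easy to confirm by simulating random permutations; here is a sketch (the trial counts and values of $n$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def hat_matches(n, trials):
    """Number of fixed points of a uniformly random permutation of size n."""
    perms = np.array([rng.permutation(n) for _ in range(trials)])
    return (perms == np.arange(n)).sum(axis=1)

for n in (3, 10, 100):
    X = hat_matches(n, 20_000)
    print(n, X.mean(), X.var())  # sample mean and variance are both near 1
```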

Reposted from blog.csdn.net/weixin_42437114/article/details/113832214