Chapter 4 (Further Topics on Random Variables): Covariance and Correlation

This post is a set of reading notes for *Introduction to Probability*.

Covariance

  • The covariance of two random variables $X$ and $Y$, denoted by $cov(X, Y)$, is defined by
    $$\begin{aligned}cov(X,Y)&=E\big[(X-E[X])(Y-E[Y])\big]\\&=E[XY]-E[X]E[Y]\end{aligned}$$
  • When $cov(X, Y) = 0$, we say that $X$ and $Y$ are uncorrelated. Roughly speaking, a positive or negative covariance indicates that the values of $X-E[X]$ and $Y-E[Y]$ obtained in a single experiment “tend” to have the same or the opposite sign, respectively. Thus the sign of the covariance provides an important qualitative indicator of the relationship between $X$ and $Y$.

  • We record a few properties of covariances that are easily derived from the definition (see the numerical check after this list): for any random variables $X$, $Y$, and $Z$, and any scalars $a$ and $b$, we have
    $$cov(X,X)=var(X)$$
    $$cov(X,aY+b)=a\cdot cov(X,Y)$$
    $$cov(X,Y+Z)=cov(X,Y)+cov(X,Z)$$
  • Assume that $X$ and $Y$ satisfy
    $$E[X\mid Y=y]=E[X],\qquad \text{for all } y$$
    Then, assuming $X$ and $Y$ are discrete, the total expectation theorem implies that
    $$\begin{aligned}E[XY]&=\sum_y p_Y(y)E[XY\mid Y=y]=\sum_y y\,p_Y(y)E[X\mid Y=y]\\&=\sum_y y\,p_Y(y)E[X]=E[X]E[Y]\end{aligned}$$
    so $X$ and $Y$ are uncorrelated. The argument for the continuous case is similar.
  • Note that if $X$ and $Y$ are independent, then $E[XY]=E[X]E[Y]$, so $cov(X, Y) = E[XY]-E[X]E[Y]=0$. Thus, if $X$ and $Y$ are independent, they are also uncorrelated. However, the converse is generally not true, as illustrated by the following example.
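The covariance identity and the three properties above are easy to sanity-check by simulation. Below is a minimal sketch in Python (using NumPy; the particular distributions, constants, and sample size are arbitrary choices of mine, not from the book) that estimates each covariance from a large sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary dependent variables: Y is a linear function of X plus noise.
n = 1_000_000
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
Y = 2.0 * X + Z

def cov(u, v):
    """Sample covariance via the identity cov(U, V) = E[UV] - E[U]E[V]."""
    return np.mean(u * v) - np.mean(u) * np.mean(v)

a, b = 3.0, 5.0
print(cov(X, X), np.var(X))                  # cov(X, X) = var(X)
print(cov(X, a * Y + b), a * cov(X, Y))      # cov(X, aY + b) = a cov(X, Y)
print(cov(X, Y + Z), cov(X, Y) + cov(X, Z))  # additivity in the second argument
```

Each pair of printed numbers agrees up to Monte Carlo error, which shrinks as the sample size grows.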

Example 4.13.

  • The pair of random variables $(X, Y)$ takes the values $(1, 0)$, $(0, 1)$, $(-1, 0)$, and $(0, -1)$, each with probability $1/4$. Since $XY=0$ for every one of these values, and $E[X]=E[Y]=0$ by symmetry, we have
    $$cov(X,Y)=E[XY]-E[X]E[Y]=0-0=0$$
    and $X$ and $Y$ are uncorrelated. However, $X$ and $Y$ are not independent since, for example, a nonzero value of $X$ fixes the value of $Y$ to zero.
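A short simulation makes both claims concrete (a sketch; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# The four equally likely values of (X, Y) from Example 4.13.
support = np.array([(1, 0), (0, 1), (-1, 0), (0, -1)])
idx = rng.integers(0, 4, size=500_000)
X, Y = support[idx, 0], support[idx, 1]

# Uncorrelated: the sample covariance is near zero.
print(np.mean(X * Y) - np.mean(X) * np.mean(Y))

# Not independent: conditioning on X changes the distribution of Y.
print(np.mean(Y == 0))          # P(Y = 0) = 1/2
print(np.mean(Y[X == 0] == 0))  # P(Y = 0 | X = 0) = 0
print(np.all(Y[X != 0] == 0))   # a nonzero X forces Y = 0: True
```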

Correlation

  • The correlation coefficient $\rho(X, Y)$ of two random variables $X$ and $Y$ that have nonzero variances is defined as (a numerical illustration follows this list)
    $$\rho(X, Y) =\frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}$$
    (The simpler notation $\rho$ will also be used when $X$ and $Y$ are clear from the context.)
  • It may be viewed as a normalized version of the covariance $cov(X, Y)$, and in fact, it can be shown that $\rho$ ranges from $-1$ to $1$.
  • If $\rho>0$ (or $\rho < 0$), then the values of $X - E[X]$ and $Y-E[Y]$ “tend” to have the same (or opposite, respectively) sign. The size of $|\rho|$ provides a normalized measure of the extent to which this is true. In fact, always assuming that $X$ and $Y$ have positive variances, it can be shown that $\rho = 1$ (or $\rho = -1$) if and only if there exists a positive (or negative, respectively) constant $c$ such that
    $$Y-E[Y]=c(X-E[X])$$
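As a quick illustration, here is a sketch in Python (the variables and constants are arbitrary choices of mine) that computes $\rho$ for exactly linear relations, where it hits $\pm 1$, and for a noisy one, where it lands strictly inside $(-1, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def rho(x, y):
    """Correlation coefficient: cov(X, Y) / sqrt(var(X) var(Y))."""
    xt, yt = x - x.mean(), y - y.mean()
    return np.mean(xt * yt) / np.sqrt(np.mean(xt**2) * np.mean(yt**2))

X = rng.standard_normal(100_000)
noise = rng.standard_normal(100_000)

print(rho(X, 4 * X + 7))    # exact positive linear relation -> 1.0
print(rho(X, -2 * X + 1))   # exact negative linear relation -> -1.0
print(rho(X, X + noise))    # partial relation -> about 1/sqrt(2) = 0.707
print(np.corrcoef(X, X + noise)[0, 1])  # NumPy's built-in gives the same value
```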

Problem 20. Schwarz inequality.
Show that for any random variables $X$ and $Y$, we have
$$(E[XY])^2\leq E[X^2]E[Y^2]$$

SOLUTION

  • We may assume that $E[Y^2]\neq 0$; otherwise, we have $Y = 0$ with probability 1, and hence $E[XY] = 0$, so the inequality holds. We have
    $$\begin{aligned}0&\leq E\left[\left(X-\frac{E[XY]}{E[Y^2]}Y\right)^2\right]\\&=E\left[X^2-2\frac{E[XY]}{E[Y^2]}XY+\frac{(E[XY])^2}{(E[Y^2])^2}Y^2\right]\\&=E[X^2]-2\frac{E[XY]}{E[Y^2]}E[XY]+\frac{(E[XY])^2}{(E[Y^2])^2}E[Y^2]\\&=E[X^2]-\frac{(E[XY])^2}{E[Y^2]}\end{aligned}$$
    i.e., $(E[XY])^2\leq E[X^2]E[Y^2]$.
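The inequality can also be spot-checked empirically; the following sketch (the dependent pair of variables is an arbitrary choice of mine) verifies it on several random samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Check (E[XY])^2 <= E[X^2] E[Y^2] on several random dependent samples.
for _ in range(5):
    X = rng.standard_normal(10_000)
    Y = 0.5 * X + rng.exponential(size=10_000)  # arbitrary dependence
    lhs = np.mean(X * Y) ** 2
    rhs = np.mean(X**2) * np.mean(Y**2)
    print(lhs <= rhs, round(lhs, 3), round(rhs, 3))  # always True
```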

Problem 21. Correlation coefficient.
Consider the correlation coefficient
$$\rho(X, Y) =\frac{cov(X, Y)}{\sqrt{var(X)\,var(Y)}}$$
of two random variables $X$ and $Y$ that have positive variances. Show that:

  • (a) $|\rho(X, Y)|\leq 1$.
    [Hint: Use the Schwarz inequality from the preceding problem.]
  • (b) If $Y - E[Y]$ is a positive (or negative) multiple of $X - E[X]$, then $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$, respectively].
  • (c) If $\rho(X, Y) = 1$ [or $\rho(X, Y) = -1$], then, with probability 1, $Y - E[Y]$ is a positive (or negative, respectively) multiple of $X - E[X]$.

SOLUTION

  • (a) Let $\tilde X = X - E[X]$ and $\tilde Y = Y - E[Y]$. Using the Schwarz inequality, we get
    $$\rho(X, Y)^2 =\frac{(E[\tilde X\tilde Y])^2}{E[\tilde X^2]E[\tilde Y^2]}\leq 1$$
    and hence $|\rho(X, Y)|\leq 1$.
  • (b) If $\tilde Y = a\tilde X$, then
    $$\rho(X, Y)=\frac{E[\tilde X\cdot a\tilde X]}{\sqrt{E[\tilde X^2]\,E[(a\tilde X)^2]}}=\frac{a}{|a|}$$
  • (c) If $|\rho(X, Y)| = 1$, the calculation in the solution of Problem 20 yields
    $$\begin{aligned}E\left[\left(\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y\right)^2\right]&=E[\tilde X^2]-\frac{(E[\tilde X\tilde Y])^2}{E[\tilde Y^2]}\\&=E[\tilde X^2]\left(1-(\rho(X,Y))^2\right)\\&=0\end{aligned}$$
    Thus, with probability 1, the random variable
    $$\tilde X-\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y$$
    is equal to zero. It follows that, with probability 1,
    $$\tilde X=\frac{E[\tilde X\tilde Y]}{E[\tilde Y^2]}\tilde Y=\sqrt{\frac{E[\tilde X^2]}{E[\tilde Y^2]}}\,\rho(X,Y)\,\tilde Y$$
    i.e., the sign of the constant ratio of $\tilde X$ and $\tilde Y$ is determined by the sign of $\rho(X, Y)$.

Variance of the Sum of Random Variables

  • If $X_1, X_2, \dots, X_n$ are random variables with finite variance, we have
    $$var(X_1+X_2)=var(X_1)+var(X_2)+2\,cov(X_1,X_2)$$
    and, more generally,
    $$var\left(\sum_{i=1}^nX_i\right)=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)$$

PROOF

  • For brevity, write $\tilde X_i=X_i-E[X_i]$. Then
    $$\begin{aligned}var\left(\sum_{i=1}^nX_i\right)&=E\left[\left(\sum_{i=1}^n\tilde X_i\right)^2\right]\\&=E\left[\sum_{i=1}^n\sum_{j=1}^n\tilde X_i\tilde X_j\right]\\&=\sum_{i=1}^n\sum_{j=1}^nE[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^nE[\tilde X_i^2]+\sum_{\{(i,j)\,\mid\, i\neq j\}}E[\tilde X_i\tilde X_j]\\&=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)\end{aligned}$$
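Equivalently, $var\left(\sum_i X_i\right)$ is the sum of all entries of the covariance matrix of $(X_1,\dots,X_n)$. A minimal Python sketch checks this (the mixing matrix $A$ below is an arbitrary choice used only to make the $X_i$ correlated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build three correlated variables as linear mixes of independent noise.
n = 1_000_000
G = rng.standard_normal((n, 3))
A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]])
Xs = G @ A.T                  # each row is a sample of (X1, X2, X3)

lhs = Xs.sum(axis=1).var()    # var(X1 + X2 + X3) estimated directly

C = np.cov(Xs, rowvar=False)  # sample covariance matrix
rhs = np.trace(C) + (C.sum() - np.trace(C))  # sum of variances + cross-covariances
print(lhs, rhs)               # agree up to Monte Carlo error; both equal C.sum()
```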

Example 4.15.
$n$ people throw their hats in a box and then pick a hat at random. Let us find the variance of $X$, the number of people who pick their own hat.

SOLUTION

  • We have
    $$X = X_1 +\cdots+ X_n$$
    where $X_i$ is the random variable that takes the value $1$ if the $i$th person selects his/her own hat, and takes the value $0$ otherwise. Noting that $X_i$ is Bernoulli with parameter $p = P(X_i = 1) = 1/n$, we obtain
    $$E[X_i]=\frac{1}{n},\qquad var(X_i)=\frac{1}{n}\left(1-\frac{1}{n}\right)$$
    For $i \neq j$, we have
    $$\begin{aligned}cov(X_i, X_j) &= E[X_iX_j] - E[X_i]E[X_j]\\&= P(X_i = 1\text{ and } X_j = 1)-\frac{1}{n^2}\\&=P(X_i=1)\,P(X_j=1\mid X_i=1)-\frac{1}{n^2}\\&=\frac{1}{n}\cdot\frac{1}{n-1}-\frac{1}{n^2}\\&=\frac{1}{n^2(n-1)}\end{aligned}$$
    Therefore,
    $$\begin{aligned}var(X)&=var\left(\sum_{i=1}^nX_i\right)\\&=\sum_{i=1}^n var(X_i)+\sum_{\{(i,j)\,\mid\, i\neq j\}}cov(X_i,X_j)\\&=n\cdot \frac{1}{n}\left(1-\frac{1}{n}\right)+n(n-1)\cdot \frac{1}{n^2(n-1)}\\&=1\end{aligned}$$
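The answer $var(X) = 1$ for every $n \ge 2$ is easy to confirm by simulating random permutations; here is a sketch (the trial counts and values of $n$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def hat_matches(n, trials):
    """Number of fixed points of a uniformly random permutation of size n."""
    perms = np.array([rng.permutation(n) for _ in range(trials)])
    return (perms == np.arange(n)).sum(axis=1)

for n in (3, 10, 100):
    X = hat_matches(n, 20_000)
    print(n, X.mean(), X.var())  # sample mean and variance are both near 1
```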

Reposted from blog.csdn.net/weixin_42437114/article/details/113832214