Chapter 5 (Limit Theorems): The Weak Law of Large Numbers

These are reading notes for Introduction to Probability.

The Weak Law of Large Numbers

  • The weak law of large numbers asserts that the sample mean of a large number of independent identically distributed random variables is very likely to be close to the true mean.

  • We consider a sequence $X_1, X_2, \ldots$ of independent identically distributed random variables with mean $\mu$ and variance $\sigma^2$, and define the sample mean by
    $$M_n = \frac{X_1 + \cdots + X_n}{n}$$
    We have
    $$E[M_n] = \mu, \qquad \mathrm{var}(M_n) = \frac{\sigma^2}{n}$$
  • We apply the Chebyshev inequality and obtain
    $$P(|M_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}, \qquad \text{for any } \epsilon > 0$$
    We observe that for any fixed $\epsilon > 0$, the right-hand side of this inequality goes to zero as $n$ increases.
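The shrinking bound can be checked numerically. Below is a minimal Python sketch (assuming NumPy is available) using Uniform(0,1) samples, for which $\mu = 1/2$ and $\sigma^2 = 1/12$; the empirical frequency of $|M_n - \mu| \geq \epsilon$ falls well below the Chebyshev bound $\sigma^2/(n\epsilon^2)$, which is loose but sufficient for the limit argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. Uniform(0,1) samples: mu = 0.5, sigma^2 = 1/12
mu, var = 0.5, 1.0 / 12.0
n, eps, trials = 100, 0.1, 10_000

# Each row is one experiment; M_n is the mean of n samples.
samples = rng.random((trials, n))
M_n = samples.mean(axis=1)

# Empirical frequency of the event |M_n - mu| >= eps
empirical = np.mean(np.abs(M_n - mu) >= eps)

# Chebyshev bound: sigma^2 / (n * eps^2)
bound = var / (n * eps**2)

print(f"empirical P(|M_n - mu| >= {eps}) = {empirical:.4f}")
print(f"Chebyshev bound                  = {bound:.4f}")
```

Increasing `n` makes both the bound and the empirical frequency shrink, as the inequality predicts.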

  • Consequently, for every $\epsilon > 0$ we have $P(|M_n - \mu| \geq \epsilon) \to 0$ as $n \to \infty$; this is the weak law of large numbers.

  • It turns out that this law remains true even if the $X_i$ have infinite variance, but a much more elaborate argument is needed, which we omit. The only assumption needed is that $E[X_i]$ is well-defined.
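For intuition (an illustration only, not the omitted argument), one can simulate a heavy-tailed case. NumPy's `rng.pareto(a)` draws from a Lomax (Pareto II) distribution; for shape $a = 1.5$ the mean $1/(a-1) = 2$ is finite while the variance is infinite, yet the sample mean still drifts toward 2 as $n$ grows, only more slowly than in the finite-variance case.

```python
import numpy as np

rng = np.random.default_rng(42)

# Lomax (Pareto II) with shape a = 1.5: finite mean 1/(a-1) = 2,
# but infinite variance, so the Chebyshev argument does not apply.
a = 1.5
true_mean = 1.0 / (a - 1.0)

for n in (10**3, 10**5, 10**7):
    m_n = rng.pareto(a, size=n).mean()
    print(n, m_n)  # sample means still settle near 2, though slowly
```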

Example 5.5. Polling.

  • Let $p$ be the fraction of voters who support a particular candidate for office. We interview $n$ "randomly selected" voters and record $M_n$, the fraction of them that support the candidate. We view $M_n$ as our estimate of $p$ and would like to investigate its properties.
  • We interpret "randomly selected" to mean that the $n$ voters are chosen independently and uniformly from the given population. Thus, the reply of each person interviewed can be viewed as an independent Bernoulli random variable $X_i$ with success probability $p$ and variance $\sigma^2 = p(1-p)$. The Chebyshev inequality yields
    $$P(|M_n - p| \geq \epsilon) \leq \frac{p(1-p)}{n\epsilon^2}$$
    The true value of the parameter $p$ is assumed to be unknown. On the other hand, since $p(1-p) = 1/4 - (p - 1/2)^2$, we have $\sigma^2 = p(1-p) \leq 1/4$ (cf. Example 5.3), which yields
    $$P(|M_n - p| \geq \epsilon) \leq \frac{1}{4n\epsilon^2}$$
  • Suppose now that we impose some tight specifications on our poll. We would like to have high confidence (probability at least 95%) that our estimate will be very accurate (within 0.01 of $p$). How many voters should be sampled?
    • The only guarantee that we have at this point is the inequality
      $$P(|M_n - p| \geq 0.01) \leq \frac{1}{4n(0.01)^2}$$
      We will be sure to satisfy the above specifications if we choose $n$ large enough so that
      $$\frac{1}{4n(0.01)^2} \leq 0.05$$
      which yields $n \geq 50{,}000$.
    • This turns out to be fairly conservative, because it is based on the rather loose Chebyshev inequality.
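Packaged as code, the same calculation looks like this (a small sketch; `poll_sample_size` is our own name for a helper that solves $1/(4n\epsilon^2) \leq \delta$ for the smallest integer $n$):

```python
import math

def poll_sample_size(eps: float, delta: float) -> int:
    """Smallest n with 1/(4*n*eps^2) <= delta, i.e. the Chebyshev-based
    guarantee P(|M_n - p| >= eps) <= delta for any unknown p."""
    n = math.ceil(1.0 / (4.0 * delta * eps**2))
    # Guard against floating-point rounding right at the boundary.
    while 1.0 / (4.0 * n * eps**2) > delta:
        n += 1
    return n

# Accuracy within 0.01 of p, confidence at least 95% (delta = 0.05)
print(poll_sample_size(0.01, 0.05))  # -> 50000
```

Tightening either specification is expensive: halving `eps` quadruples the required sample size, while halving `delta` only doubles it.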


Reposted from blog.csdn.net/weixin_42437114/article/details/113947816