Chapter 5 (Limit Theorems): The Strong Law of Large Numbers

These are reading notes for *Introduction to Probability*.

The Strong Law of Large Numbers


  • The strong law of large numbers is similar to the weak law in that it also deals with the convergence of the sample mean to the true mean. It is different, however, because it refers to another type of convergence.
    • The weak law states that the probability $P(|M_n - \mu| \geq \epsilon)$ of a significant deviation of $M_n$ from $\mu$ goes to zero as $n \rightarrow \infty$. Still, for any finite $n$, this probability can be positive, and it is conceivable that once in a while, even if infrequently, $M_n$ deviates significantly from $\mu$. The weak law provides no conclusive information on the number of such deviations, but the strong law does.
    • According to the strong law, $M_n$ converges to $\mu$ with probability 1. This implies that for any given $\epsilon > 0$, the probability that the difference $|M_n - \mu|$ exceeds $\epsilon$ an infinite number of times is equal to zero. (A small simulation sketch of this convergence follows this list.)
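
The following sketch is not from the book; it is a minimal illustration, assuming NumPy is available, of a single sample path of i.i.d. uniform $[0,1]$ random variables whose sample mean $M_n$ settles down to the true mean $\mu = 0.5$ as $n$ grows, which is the behavior the strong law describes.

```python
import numpy as np

# Minimal illustration (not from the book): one sample path of i.i.d. uniform(0, 1)
# random variables; the sample mean M_n approaches mu = 0.5 along this path.
rng = np.random.default_rng(0)
mu = 0.5
x = rng.uniform(0.0, 1.0, size=1_000_000)                # one long sample path
sample_means = np.cumsum(x) / np.arange(1, x.size + 1)   # M_n for n = 1, 2, ...

for n in [10, 100, 10_000, 1_000_000]:
    print(f"n = {n:>9}: M_n = {sample_means[n - 1]:.5f}, "
          f"|M_n - mu| = {abs(sample_means[n - 1] - mu):.5f}")
```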

Example 5.13. Probabilities and Frequencies.

  • Consider an event $A$ defined in terms of some probabilistic experiment. We consider a sequence of independent repetitions of the same experiment, and let $M_n$ be the fraction of the first $n$ repetitions in which $A$ occurs.
  • The strong law of large numbers asserts that $M_n$ converges to $P(A)$, with probability 1.
  • In contrast, the weak law of large numbers asserts that $M_n$ converges to $P(A)$ in probability.
  • We have often talked intuitively about the probability of an event $A$ as the frequency with which it occurs in an infinitely long sequence of independent trials. The strong law backs this intuition and establishes that the long-term frequency of occurrence of $A$ is indeed equal to $P(A)$, with essential certainty (the probability of this happening is 1).

Problem 18. The strong law of large numbers.
Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables and assume that $E[X_i^4] < \infty$. Prove the strong law of large numbers.

SOLUTION

  • We note that the assumption $E[X_i^4] < \infty$ implies that the expected value of the $X_i$ is finite. Indeed, using the inequality $|x| \leq 1 + x^4$, we have
    $$E[|X_i|] \leq 1 + E[X_i^4] < \infty.$$
  • Let us assume first that $E[X_i] = 0$. We will show that
    $$E\bigg[\sum_{n=1}^\infty\frac{(X_1+\cdots+X_n)^4}{n^4}\bigg] < \infty.$$ We have
    $$E[(X_1+\cdots+X_n)^4] = \sum_{i_1=1}^n\sum_{i_2=1}^n\sum_{i_3=1}^n\sum_{i_4=1}^n E[X_{i_1}X_{i_2}X_{i_3}X_{i_4}].$$ Let us consider the various terms in this sum. Since $E[X_i] = 0$, if one of the indices is different from all of the other indices, the corresponding term is equal to zero. Therefore, the nonzero terms in the above sum are either of the form $E[X_i^4]$ (there are $n$ such terms), or of the form $E[X_i^2 X_j^2]$, with $i \neq j$. Let us count how many terms there are of this form. Such terms are obtained in three different ways: by setting $i_1 = i_2 \neq i_3 = i_4$, or by setting $i_1 = i_3 \neq i_2 = i_4$, or by setting $i_1 = i_4 \neq i_2 = i_3$. For each one of these three ways, we have $n$ choices for the first pair of indices, and $n - 1$ choices for the second pair. We conclude that there are $3n(n-1)$ terms of this type. Thus,
    $$E[(X_1+\cdots+X_n)^4] = nE[X_1^4] + 3n(n-1)E[X_1^2X_2^2].$$ Using the inequality $xy \leq (x^2+y^2)/2$, we obtain $E[X_1^2X_2^2] \leq E[X_1^4]$, and
    $$E[(X_1+\cdots+X_n)^4] \leq \big(n + 3n(n-1)\big)E[X_1^4] \leq 3n^2E[X_1^4].$$ It follows that
    $$E\bigg[\sum_{n=1}^\infty\frac{(X_1+\cdots+X_n)^4}{n^4}\bigg] = \sum_{n=1}^\infty\frac{1}{n^4}E[(X_1+\cdots+X_n)^4] \leq \sum_{n=1}^\infty\frac{3}{n^2}E[X_1^4] < \infty,$$ where the last step uses the well-known property $\sum_{n=1}^\infty n^{-2} < \infty$. This implies (by the argument of Problem 16 below) that $(X_1+\cdots+X_n)^4/n^4$ converges to zero with probability 1, and therefore, $(X_1+\cdots+X_n)/n$ also converges to zero with probability 1, which is the strong law of large numbers.
  • For the more general case where the mean of the random variables $X_i$ is nonzero, the preceding argument establishes that $(X_1+\cdots+X_n - nE[X_1])/n$ converges to zero, which is the same as $(X_1+\cdots+X_n)/n$ converging to $E[X_1]$, with probability 1. (A small numerical check of the fourth-moment expansion used above follows.)
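
As a sanity check (not part of the book's solution), the following sketch estimates $E[(X_1+\cdots+X_n)^4]$ by Monte Carlo for zero-mean uniform samples and compares it with $nE[X_1^4] + 3n(n-1)E[X_1^2X_2^2]$, where independence gives $E[X_1^2X_2^2] = (E[X_1^2])^2$, as well as with the upper bound $3n^2E[X_1^4]$. The distribution and sample sizes are arbitrary choices.

```python
import numpy as np

# Monte Carlo sanity check (not from the book) of the fourth-moment expansion
#   E[(X_1 + ... + X_n)^4] = n E[X^4] + 3 n (n - 1) E[X_1^2 X_2^2]
# for zero-mean i.i.d. X_i; independence gives E[X_1^2 X_2^2] = (E[X^2])^2.
rng = np.random.default_rng(1)
n, trials = 5, 1_000_000
x = rng.uniform(-1.0, 1.0, size=(trials, n))     # zero-mean uniform(-1, 1) samples

s4_empirical = np.mean(np.sum(x, axis=1) ** 4)
ex2, ex4 = 1.0 / 3.0, 1.0 / 5.0                  # E[X^2] and E[X^4] for uniform(-1, 1)
s4_formula = n * ex4 + 3 * n * (n - 1) * ex2**2

print(f"Monte Carlo estimate of E[S_n^4]: {s4_empirical:.4f}")
print(f"n E[X^4] + 3n(n-1) (E[X^2])^2   : {s4_formula:.4f}")
print(f"upper bound 3 n^2 E[X^4]        : {3 * n**2 * ex4:.4f}")
```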

Convergence with Probability 1


  • A proper interpretation of this type of convergence involves a sample space consisting of infinite sequences. It is best to think of the sample space as a set of infinite sequences $(y_1, y_2, \ldots)$ of real numbers: any such sequence is a possible outcome of the experiment. Let us now consider the set $A$ consisting of those sequences $(y_1, y_2, \ldots)$ whose long-term average is $c$. All of the probability is concentrated on this particular subset of the sample space. This does not mean that other sequences are impossible, only that they are extremely unlikely, in the sense that their total probability is zero.
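  • For reference, the standard definition reads: a sequence of random variables $Y_1, Y_2, \ldots$ (not necessarily independent) converges to a number $c$ with probability 1 (or almost surely) if
    $$P\Big(\lim_{n\rightarrow\infty}Y_n = c\Big) = 1.$$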

Example 5.14.

  • Let $X_1, X_2, \ldots$ be a sequence of independent random variables that are uniformly distributed in $[0, 1]$, and let $Y_n = \min\{X_1, \ldots, X_n\}$. We wish to show that $Y_n$ converges to 0, with probability 1.
  • In any execution of the experiment, the sequence $Y_n$ is nonincreasing. Since this sequence is bounded below by zero, it must have a limit, which we denote by $Y$. Let us fix some $\epsilon > 0$. Since $Y \leq Y_n = \min\{X_1, \ldots, X_n\}$ for every $n$, the event $\{Y \geq \epsilon\}$ can only occur if every $X_i$ is at least $\epsilon$, so that
    $$P(Y\geq\epsilon) \leq P(X_1\geq\epsilon, \ldots, X_n\geq\epsilon) = (1-\epsilon)^n.$$ Since this is true for all $n$, we must have
    $$P(Y\geq\epsilon) \leq \lim_{n\rightarrow\infty}(1-\epsilon)^n = 0.$$ This shows that $P(Y\geq\epsilon) = 0$, for any positive $\epsilon$. We conclude that $P(Y > 0) = 0$, which implies that $P(Y = 0) = 1$. Since $Y$ is the limit of $Y_n$, we see that $Y_n$ converges to zero with probability 1. (A brief simulation sketch of this example follows.)
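
A minimal simulation sketch (not from the book, assuming NumPy): along a single sample path, the running minimum of uniform samples quickly drops toward zero.

```python
import numpy as np

# Minimal sketch (not from the book): along one sample path, the running minimum
# Y_n = min(X_1, ..., X_n) of i.i.d. uniform(0, 1) samples decreases toward 0.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=100_000)
running_min = np.minimum.accumulate(x)    # Y_n for n = 1, 2, ...

for n in [1, 10, 100, 1_000, 100_000]:
    print(f"n = {n:>7}: Y_n = {running_min[n - 1]:.6f}")
```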

Problem 13.
Consider two sequences of random variables $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$. Suppose that $X_n$ converges to $a$ and $Y_n$ converges to $b$, with probability 1. Show that $X_n + Y_n$ converges to $a + b$, with probability 1. Also, assuming that the random variables $Y_n$ cannot be equal to zero and that $b \neq 0$, show that $X_n/Y_n$ converges to $a/b$, with probability 1.

SOLUTION

  • Let $A$ (respectively, $B$) be the event that the sequence of values of the random variables $X_n$ (respectively, $Y_n$) does not converge to $a$ (respectively, $b$). Let $C$ be the event that the sequence of values of $X_n + Y_n$ does not converge to $a + b$, and notice that $C \subset A \cup B$. Hence,
    $$P(C) \leq P(A \cup B) \leq P(A) + P(B) = 0.$$ Therefore, $P(C^c) = 1$, or equivalently, $X_n + Y_n$ converges to $a + b$ with probability 1.
  • For the convergence of $X_n/Y_n$, the argument is similar: outside the event $A \cup B$, which has probability zero, the values of $X_n$ and $Y_n$ converge to $a$ and $b \neq 0$, respectively, so the values of $X_n/Y_n$ converge to $a/b$.

Problem 16.
Consider a sequence $Y_n$ of nonnegative random variables and suppose that
$$E\bigg[\sum_{n=1}^\infty Y_n\bigg] < \infty.$$ Show that $Y_n$ converges to 0, with probability 1.

  • Note: This result provides a commonly used method for establishing convergence with probability 1.
  • To evaluate the expectation of $\sum_{n=1}^\infty Y_n$, one typically uses the formula
    $$E\bigg[\sum_{n=1}^\infty Y_n\bigg] = \sum_{n=1}^\infty E[Y_n].$$ The fact that the expectation and the infinite summation can be interchanged, for the case of nonnegative random variables, is known as the monotone convergence theorem, a fundamental result of probability theory, whose proof lies beyond the scope of this text.

SOLUTION

  • We note that the infinite sum $\sum_{n=1}^\infty Y_n$ must be finite, with probability 1. Indeed, if it had a positive probability of being infinite, then its expectation would also be infinite. But if the sum of the values of the random variables $Y_n$ is finite, the sequence of these values must converge to zero. Since the probability of this event is equal to 1, it follows that the sequence $Y_n$ converges to zero, with probability 1.

Problem 17.
Consider a sequence of Bernoulli random variables $X_n$, and let $p_n = P(X_n = 1)$ be the probability of success in the $n$th trial. Assuming that $\sum_{n=1}^\infty p_n < \infty$, show that the number of successes is finite, with probability 1. [Compare with Problem 48(b).]

SOLUTION

  • Using the monotone convergence theorem (see the note above), we have
    $$E\bigg[\sum_{n=1}^\infty X_n\bigg] = \sum_{n=1}^\infty E[X_n] = \sum_{n=1}^\infty p_n < \infty.$$ This implies that
    $$\sum_{n=1}^\infty X_n < \infty$$ with probability 1. Since $\sum_{n=1}^\infty X_n$ is the total number of successes, the number of successes is finite with probability 1. (A small simulation sketch of this setup follows.)
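
As an illustration (not part of the solution), the following sketch truncates the infinite sequence at a large $N$, uses the arbitrary choice $p_n = 1/n^2$ (so $\sum_n p_n = \pi^2/6 < \infty$), and counts the successes along a few simulated paths; each path has only a handful of successes.

```python
import numpy as np

# Illustrative sketch (not from the book): Bernoulli trials with p_n = 1/n^2, so
# sum_n p_n = pi^2/6 < infinity. The total number of successes per path is small.
rng = np.random.default_rng(3)
N = 1_000_000                               # truncation of the infinite sequence
p = 1.0 / np.arange(1, N + 1) ** 2          # p_n = 1/n^2

for path in range(5):
    successes = rng.random(N) < p           # X_n = 1 with probability p_n
    total = int(successes.sum())
    last = int(np.flatnonzero(successes)[-1]) + 1 if total > 0 else 0
    print(f"path {path}: total successes = {total}, last success at trial n = {last}")
```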

“Convergence with probability 1” vs. “Convergence in probability”

  • Convergence with probability 1 implies convergence in probability, but the converse is not necessarily true.

Problem 15.
Suppose that a sequence $Y_1, Y_2, \ldots$ of random variables converges to a real number $c$, with probability 1. Show that the sequence also converges to $c$ in probability.

SOLUTION

  • Let $C$ be the event that the sequence of values of the random variables $Y_n$ converges to $c$. By assumption, we have $P(C) = 1$. Fix some $\epsilon > 0$, and let $A_k$ be the event that $|Y_n - c| < \epsilon$ for every $n \geq k$. If the sequence of values of the random variables $Y_n$ converges to $c$, then there must exist some $k$ such that for every $n \geq k$, this sequence of values is within less than $\epsilon$ from $c$. Therefore, every element of $C$ belongs to $A_k$ for some $k$, or
    $$C \subset \bigcup_{k=1}^\infty A_k.$$ Note also that the sequence of events $A_k$ is monotonically increasing, in the sense that $A_k \subset A_{k+1}$ for all $k$. Finally, note that the event $A_k$ is a subset of the event $\{|Y_k - c| < \epsilon\}$. Therefore, using also the continuity property of probabilities along the increasing sequence $A_k$,
    $$\lim_{k\rightarrow\infty}P(|Y_k - c| < \epsilon) \geq \lim_{k\rightarrow\infty}P(A_k) = P\bigg(\bigcup_{k=1}^\infty A_k\bigg) \geq P(C) = 1.$$ It follows that
    $$\lim_{k\rightarrow\infty}P(|Y_k - c| \geq \epsilon) = 0.$$

  • Our last example illustrates the difference between convergence in probability and convergence with probability 1.

Example 5.15.

  • Consider a discrete-time arrival process. The set of times is partitioned into consecutive intervals of the form $I_k = \{2^k, 2^k + 1, \ldots, 2^{k+1} - 1\}$. Note that the length of $I_k$ is $2^k$. During each interval $I_k$, there is exactly one arrival, and all times within an interval are equally likely. The arrival times within different intervals are assumed to be independent. Let us define $Y_n = 1$ if there is an arrival at time $n$, and $Y_n = 0$ if there is no arrival. We have $P(Y_n = 1) = 1/2^k$, if $n \in I_k$. Note that as $n$ increases, it belongs to intervals $I_k$ with increasingly large indices $k$. Consequently,
    $$\lim_{n\rightarrow\infty}P(Y_n \neq 0) = \lim_{k\rightarrow\infty}\frac{1}{2^k} = 0,$$ and we conclude that $Y_n$ converges to 0 in probability.
  • However, when we carry out the experiment, the total number of arrivals is infinite (one arrival during each interval $I_k$). Therefore, $Y_n$ is unity for infinitely many values of $n$, the event $\{\lim_{n\rightarrow\infty}Y_n = 0\}$ has zero probability, and we do not have convergence with probability 1.
  • Intuitively, the following is happening. At any given time, there is only a small (and diminishing with $n$) probability of a substantial deviation from 0, which implies convergence in probability. On the other hand, given enough time, a substantial deviation from 0 is certain to occur, and for this reason we do not have convergence with probability 1. (A simulation sketch of this process appears below.)
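
The following sketch (not from the book, assuming NumPy) simulates a truncated version of this arrival process: one uniformly chosen arrival time per interval $I_k$. The fraction of times $n$ with $Y_n = 1$ shrinks toward zero, yet every simulated path keeps producing new arrivals, matching the discussion above.

```python
import numpy as np

# Illustrative sketch (not from the book): simulate the arrival process of
# Example 5.15. In each interval I_k = {2^k, ..., 2^(k+1) - 1} one arrival time
# is drawn uniformly. P(Y_n = 1) -> 0 (convergence in probability), yet every
# path contains arrivals in every interval, so Y_n = 1 keeps recurring.
rng = np.random.default_rng(4)
K = 20                                              # number of intervals simulated

arrival_times = [int(rng.integers(2**k, 2**(k + 1))) for k in range(K)]
print("arrival times (one per interval I_k):", arrival_times[:8], "...")

# One arrival per interval, so the number of arrivals before time N grows only
# like log2(N), and the fraction of times n with Y_n = 1 shrinks toward zero.
for N in [2**5, 2**10, 2**20]:
    count = sum(t < N for t in arrival_times)
    print(f"first N = {N:>8} times: {count} arrivals, fraction = {count / N:.6f}")
```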


Reposted from blog.csdn.net/weixin_42437114/article/details/113995003