UA MATH567 Special Topics on High-Dimensional Statistics 0: Why do we need high-dimensional statistical theory? High-dimensional effects of covariance estimation and the Marchenko-Pastur law

In the last lecture, we saw experimentally that the classification error of linear discriminant analysis increases with the dimension, while classical multivariate statistical theory predicts a theoretical error that is a constant independent of the dimension. This suggests that we need to build a theory adapted to high-dimensional statistical problems. In this lecture, starting from covariance estimation in linear discriminant analysis, we discuss how covariance estimation in high-dimensional problems departs from classical multivariate statistical theory.

Suppose $x_1, \dots, x_n$ is a sample from some $d$-dimensional zero-mean distribution. The sample covariance is

$$\hat\Sigma = \frac{1}{n} \sum_{i=1}^n x_i x_i^T$$
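As a quick illustration (my own sketch, not from the notes; the sample size, dimension, and seed are arbitrary choices), this estimator is a one-line matrix product in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 10                   # arbitrary sample size and dimension
X = rng.standard_normal((n, d))   # rows x_i: i.i.d. zero-mean, covariance I_d

# Sample covariance (1/n) * sum_i x_i x_i^T, written as one matrix product
Sigma_hat = X.T @ X / n
print(Sigma_hat.shape)            # (d, d)
```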

It is an unbiased estimate of the population covariance. But in the non-asymptotic setting, we want to know the error of this estimate. In the lecture on random matrix theory, we introduced some commonly used matrix norms that can measure this error. For this covariance estimation problem, we define the estimation error as the operator norm of the difference between the sample covariance and the population covariance, namely
$$\left\| \hat\Sigma - \Sigma \right\| = \lambda_1(\hat\Sigma - \Sigma)$$
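To make this concrete, here is a minimal sketch (my own, not from the notes) that evaluates this error for Gaussian data with true covariance $I_d$; since the difference is symmetric, its operator norm is the largest absolute eigenvalue, which NumPy computes as the largest singular value via `ord=2`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 10
X = rng.standard_normal((n, d))              # true covariance is I_d
Sigma_hat = X.T @ X / n

# Operator norm of the error: largest singular value of the symmetric
# difference, i.e. its largest absolute eigenvalue.
err = np.linalg.norm(Sigma_hat - np.eye(d), ord=2)
print(f"operator-norm error: {err:.4f}")     # small when n >> d
```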

Consider the simplest case, $\Sigma = I_d$. By the weak law of large numbers, $\hat\Sigma$ converges to $I_d$ in probability, so all eigenvalues of $\hat\Sigma$ converge to 1 in probability.
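A quick simulation of this classical regime (a sketch under my own arbitrary sizes): fix $d$ and let $n$ grow; the extreme eigenvalues of $\hat\Sigma$ approach 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10                                      # dimension held fixed
for n in (100, 1_000, 10_000, 100_000):
    X = rng.standard_normal((n, d))
    eig = np.linalg.eigvalsh(X.T @ X / n)   # eigenvalues, ascending
    print(f"n={n:>6}: min={eig[0]:.3f}, max={eig[-1]:.3f}")
# Both extremes tend to 1 as n grows, as classical theory predicts.
```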

Marchenko-Pastur law
Assume $d/n \to \alpha \in (0,1)$, i.e. the dimension grows proportionally with the sample size. In this regime the Marchenko-Pastur law states that the density of the eigenvalues of $\hat\Sigma$ satisfies

$$f_{MP}(\lambda) \propto \frac{\sqrt{(t_{max}(\alpha)-\lambda)(\lambda-t_{min}(\alpha))}}{\lambda}, \quad \lambda \in [t_{min}(\alpha), t_{max}(\alpha)]$$

where

$$t_{min}(\alpha) = (1-\sqrt{\alpha})^2, \quad t_{max}(\alpha) = (1+\sqrt{\alpha})^2$$
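For completeness, with the usual normalization (unit-variance entries) the proportionality constant makes the density $f_{MP}(\lambda) = \frac{\sqrt{(t_{max}(\alpha)-\lambda)(\lambda-t_{min}(\alpha))}}{2\pi\alpha\lambda}$ on $[t_{min}(\alpha), t_{max}(\alpha)]$; this constant is the standard one, not stated in the notes. A small sketch of the density and its edges:

```python
import numpy as np

def mp_density(lam, alpha):
    """Marchenko-Pastur density for aspect ratio alpha = d/n in (0, 1),
    unit-variance entries; zero outside [t_min, t_max]."""
    t_min, t_max = (1 - np.sqrt(alpha)) ** 2, (1 + np.sqrt(alpha)) ** 2
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    f = np.zeros_like(lam)
    inside = (lam > t_min) & (lam < t_max)
    f[inside] = (np.sqrt((t_max - lam[inside]) * (lam[inside] - t_min))
                 / (2 * np.pi * alpha * lam[inside]))
    return f

for alpha in (0.2, 0.5):
    t_min, t_max = (1 - np.sqrt(alpha)) ** 2, (1 + np.sqrt(alpha)) ** 2
    print(f"alpha={alpha}: support [{t_min:.3f}, {t_max:.3f}]")
```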

These two thresholds come from the concentration inequality
$$P\left( \lambda_1(\hat\Sigma) \ge \left(1+\sqrt{d/n}+\delta\right)^2 \right) \le e^{-\frac{n\delta^2}{2}}, \quad \forall \delta \ge 0$$
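A Monte Carlo sanity check of this tail bound (my own sketch; Gaussian data and arbitrary choices of $n$, $d$, $\delta$): the empirical frequency of the event should stay below the exponential bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, delta, reps = 400, 80, 0.1, 2000
threshold = (1 + np.sqrt(d / n) + delta) ** 2

exceed = 0
for _ in range(reps):
    X = rng.standard_normal((n, d))
    lam1 = np.linalg.eigvalsh(X.T @ X / n)[-1]      # largest eigenvalue
    exceed += lam1 >= threshold

print(f"empirical tail frequency: {exceed / reps:.4f}")
print(f"theoretical bound:        {np.exp(-n * delta ** 2 / 2):.4f}")
```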

[Figure: eigenvalue histograms of $\hat\Sigma$ with the Marchenko-Pastur density overlaid, for $\alpha = 0.2$ and $\alpha = 0.5$]

The figure above is based on simulations of this simple setting. The left panel uses $\alpha = 0.2$, $n = 4000$; the right panel uses $\alpha = 0.5$, $n = 4000$. The gray bars are the frequency histogram of the eigenvalues, and the solid black curve is the Marchenko-Pastur density. The simulation results (the gray histogram) are not close to the prediction of classical multivariate statistics (all eigenvalues converging to 1), but match the Marchenko-Pastur law well; the Marchenko-Pastur law is a typical result of high-dimensional statistical theory.
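A sketch that reproduces this kind of figure (my own code, assuming `numpy` and `matplotlib`, with standard Gaussian data so that $\Sigma = I_d$):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 4000

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, alpha in zip(axes, (0.2, 0.5)):
    d = int(alpha * n)
    X = rng.standard_normal((n, d))
    eig = np.linalg.eigvalsh(X.T @ X / n)
    ax.hist(eig, bins=60, density=True, color="0.7")   # gray histogram

    # Marchenko-Pastur density on its support
    t_min, t_max = (1 - np.sqrt(alpha)) ** 2, (1 + np.sqrt(alpha)) ** 2
    lam = np.linspace(t_min + 1e-6, t_max - 1e-6, 400)
    f = np.sqrt((t_max - lam) * (lam - t_min)) / (2 * np.pi * alpha * lam)
    ax.plot(lam, f, "k-")                              # solid black line
    ax.set_title(f"alpha = {alpha}, n = {n}")
plt.show()
```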


To close Topic 0, let me briefly give my understanding of classical multivariate statistical theory versus high-dimensional statistical theory. First, as statistical theories the two study essentially the same questions, such as consistency, error, and convergence rates of estimators. But classical multivariate theory assumes $d \ll n$: when doing asymptotic analysis, classical statistics treats the feature dimension $d$ as infinitesimal relative to the sample size $n$, that is, $d/n \to 0$, so the errors, concentration inequalities, and other results of classical theory do not involve the dimension. High-dimensional statistical theory instead assumes $d/n \to \alpha \in (0,1)$, and this ratio then appears in the errors, concentration inequalities, and other results; that is, the dimension affects the probability distributions and errors.

In addition, there is another important difference between classical and high-dimensional statistics. In high-dimensional statistics, information is sparse across the features: not all $d$ features are equally important, and the proportion of important features is very small. This property is called sparsity; the number of important features is usually taken to be $o(d)$, i.e. infinitesimal relative to $d$, so we typically need dimension-reduction or feature-selection techniques to remove redundant information and improve computational efficiency.
