Autonomous Driving (69) --------- Dimensionality Reduction: PCA and LDA

      Autonomous driving is a systems engineering project; there is no single method that conquers everything. It is built from many sub-functions working together, which requires us to understand a range of machine learning algorithms. I have collected some good ways of looking at them and organize them here for your reference.

       Before deep learning, feature extraction was a very important technique, with principal component analysis (PCA) and the topic model algorithm (LDA) among the most common. They are summarized systematically here:

1. Principal Component Analysis (PCA)

      In the era of big data, the one thing we do not lack is data. Dimensionality reduction is one of the most important ways of processing it: we want to simplify the analysis while keeping the information loss caused by the reduction as small as possible. How can this be achieved? Here I introduce principal component analysis.

       The main idea of PCA is to map n-dimensional features onto k dimensions (k < n); the k new, mutually orthogonal features are called principal components. The question is how to choose the transformed k-dimensional coordinates so as to minimize the loss of information. In signal processing it is commonly assumed that the signal has a large variance and the noise has a small variance; the ratio of the two is the signal-to-noise ratio (SNR), and the larger it is, the better. If the projection of the samples onto a direction u1 has a large variance while the projection onto u2 has a small variance, the projection onto u2 can be regarded as being caused by noise. Therefore, the best k-dimensional features are those for which, after converting the n-dimensional samples to k dimensions, the sample variance along each new dimension is as large as possible.
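       To make "largest variance" precise, here is the standard one-line derivation (my addition; the post itself does not write it out): for centered samples x_i with covariance matrix Σ, the variance of the projection onto a unit vector u is

```latex
\operatorname{Var}(u^{\top}x)
  = \frac{1}{n}\sum_{i=1}^{n}\bigl(u^{\top}x_i\bigr)^2
  = u^{\top}\Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i x_i^{\top}\Bigr)u
  = u^{\top}\Sigma\,u , \qquad \|u\| = 1 .
```

       Maximizing u^T Σ u subject to ||u|| = 1 with a Lagrange multiplier gives Σ u = λ u, so the best projection directions are exactly the eigenvectors of the covariance matrix, ordered by eigenvalue.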


       1. The maximum-variance idea above was introduced for one-dimensional data; to measure spread in higher dimensions we need the covariance:

           in which the two-dimensional covariance is Cov(X, Y) = E[(X - E[X])(Y - E[Y])],

           and the three-dimensional covariance matrix is
           C = \begin{pmatrix} \mathrm{Cov}(x,x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\ \mathrm{Cov}(y,x) & \mathrm{Cov}(y,y) & \mathrm{Cov}(y,z) \\ \mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Cov}(z,z) \end{pmatrix}.

       2. Having introduced the concept of covariance, the next problem is how to construct coordinate axes that are orthogonal to each other; for this we need some linear algebra.

           If a vector v is an eigenvector of the matrix A, it can be expressed in the following form: A v = λ v.

           Here λ is the eigenvalue corresponding to the eigenvector v. For a real symmetric matrix such as the covariance matrix, the set of eigenvectors is a set of orthogonal vectors.

           For such a matrix A with a set of eigenvectors v, normalizing this set of orthogonal vectors yields a set of orthonormal vectors. The eigenvalue decomposition of A can then be written as A = Q Σ Q^{-1}, where Q is the matrix whose columns are the eigenvectors of A and Σ is a diagonal matrix whose diagonal entries are the eigenvalues.
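           As a quick sanity check of this decomposition, here is a small NumPy example (my own illustration with an arbitrary symmetric matrix):

```python
import numpy as np

# an arbitrary real symmetric matrix, standing in for a covariance matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eigh(A)   # columns of Q are orthonormal eigenvectors
Sigma = np.diag(eigvals)         # diagonal matrix of eigenvalues

# A = Q Sigma Q^T (for an orthogonal Q, Q^{-1} = Q^T)
print(np.allclose(A, Q @ Sigma @ Q.T))   # True
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: eigenvectors are orthonormal
```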

        3. Based on eigenvalue decomposition of the covariance matrix, the steps of the PCA algorithm are as follows, given a data set that is to be reduced to k dimensions (a short code sketch follows the list):

  • 3.1 Remove the mean (i.e., center the data): subtract from each feature its average value, so that the expectation of the data is zero.
  • 3.2 Compute the covariance matrix. Note: whether we divide by the number of samples n, by n-1, or not at all has no effect on the resulting eigenvectors.
  • 3.3 Use eigenvalue decomposition to obtain the eigenvalues and eigenvectors of the covariance matrix.
  • 3.4 Sort the eigenvalues in descending order and select the k largest. Use the corresponding k eigenvectors as row vectors to form the eigenvector matrix P.
  • 3.5 Transform the data into the new space constructed from the k eigenvectors, i.e. Y = PX, thereby reducing the data from n dimensions to k dimensions.
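        Putting the five steps together, here is a minimal NumPy sketch (the function name pca_eig and the toy data are my own; it keeps samples as rows, so the projection is written X_centered @ P rather than Y = PX with samples as columns):

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigenvalue decomposition of the covariance matrix.

    X: (n_samples, n_features) data matrix; k: target dimensionality.
    Returns the projected data of shape (n_samples, k).
    """
    # 3.1 center the data: subtract the per-feature mean
    X_centered = X - X.mean(axis=0)
    # 3.2 covariance matrix (dividing by n or n-1 does not change the eigenvectors)
    cov = np.cov(X_centered, rowvar=False)
    # 3.3 eigenvalues / eigenvectors of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 3.4 sort eigenvalues in descending order and keep the top-k eigenvectors
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order]            # columns are the principal directions
    # 3.5 project the centered data onto the new k-dimensional space
    return X_centered @ P

# toy usage: reduce 5-dimensional random data to 2 dimensions
X = np.random.rand(100, 5)
Y = pca_eig(X, 2)
print(Y.shape)   # (100, 2)
```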

2. Topic Model (LDA)

        A topic model is a statistical model used to discover the abstract topics in a collection of documents. Intuitively, if an article has a central idea, then certain words related to it must occur more frequently.

       This process is like the way we write: generally, we first set a theme for the article, and then, based on that theme, combine words related to it to form the final article.

       The abbreviation LDA actually has two meanings: one is Linear Discriminant Analysis, which reduces the dimensionality of data for subsequent classification; the other is Latent Dirichlet Allocation (abbreviated LDA), a probabilistic topic model, and it is the latter that is discussed here.

       To begin, randomly assign values to P(w | t) and P(t | d) (for all d and t); these serve as the prior values. Then:

  • 1. For the i-th word wi of a specific document ds, if the topic corresponding to this word is tj, the probability can be written as Pj(wi | ds) = P(wi | tj) * P(tj | ds).
  • 2. Enumerate the topics tj in T to obtain all the Pj(wi | ds). Then choose a topic for the i-th word wi of ds according to these probability values; the simplest way is to take the tj with the largest Pj(wi | ds) (note that in this expression only j varies).
  • 3. If the topic now chosen for the i-th word wi of ds differs from the one chosen before (i traverses all the words of ds, while the candidate topics tj stay the same), this changes P(tj | ds) and P(wi | tj), and those changes in turn affect the computation of P(w | d) above. Computing P(w | d) once for all words w of all documents d in D and reselecting the topics counts as one iteration; after n such iterations the process converges to the result LDA needs. A simplified code sketch of this loop follows the list.
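       Below is a deliberately simplified NumPy sketch of that loop, using hard (argmax) topic assignments and add-one smoothing; the function name, the smoothing, and the toy corpus are my own assumptions, not the exact procedure of a full LDA Gibbs sampler:

```python
import numpy as np

def iterate_topics(docs, n_topics, vocab_size, n_iters=50, seed=0):
    """Simplified iterative topic reassignment as described above.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Every word token gets a topic; P(w|t) and P(t|d) are re-estimated from
    counts and each token is reassigned to argmax_t P(w|t) * P(t|d).
    """
    rng = np.random.default_rng(seed)
    # random initial topic for every word token (the "prior" assignment)
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]

    for _ in range(n_iters):
        # re-estimate P(w|t) and P(t|d) from the current assignments
        n_tw = np.ones((n_topics, vocab_size))     # +1 smoothing to avoid zeros
        n_dt = np.ones((len(docs), n_topics))
        for d, (doc, zd) in enumerate(zip(docs, z)):
            for w, t in zip(doc, zd):
                n_tw[t, w] += 1
                n_dt[d, t] += 1
        p_w_t = n_tw / n_tw.sum(axis=1, keepdims=True)   # P(w|t)
        p_t_d = n_dt / n_dt.sum(axis=1, keepdims=True)   # P(t|d)

        # reassign every token to the topic maximizing P(w|t) * P(t|d)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z[d][i] = np.argmax(p_w_t[:, w] * p_t_d[d])
    return z

# toy corpus: word ids 0-5, two rough "themes"
docs = [[0, 1, 0, 2], [1, 0, 2, 2], [3, 4, 5, 3], [4, 5, 3, 4]]
print(iterate_topics(docs, n_topics=2, vocab_size=6))
```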

       The detailed derivation involves the following concepts:

        1. Conjugate and conjugate prior distributions: as the Bayesian school puts it, posterior probability ∝ likelihood function × prior probability. If the prior probability and the posterior probability have the same functional form, the prior is said to be conjugate to the likelihood.

        2. Gamma function: this function essentially extends the factorial to the real numbers. For an integer n, the factorial has the form 1 × 2 × 3 × ... × n; for a real number this product cannot be computed directly, so the factorial is generalized as Γ(x) = ∫_0^∞ t^{x-1} e^{-t} dt, with Γ(n) = (n-1)! for positive integers n.

        3. Binomial distribution: the outcome takes only two values, which can be regarded as 0 and 1, with probability p for one value and 1-p for the other. For n independent trials, P(K = k) = C(n, k) p^k (1-p)^{n-k}. PS: a single trial gives a Bernoulli distribution; repeating it n times gives the binomial distribution.

        4. Multinomial distribution: the binomial distribution extended to the multi-dimensional case, i.e. the outcome is no longer limited to two values but can take one of several possible values.
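       These pieces fit together because the Dirichlet distribution is the conjugate prior of the multinomial. As a standard result (added here for completeness), if the topic proportions θ have a Dirichlet(α) prior and we observe counts n = (n_1, ..., n_K) drawn from a multinomial with parameter θ, the posterior is again a Dirichlet:

```latex
p(\theta \mid \alpha) = \mathrm{Dir}(\theta \mid \alpha_1,\dots,\alpha_K)
  \;\propto\; \prod_{k=1}^{K}\theta_k^{\alpha_k - 1},
\qquad
p(n \mid \theta) \;\propto\; \prod_{k=1}^{K}\theta_k^{\,n_k}
\;\Longrightarrow\;
p(\theta \mid n, \alpha) = \mathrm{Dir}(\theta \mid \alpha_1 + n_1,\dots,\alpha_K + n_K).
```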

       LDA adds a Bayesian layer on top of pLSA; in other words, LDA is the Bayesian version of pLSA. The Bayesian framework shows up in the Dirichlet prior distributions.

       In the earlier pLSA model, no prior distribution is used at all; generation is treated directly as a random process. The Bayesians simply could not stand this: it is not rigorous, there ought to be a prior, you cannot just ignore the rules. So they put a prior distribution on the two "dice" (the parameters), turning pLSA into a Bayesian process. The prior used here is the Dirichlet distribution. Following the document generation process described above, generating a document can now be abstracted into the following steps:

  • 1) Sample from the Dirichlet distribution with parameter α to generate the topic distribution θd of document d
  • 2) Sample a topic from the multinomial distribution θd to generate the topic zi of the i-th word of document d
  • 3) Sample from the Dirichlet distribution with parameter β to generate the word distribution φi corresponding to topic zi
  • 4) Sample a word from the multinomial distribution φi, finally generating the word wi (a small sampling sketch follows)
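       A small NumPy sketch of this generative process (purely illustrative; the vocabulary size, document length, and the symmetric α and β values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size, doc_len = 3, 8, 10
alpha = np.full(n_topics, 0.5)     # Dirichlet prior over topic proportions
beta = np.full(vocab_size, 0.1)    # Dirichlet prior over topic-word distributions

# 3) per-topic word distributions phi, one Dirichlet(beta) sample per topic
phi = rng.dirichlet(beta, size=n_topics)   # shape (n_topics, vocab_size)

# 1) document-topic distribution theta_d sampled from Dirichlet(alpha)
theta_d = rng.dirichlet(alpha)

words = []
for _ in range(doc_len):
    # 2) sample a topic z_i for the i-th word from the multinomial theta_d
    z_i = rng.choice(n_topics, p=theta_d)
    # 4) sample the word w_i from the multinomial phi[z_i]
    w_i = rng.choice(vocab_size, p=phi[z_i])
    words.append(w_i)

print(words)   # word ids of one generated document
```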

 

 

Origin blog.csdn.net/zhouyy858/article/details/103831089