"04" machine learning, mathematical knowledge which deep learning needs?

A guide to avoiding the pits

In three years of self-study, I basically had no one to show me the way; switching fields on my own was naturally hard, and I stepped into countless pits and took plenty of detours. Here I've compiled the pits I stepped in myself, for your reference.

1. Don't learn math from scratch. Even if you barely know any math, you don't need to start from zero. Spend a month going over calculus, linear algebra, and probability and statistics; that is enough. Because I had never taken advanced mathematics, I spent half a year on it and even read textbooks on mathematical analysis, functional analysis, and measure theory. Looking back, most of that knowledge was never used in my later career as an algorithm engineer. It wasn't exactly a sunk cost, but the return on the time invested was definitely low.

So don't over-invest in mathematics in the name of laying a solid foundation. There's a good analogy: if you want to build a car, you need twenty years of theory, technology, and practice. But if you just want to drive a car, you can learn quickly. As a driver, do you need to understand how a gasoline engine works? No. The car you drive may not even have a gasoline engine (it could be electric).

2. Your coding ability has to pass the bar

By my junior year I had self-studied all of the required computer science courses, because I knew that data science is inseparable from low-level computer knowledge. I've seen plenty of people who can only derive formulas and don't even know what the JVM is. Besides Python, you should learn at least one or two lower-level languages, such as C/C++ or Java.

Also, if your goal is to be an algorithm engineer, you should start learning data structures and algorithms, computer systems, memory management, network programming, and big data frameworks as well, because you're aiming at a job in industry. On this topic, I'll find time to write up and share my own experience hunting for internships during graduate school.

3. Don't go too deep

The low-level architecture of deep learning is complex, and the theory is heavy reading; the books can make your head spin. Apart from the parts you're interested in, there's no need to go deep into the rest.

4. Don't reinvent the wheel

Whether you're headed for research or industry algorithm work, at the entry stage it's enough to carefully implement the classic algorithms yourself once on top of existing low-level libraries. For the more complex algorithms, unless it's truly necessary (or just for fun), don't waste the time; keep in mind that you're just starting out and don't need deep expertise here.

I once read the source code of Hadoop's ML package and XGBoost's C++ source. For me, still a beginner at the time, that was not useful work: my own cart wheel wasn't built yet, and there I was studying the architecture of someone else's high-speed rail. Not very efficient. These days, calling most deep models takes only on the order of a hundred lines of code. Unless your direction is large-scale, high-availability infrastructure or the low-level development of deep learning systems and frameworks, there's no need to dig further into the underlying code.
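To make the hundred-lines point concrete, here is a minimal sketch (toy data and illustrative hyperparameters of my own choosing; it assumes PyTorch is installed) of a small but complete deep model and its training loop:

```python
# A minimal sketch: a complete deep model plus training loop.
# The data, architecture and hyperparameters are purely illustrative.
import torch
import torch.nn as nn

X = torch.randn(256, 20)          # 256 fake samples with 20 features
y = (X[:, 0] > 0).long()          # a toy binary label

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass and loss
    loss.backward()               # autograd applies the chain rule
    opt.step()                    # Adam update

print("final loss:", loss.item())
```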

 

5. Don't sign up for training courses

This one is a matter of opinion. But in my view, the open courses online are more than enough to learn from, such as Coursera, Stanford CS231n, and Khan Academy, many of which now have Chinese subtitles. The related resources are compiled in my last article, which you can go and read. One point worth stressing: some students feel that once they've spent the money, the sunk feeling will force them to keep studying. The idea sounds nice, but it's too naive, for two reasons:

  • China's AI education has only just started, and at many 985 universities, even the teachers in the mathematics and computer science departments are unlikely to come from an AI research background. So ask yourself: will an off-campus training institution really teach better than the 985s?
  • There is no such thing as cramming deep learning. Deep learning is often criticized for lacking theoretical support and for not demanding much mathematics, but that is something the big names say among themselves. Once you dig into any particular direction, you end up back at the underlying mathematics. Most courses won't teach you that, and that's the whole point: they merely consume your interest, stunt your growth, and then harvest your tuition.

Writing this may get me reported by the training-course crowd (it has happened before), but too many courses nowadays are just harvesting an IQ tax, and they've dragged the AI industry's reputation down with them, so I'll say it anyway.

Finally, I've summarized all of the mathematical knowledge that deep learning and machine learning actually use. You can treat this knowledge map as the outline of your study plan; unless it's necessary, don't spend too much time on anything outside it.

Calculus

Calculus is the foundation of modern mathematics; linear algebra, matrix theory, probability theory, information theory, and optimization all presuppose it. In machine learning and deep learning specifically, differentiation is used far more than integration. Integration is essentially confined to probability theory, where concepts such as the probability density function and the distribution function are defined or computed via integrals. Almost every learning algorithm solves an optimization problem during training or prediction and therefore relies on calculus to find the extrema of functions; the choice of certain models and functions also takes their analytic properties into account. For machine learning, the main roles of calculus are: 1. finding the extrema of functions; 2. analyzing the properties of functions. The calculus knowledge required for machine learning and deep learning is listed below. Obviously, not everything in a textbook is required; only the necessary parts are listed.
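As a tiny worked example of role 1 (my own illustration; the numbers are made up): to find the constant c minimizing the squared loss L(c) = sum_i (y_i - c)^2, set the derivative L'(c) = -2 * sum_i (y_i - c) to zero, which gives c = mean(y). A quick numeric check:

```python
# Illustrative check: the constant c minimizing sum((y - c)^2) is the
# mean of y, exactly what setting the derivative to zero predicts.
import numpy as np

y = np.array([1.0, 2.0, 4.0, 7.0])
grid = np.linspace(-10, 10, 100001)                    # brute-force scan
losses = ((y[None, :] - grid[:, None]) ** 2).sum(axis=1)
print(grid[losses.argmin()])                           # ~3.5
print(y.mean())                                        # 3.5, from calculus
```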

  • Limits: the limit is the watershed between elementary and higher mathematics, and the cornerstone on which calculus is built; the concepts of derivative, differential, and integral all rest on it. Machine learning rarely uses limits directly, but understanding derivatives and integrals requires them.
  • Supremum and infimum: concepts that engineering calculus barely touches, but they appear frequently in machine learning; without them you won't know what the sup and inf in papers and books mean.
  • Derivatives: their importance is well known; finding the extrema of a function needs them, and so does analyzing a function's properties. Typical examples are the derivation of gradient descent and computing the derivative of the logistic function. You should be fluent at differentiating the basic functions.
  • Lipschitz continuity: another concept engineering textbooks don't mention, but it is very useful for analyzing the properties of algorithms; it is used in GANs and in analyses of the stability and generalization of deep learning algorithms.
  • Monotonicity of functions and the derivative: the derivation of certain algorithms, such as neural network activation functions and AdaBoost, requires studying the monotonicity of functions.
  • Extrema of functions via the derivative: this sits at the center of machine learning. Most optimization problems are continuous optimization problems, solved by finding the points where the derivative is 0, in order to minimize a loss function, maximize a likelihood, and similar goals.
  • Convexity and concavity of functions and the derivative: applied, for example, in the proof of Jensen's inequality.
  • Taylor's formula: another core knowledge point, used widely in optimization algorithms; gradient descent, Newton's method, quasi-Newton methods, AdaBoost, gradient boosting, and XGBoost cannot be derived without it.
  • Indefinite integrals: used relatively little in machine learning; they are mainly the basis of the definite integrals used for probability calculations.
  • Definite integrals: including improper integrals, used throughout probability theory. A large class of machine learning algorithms is probabilistic, such as Bayes classifiers, probabilistic graphical models, and variational inference, and all of them involve integrating probability density functions.
  • Integrals with a variable upper limit: the distribution function is the typical instance; also used mainly for probability calculations.
  • The Newton-Leibniz formula: rarely used directly in machine learning, but it is one of the most important formulas in calculus and provides the basis for evaluating definite integrals.
  • Ordinary differential equations: used in some papers, but generally not in mainstream algorithms.
  • Partial derivatives: their importance needs no elaboration; the vast majority of functions in machine learning are multivariate, and finding their extrema cannot get around partial derivatives.
  • Gradient: needed to determine the monotonicity and extrema of multivariate functions, and gradient descent cannot be derived without it. Almost all continuous optimization algorithms compute the gradient of a function and seek the points where the gradient is 0 (see the sketch after this list).
  • Higher-order partial derivatives: inseparable from finding the extrema of functions; the gradient alone cannot determine whether a point is an extremum.
  • The chain rule: also used everywhere; backpropagation in neural networks depends entirely on it.
  • The Hessian matrix: determines the extrema and convexity of functions; students from engineering programs may find it unfamiliar.
  • Criteria for extrema of multivariate functions: not used directly very often, but essential for understanding optimization methods.
  • Criteria for convexity of functions: proving that an optimization problem is convex cannot do without them.
  • The Jacobian matrix: engineering textbooks generally don't introduce this concept; like the Hessian matrix, it isn't hard to understand, it simplifies the derivative formulas of multivariate composite functions, and it is widely used in the backpropagation algorithm.
  • Matrix and vector derivatives: the gradients of the common linear and quadratic functions, up through their Hessian matrices, should be known by heart; the derivations aren't complicated.
  • Taylor's formula, again: the cornerstone for understanding optimization algorithms such as gradient descent and Newton's method.
  • Multiple integrals: mainly used in probability theory for integrating random vectors, such as the multivariate normal distribution.
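Here is a small sketch (my own toy function) tying several of these items together: a gradient derived by hand from partial derivatives, a finite-difference check based on the limit definition of the derivative, and gradient descent walking toward the point where the gradient is 0:

```python
# Illustrative sketch: hand-derived gradient, numeric check, and
# gradient descent on f(x, y) = (x - 1)^2 + 2 * (y + 2)^2.
import numpy as np

def f(p):
    x, y = p
    return (x - 1) ** 2 + 2 * (y + 2) ** 2

def grad(p):                        # partial derivatives, derived by hand
    x, y = p
    return np.array([2 * (x - 1), 4 * (y + 2)])

def numeric_grad(p, eps=1e-6):      # limit definition of the derivative
    g = np.zeros_like(p)
    for i in range(len(p)):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

p = np.array([5.0, 5.0])
print(np.allclose(grad(p), numeric_grad(p), atol=1e-4))  # True

for _ in range(200):                # step against the gradient
    p = p - 0.1 * grad(p)
print(p)                            # converges to the minimum at (1, -2)
```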

Linear algebra and matrix theory

Compared with the calculus above, more of the linear algebra and matrix theory here belongs to the category of matrix analysis, going beyond the scope of an engineering linear algebra textbook. The commonly used knowledge of linear algebra and matrix theory follows.

  • Vectors and their operations: the input to a machine learning algorithm is often a vector, such as a sample's feature vector, so familiarity with the common vector operations is the basis for understanding machine learning.
  • Matrices and their operations: like vectors, a core concept of linear algebra; the various matrix operations must be second nature.
  • Determinants: used directly rather little; they occasionally appear in probability theory and in the derivations of some models.
  • Systems of linear equations: used directly rather little, but they are the core of linear algebra.
  • Eigenvalues and eigenvectors: used widely in machine learning; many problems ultimately reduce to finding the eigenvalues and eigenvectors of a matrix, as in manifold learning, spectral clustering, linear discriminant analysis, and principal component analysis.
  • Generalized eigenvalue problems: engineering linear algebra textbooks generally don't mention this concept, but it is used frequently in manifold learning and spectral clustering algorithms.
  • Rayleigh quotient: engineering textbooks generally don't mention it; it appears in the derivations of some algorithms, such as linear discriminant analysis.
  • Spectral norm and condition number of a matrix: engineering textbooks generally don't mention them either; they are used in the analysis of certain algorithms and are important for characterizing the properties of a matrix.
  • Quadratic forms: many objective functions are quadratic, so the status of quadratic forms is self-evident.
  • Cholesky decomposition: needed in the derivation of certain algorithms; engineering textbooks generally don't cover it.
  • Eigendecomposition: very important for machine learning; many problems ultimately come down to an eigendecomposition, such as principal component analysis and linear discriminant analysis (see the sketch after this list).
  • Singular value decomposition: used widely in machine learning; from the normal Bayes classifier to topic models, you will find its shadow everywhere.
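As one concrete instance of the eigendecomposition item above, a small sketch (synthetic data of my own) of principal component analysis as the eigendecomposition of a covariance matrix:

```python
# Illustrative sketch: PCA reduces to the eigendecomposition of the
# covariance matrix; data and dimensions here are made up.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.1]])
X = rng.normal(size=(500, 3)) @ A         # correlated synthetic data
Xc = X - X.mean(axis=0)                   # center the data
C = np.cov(Xc, rowvar=False)              # covariance matrix
vals, vecs = np.linalg.eigh(C)            # symmetric eigendecomposition
order = np.argsort(vals)[::-1]            # sort by decreasing variance
W = vecs[:, order[:2]]                    # top-2 principal directions
Z = Xc @ W                                # projected 2-D representation
print(vals[order])                        # variance along each component
print(Z.shape)                            # (500, 2)
```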

Probability theory and information theory

Probability theory and information theory are used a great deal in machine learning. The probability theory involved generally doesn't go beyond the scope of an engineering textbook. Information theory is something many students never took, but as long as you understand calculus and probability theory, its concepts aren't hard to grasp. The commonly used knowledge of probability theory and information theory is listed below.

  • Random events and probability : the basis for understanding random variables, and the most fundamental knowledge in probability theory.
  • Conditional probability and independence : conditional probability is important throughout machine learning; wherever there is a probabilistic model, it is usually indispensable. Independence is also used in many places, such as probabilistic graphical models.
  • Conditional independence : used widely in probabilistic graphical models; you must understand it.
  • The law of total probability : a basic formula whose status goes without saying.
  • Bayes' formula : holds a soul-level position in probabilistic machine learning algorithms; almost every generative model uses it.
  • Discrete and continuous random variables : their importance needs no elaboration; probability mass functions, probability density functions, and distribution functions must all be mastered.
  • Expectation : very important; you will find its shadow in many places.
  • Variance and standard deviation : very important; key indicators for characterizing a probability distribution.
  • Jensen's inequality : used in many derivations and proofs, such as the EM algorithm and variational inference.
  • Common probability distributions : including the uniform, normal, Bernoulli, binomial, multinomial, and t distributions; they are used widely across machine learning algorithms.
  • Random vectors : multivariate random variables, which are even more useful in practice.
  • Covariance : a frequently used concept, for example in principal component analysis and the multivariate normal distribution.
  • Parameter estimation : including maximum likelihood estimation, maximum a posteriori estimation, Bayesian estimation, and kernel density estimation; you must work out how each of them operates.
  • Randomized algorithms : including sampling algorithms, genetic algorithms, and Monte Carlo methods, which are also used often in machine learning.
  • Information theory concepts : including entropy, cross entropy, KL divergence, JS divergence, mutual information, and information gain; these must be understood deeply. If you don't understand KL divergence, how will you understand variational inference and VAEs? (A small numeric sketch follows this list.)
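To make the last item concrete, a tiny numeric sketch (the two distributions are made up for illustration) of entropy, cross entropy, and KL divergence, including the identity KL(p||q) = H(p, q) - H(p):

```python
# Illustrative sketch: entropy, cross entropy and KL divergence for
# two made-up discrete distributions p and q.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

entropy = -(p * np.log(p)).sum()                # H(p)
cross_entropy = -(p * np.log(q)).sum()          # H(p, q)
kl = (p * np.log(p / q)).sum()                  # KL(p || q)

print(entropy, cross_entropy, kl)
print(np.isclose(kl, cross_entropy - entropy))  # True: the identity holds
```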

Optimization methods

As mentioned above, optimization methods are the soul of machine learning, used to determine the model's parameters or its predictions. Unfortunately, engineering majors generally never take such a course. But as long as you understand calculus and linear algebra, these algorithms aren't hard to derive. The most commonly used optimization knowledge is listed below:

  • Gradient descent : the simplest optimization algorithm, yet very useful, especially in deep learning (contrasted with Newton's method in the sketch after this list).
  • Stochastic gradient descent : its importance in deep learning is universally known.
  • Steepest descent : a refinement of gradient descent, and a foundation for understanding gradient boosting algorithms.
  • Improved variants of gradient descent : such as AdaGrad, AdaDelta, and Adam; you will often see these names when using open source deep learning libraries.
  • Newton's method : the archetypal second-order optimization algorithm, used less in deep learning, but needed when training logistic regression.
  • Quasi-Newton methods : improvements on Newton's method; the L-BFGS algorithm is used when training conditional random fields and other models.
  • Coordinate descent : used when training logistic regression models; not difficult to understand.
  • Convex optimization : one of the core concepts of optimization; if a problem is proven to be a convex optimization problem, congratulations, it can basically be solved well.
  • The Lagrange multiplier method : used constantly in all kinds of derivations, such as principal component analysis and linear discriminant analysis; if you haven't mastered it, you will struggle.
  • KKT conditions : the extension of the Lagrange multiplier method to inequality constraints, used in the derivation of SVMs.
  • Lagrange duality : not easy knowledge to digest; often used in SVM derivations, but plugging into the formulas is not hard.
  • Multi-objective optimization : rarely used, but concepts like Pareto optimality do appear, for example in multi-objective NAS.
  • Calculus of variations : for finding the extrema of functionals, needed in some theoretical derivations; for example, variational methods can show that, with the mean and variance fixed, the maximum entropy distribution is the normal distribution.
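A small sketch (my own toy function, not from the article) contrasting the first item with Newton's method: on a smooth convex function, Newton's method exploits curvature and needs far fewer iterations than fixed-step gradient descent:

```python
# Illustrative sketch: gradient descent vs. Newton's method on the
# convex function f(x) = x^4 + 2x^2 - 4x; both seek the root of f'.
def fp(x):                            # first derivative f'(x)
    return 4 * x ** 3 + 4 * x - 4

def fpp(x):                           # second derivative f''(x) > 0
    return 12 * x ** 2 + 4

x_gd = x_nt = 3.0
for _ in range(100):
    x_gd -= 0.01 * fp(x_gd)           # fixed-step gradient descent
for _ in range(10):
    x_nt -= fp(x_nt) / fpp(x_nt)      # Newton step: divide by curvature
print(x_gd, x_nt)                     # both approach the same minimizer
```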

Graph Theory

Some machine learning problems can be solved with graph theory, such as manifold learning and spectral clustering. Graph theory also appears in how certain algorithms are expressed, such as the computational graphs of deep learning and the network topologies in NAS. Probabilistic graphical models frighten many beginners; they are the perfect marriage of graph theory and probability theory. The commonly used graph theory knowledge is described below.

Basic concepts of graphs : vertices, edges, directed graphs, undirected graphs, and so on.

Adjacency matrix and weight matrix : the core concepts for representing a graph; edges generally carry weights.

Some special graphs : such as bipartite graphs and directed acyclic graphs, which are used often in deep learning.

The shortest path problem : the classic Dijkstra algorithm, which every programmer should master.

Laplacian matrix and normalized Laplacian matrix : concepts on the harder side; many graph-based machine learning algorithms, such as manifold learning, semi-supervised learning, and spectral clustering, cannot do without them. Understanding this matrix and its properties is the basis for understanding those algorithms (a small construction sketch follows).
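To take some of the mystery out of the last item, a small sketch (toy graph of my own) constructing the unnormalized Laplacian L = D - W and its symmetric normalized form from a weighted adjacency matrix:

```python
# Illustrative sketch: graph Laplacians from a (weighted) adjacency
# matrix W of a small undirected graph.
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = W.sum(axis=1)                       # vertex degrees
D = np.diag(d)
L = D - W                               # unnormalized Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # assumes no isolated vertices
L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # I - D^{-1/2} W D^{-1/2}

print(np.linalg.eigvalsh(L))            # smallest eigenvalue is 0
print(L_sym)                            # used by spectral clustering
```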

 


 

Finally, here is the list of introductory courses and books I've put together. I didn't include tomes like the "little blue book", because I don't think they suit beginners; the courses and books listed here are all very beginner-friendly. Some of the books came out years ago, while some were written as recently as 2019 and track current industry deep learning practice closely; I personally recommend the newer ones. (For whatever reason, the newer books also tend to have higher ratings.)

Mathematics courses

  1. MIT Open Course: Linear Algebra (35 lectures, complete course)
  2. Khan Academy: Linear Algebra (introductory)
  3. Linear Algebra Done Right (Douban)
  4. Calculus - National Taiwan University OpenCourseWare (NTU OCW)
  5. Probability and Mathematical Statistics (Douban)

Algorithm courses

  1. Coursera: Machine Learning - Andrew Ng
  2. Bilibili: Machine Learning Foundations - Hsuan-Tien Lin
  3. CS231n: Convolutional Neural Networks for Visual Recognition
  4. Deep Learning Tutorial from Stanford - the Stanford CS department's official tutorial, written by Andrew Ng
  5. An Introduction to Statistical Learning with Applications in R - highly recommended; the accessible version of The Elements of Statistical Learning
  6. Deep Learning with Python - Douban score 9.6, ranked first in the deep learning category
  7. Dive into Deep Learning - Douban score 9.3, written by Mu Li
  8. Deep Learning from Scratch - Douban score 9.4, by Koki Saito

Papers

  • The Learning Machines - an introductory Nature article that gives you a general sense of what deep learning is used for.
  • Deep Learning - the review article published in Nature in May 2015 by the three giants Yann LeCun, Yoshua Bengio, and Geoffrey Hinton; needs no introduction.
  • Growing Pains in Deep Learning
  • Deep Learning in Neural Networks - "This technical report provides an overview of deep learning and related techniques with a special focus on developments in recent years"; its main focus is the progress in deep learning over the two years prior (2012-2014).

Deep learning libraries

  • H2O - an open-source, scalable library that supports Java, Python, Scala, and R
  • Deeplearning4j - a Java library, integrated with Hadoop and Spark
  • Caffe - developed by Yangqing Jia during graduate school, now maintained by Berkeley
  • Theano - the most popular Python library

The next article will cover simple classification algorithms and how to study machine learning algorithms. For more machine learning, programming, and AI content, you're welcome to follow my public account, "Turing's cat" ~



 
