The Evolutionary History of Boosting Algorithms

Background:

Among today's popular algorithms, neural networks shine in fields such as images, text, and audio, while the ensemble learning methods XGBoost, LightGBM, and CatBoost have become the hottest tools on machine learning platforms such as Kaggle.

First, a few concepts to keep clear:

1, Boosting

2, Adaptive Boosting (AdaBoost)

3, Gradient Boosting

4, Decision Tree

One, Boosting

The concept of boosting was briefly mentioned above. Boosting algorithms (a family of algorithms, not one specific algorithm) work by changing the weights of the training samples (if the base classifier does not support sample weights, the training sample distribution can instead be adjusted by resampling), training a number of base classifiers on the reweighted data, and then linearly combining these classifiers to improve classification performance. All of the ensemble algorithms discussed here are boosting algorithms.
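In symbols, the final boosted model is an additive (linear) combination of base learners:

$$f(x) = \sum_{m=1}^{M} \alpha_m\, G_m(x)$$

where $G_m(x)$ is the $m$-th base classifier and $\alpha_m$ is its weight in the combination.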


Two, AdaBoost (Adaptive Boosting)

For the classification task, Li Hang's "Statistical Learning Methods" describes AdaBoost in detail.

To summarize, after working through that material we can see that the core of AdaBoost consists of four steps (a minimal code sketch follows the list):

1) computing the error of each base classifier

2) computing the weight of each base classifier

3) updating the sample weights

4) finally, the strategy for combining the base classifiers
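Here is a minimal sketch of those four steps for binary classification, assuming labels in {-1, +1} and scikit-learn decision stumps as the base classifiers; the function names are my own, not from any library.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal AdaBoost for labels y in {-1, +1} with depth-1 trees (stumps)."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # sample weights, start uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # 1) weighted error
        alpha = 0.5 * np.log((1 - err) / err)    # 2) weight of this base classifier
        w = w * np.exp(-alpha * y * pred)        # 3) re-weight the samples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # 4) combination strategy: sign of the alpha-weighted vote
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```

In practice you would normally just use sklearn.ensemble.AdaBoostClassifier, which implements the same idea.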

For regression tasks, AdaBoost proceeds through an analogous sequence of steps; a common concrete variant is AdaBoost.R2.
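Roughly, the AdaBoost.R2 updates look like the following (this is the standard formulation with a linear loss for the relative error, written here as a sketch; it may differ in small details from other write-ups):

$$
\begin{aligned}
&\text{for } m = 1,\dots,M:\\
&\qquad \text{fit } f_m(x) \text{ on the training set with sample weights } w_i,\\
&\qquad E_m = \max_i \lvert y_i - f_m(x_i)\rvert,\qquad e_i = \frac{\lvert y_i - f_m(x_i)\rvert}{E_m},\\
&\qquad \bar L_m = \sum_i w_i\, e_i,\qquad \beta_m = \frac{\bar L_m}{1 - \bar L_m},\\
&\qquad w_i \leftarrow \frac{w_i\,\beta_m^{\,1-e_i}}{Z_m}\quad (Z_m \text{ normalizes the weights}),\\
&\text{output: the weighted median of } f_1(x),\dots,f_M(x),\ \text{weighting } f_m \text{ by } \ln(1/\beta_m).
\end{aligned}
$$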


Three, GBDT (broadly, "GBDT" can refer to all gradient boosting tree algorithms, including XGBoost, which is itself a GBDT variant. To distinguish them, GBDT here refers specifically to the algorithm proposed in "Greedy Function Approximation: A Gradient Boosting Machine", which uses only first-derivative information.)

Reference: https://zhuanlan.zhihu.com/p/46445201

GBDT is a boosting algorithm that uses CART regression trees as its base learners. Whereas AdaBoost adjusts the weights of the training samples and of the individual base classifiers, GBDT trains each base learner by fitting the residuals as its target.

Let's first look at how GBDT works on the regression task.
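A minimal sketch of gradient boosting for regression with squared-error loss, assuming scikit-learn regression trees as base learners; the shrinkage factor (learning_rate) and the function names are my own choices, not part of the original derivation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_regression_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """GBDT for regression with squared-error loss: each new tree fits the
    current residuals, which are the negative gradient of the loss."""
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                            # initial constant prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                  # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gbdt_regression_predict(f0, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

Note that with squared error the negative gradient is exactly the residual y - F(x), which is why "fitting the residuals" and "fitting the negative gradient" coincide in this case.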


To be honest, I never quite understood the "gradient boosting" (literally, "gradient lifting") in GBDT. From optimization theory I was already very familiar with the gradient descent algorithm, and comparing the two, I kept feeling that GBDT is still doing "gradient descent" rather than "gradient lifting". The chart below, taken from another blog, makes the comparison.
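In symbols, the parallel that comparison draws is roughly the following (my own transcription, not the original chart): gradient descent updates the parameters with the gradient of the loss, while gradient boosting updates the function itself with the gradient of the loss, approximating each step with a new base learner.

$$\theta_t = \theta_{t-1} - \rho_t \,\nabla_\theta L(\theta)\Big|_{\theta=\theta_{t-1}}
\qquad\text{vs.}\qquad
f_t(x) = f_{t-1}(x) - \rho_t \,\frac{\partial L\big(y, f(x)\big)}{\partial f(x)}\Bigg|_{f=f_{t-1}}$$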


This is clearly "gradient descent".

After going through a lot of material, I finally understood why GBDT is said to use "gradient boosting". The "gradient boosting" here is not the opposite of "gradient descent"; on the contrary, it should be broken down as

"Gradient boosting" = "gradient" + "boosting"

"Gradient" refers to GBDT's practice of using the (negative) gradient as an approximation of the residuals it fits.

"Boosting" refers to the boosting learning paradigm of promoting weak learners into a strong learner.

Next, a practical example of GBDT applied to a regression task.
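A worked example is easiest to follow by running one, so here is a minimal stand-in using scikit-learn's GradientBoostingRegressor; the dataset and hyperparameters below are arbitrary choices for illustration, not taken from the original example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, just for demonstration.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("R^2 on the test set:", model.score(X_test, y_test))
```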


GBDT applied to classification tasks

For the binary classification task, the derivation follows the same pattern, with each base learner fitting the gradient of a classification loss.


The most important point above is that, for the classification task, each base classifier fits the residual between the true label and the predicted probability of the corresponding class.
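A minimal sketch for the binary case, assuming labels in {0, 1} and the log loss, with regression trees fitting the residual y - p; this is a simplification in that the full algorithm also re-optimizes the leaf values with a Newton step, which is skipped here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gbdt_binary_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """GBDT for binary classification (y in {0, 1}) with log loss: each tree
    fits y - p, the negative gradient of the loss w.r.t. the raw score F(x)."""
    y = np.asarray(y, dtype=float)
    p0 = np.clip(y.mean(), 1e-10, 1 - 1e-10)
    f0 = np.log(p0 / (1 - p0))               # initial score: log-odds of class 1
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residual = y - sigmoid(F)            # true label minus predicted probability
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gbdt_binary_predict_proba(f0, trees, X, learning_rate=0.1):
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return sigmoid(F)
```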


Four, XGBoost (eXtreme Gradient Boosting)

As the name suggests, XGBoost is an enhanced version of GBDT.

Listing every detail in which XGBoost improves on GBDT would involve quite a few points, so I have picked a few to elaborate on.

To follow the most natural line of thinking, let us pretend we know nothing about the XGBoost algorithm and ask at which points GBDT could be improved:

1, The choice of base learner.

GBDT uses CART regression trees as its base learners; we could also consider supporting other base learners, and indeed XGBoost additionally supports linear base learners (the gblinear booster).

2, The choice of loss function.

GBDT in most cases uses the sum of squared errors as its loss function; we could consider more flexible losses, and XGBoost indeed lets you plug in your own loss function (see the code sketch after this list).

3, Feature and split-point selection.

GBDT selects features and split points the way the CART tree does: it serially traverses all features and all candidate split points, and picks the feature and split point that minimize the squared error;

Looking at this process, we notice that the evaluation of the loss over split points can be carried out in parallel across features, and that if the samples are pre-sorted by each feature and the sorted results are reused globally, computational efficiency improves greatly. This is exactly what XGBoost does.

In addition, GBDT uses the same feature-selection policy for every tree, so the trees lack diversity; we can borrow the column subsampling (random feature selection) idea from random forests, and XGBoost implements this as well (also shown in the code sketch after this list).

4, The strategy by which successive trees fit the residuals.

GBDT uses the first derivative of the loss in place of the residual as the fitting target. (It should be said that a lot of material claims the reason for using the first derivative instead of the residual is that the residual is hard to obtain, which is frankly nonsense: the point of fitting the first derivative is to fit faster and to handle general losses, and when the loss is the sum of squared errors the first derivative is exactly the residual.) Thinking of gradient descent versus Newton's method, can we also use second-derivative information when fitting the residuals? The answer is yes, and that is what XGBoost does: its loss function (see point 2) is expanded using second-derivative information. Moreover, the XGBoost objective accounts not only for the empirical risk but also for the structural risk through regularization, which gives XGBoost better generalization and makes it less prone to overfitting. The objective is written out below the list.
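Concretely, the objective XGBoost optimizes at step t is, after a second-order Taylor expansion of the loss (this is the form given in the XGBoost paper, reproduced here as a sketch):

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i)\Big] + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous prediction, $T$ is the number of leaves of the new tree $f_t$, and $w$ is the vector of its leaf weights; the $\Omega$ term is the structural-risk (regularization) part.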
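And here is a small sketch of points 2 and 3 in code, using the xgboost library's custom-objective hook and column subsampling; the data and hyperparameters are arbitrary, and the custom objective shown is just the usual logistic loss written out by hand.

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + 0.1 * rng.standard_normal(500) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

def logistic_obj(preds, dtrain):
    """Custom loss (point 2): return the first (grad) and second (hess)
    derivatives of the log loss w.r.t. the raw predictions."""
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    return p - labels, p * (1.0 - p)

params = {
    "max_depth": 3,
    "eta": 0.1,
    "colsample_bytree": 0.8,  # point 3: column subsampling, as in random forests
    "lambda": 1.0,            # L2 regularization on leaf weights (structural risk)
}
booster = xgb.train(params, dtrain, num_boost_round=100, obj=logistic_obj)
```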


Five, LightGBM

On most datasets, this algorithm outperforms XGBoost in terms of accuracy, efficiency, and memory consumption.

Following the same line of thinking, let us consider how the gradient boosting family could be optimized further, starting from XGBoost.

1, The most time-consuming part of the boosting process is the repeated selection of features and split points. XGBoost already optimizes this with pre-sorting (the pre-sorted approach), but if a feature takes too many distinct values, enumerating them can still take too long. We can therefore consider the Histogram algorithm: samples are bucketed into bins by feature value in advance, and only the bins are traversed when choosing a split point. This greatly improves efficiency; it costs a little accuracy, but that can be offset by other optimizations (see the configuration sketch after this list).

2, Node splitting strategy. When growing a tree, GBDT and XGBoost split level-wise (similar to a level-order traversal), which treats all nodes in the same level equally even though the contribution of splitting them can differ greatly. LightGBM uses a leaf-wise splitting strategy (similar to a depth-first traversal): at each step it picks the leaf whose split contributes the most and splits that one.

3, Sampling method. Whether in GBDT or XGBoost, we keep training base learners to fit the residuals and stop when the residuals fall below some threshold. It can happen that the gradients of most samples are already small while a small portion of samples still have large gradients, so when training each new base learner we can keep the samples with large gradients and randomly subsample the ones with small gradients. This is the GOSS method (Gradient-based One-Side Sampling).
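A small configuration sketch of how these three points surface as LightGBM parameters, assuming the scikit-learn style wrapper; the data is synthetic and the values are illustrative only.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(1000)

model = lgb.LGBMRegressor(
    num_leaves=31,        # point 2: leaf-wise growth, capped by number of leaves
    max_bin=255,          # point 1: number of histogram bins per feature
    n_estimators=200,
    learning_rate=0.1,
    # point 3: GOSS is selected with boosting_type="goss" in older LightGBM
    # releases; newer ones expose it as data_sample_strategy="goss".
)
model.fit(X, y)
print(model.predict(X[:5]))
```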


OK... I am not really satisfied with this first draft, so I will keep iterating on it later. I also want to add a section on CatBoost...

Source: www.cnblogs.com/tianyadream/p/12470503.html