Machine Learning (5) - Ensemble Learning

1. The difference between boosting and bagging: 

  (1) Bagging: randomly sample S data sets of the same size as the original data (bootstrap sampling) and train S base learners on them; the learners are independent of each other, so this is a parallel method. 
  Each classifier carries an equal weight. To classify a new sample, all S classifiers are applied and the category that receives the most votes is taken as the final classification result. 
    (The sampling is done with replacement, so duplicate samples are allowed within each resampled data set.)

  Advantages: 

  a. Training a Bagging ensemble has the same order of complexity as training a single learner with the base learning algorithm, so it is efficient;

  b. Standard AdaBoost is only suitable for binary classification, whereas Bagging can be used directly for multi-class classification, regression, and other tasks;

  c. Because of bootstrap (self-)sampling, each base learner uses only about 63.2% of the samples in the original training set; the remaining out-of-bag samples can be used as a validation set, among other things.
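
  For concreteness, here is a minimal bagging sketch, assuming scikit-learn is available; the synthetic dataset and hyperparameters are illustrative choices, not part of the original article. The out-of-bag score evaluates each learner on exactly the leftover samples mentioned in point c.

```python
# Minimal bagging sketch (assumes scikit-learn; data and settings are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner
    n_estimators=50,           # S independent learners, trainable in parallel
    bootstrap=True,            # sampling with replacement
    oob_score=True,            # evaluate on the ~36.8% out-of-bag samples
    n_jobs=-1,
    random_state=0,
)
bag.fit(X, y)
print("OOB accuracy:", bag.oob_score_)  # 'free' validation from the left-out samples
```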

  (2) Boosting: uses all of the data to train the base learners, with dependencies between the individual learners: each learner is built on the results of the previously trained ones, and the serial training focuses on the data that was misclassified, yielding an improved learner at each step. (In layman's terms, it learns only a little at a time and approaches the final value to be predicted step by step.) 
  The classification result is a weighted sum over all classifiers. The classifier weights are unequal: each weight reflects how well the corresponding classifier performed in the previous round of iteration.

  Advantages: low generalization error, easy implementation, high classification accuracy, and few adjustable parameters;

  Disadvantages: Sensitive to outliers.
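
  A minimal AdaBoost sketch under the same assumptions (scikit-learn, illustrative synthetic data): the re-weighting of misclassified samples happens inside the estimator, and the unequal per-classifier weights described above appear as estimator_weights_.

```python
# Minimal AdaBoost sketch (assumes scikit-learn; data and settings are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Depth-1 trees (stumps) are the classic weak learner; each new stump focuses on
# the samples the previous ones misclassified, and the final prediction is a
# weighted vote over all stumps.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, learning_rate=0.5, random_state=0)
ada.fit(X_tr, y_tr)
print("test accuracy:", ada.score(X_te, y_te))
print("first stump weights:", ada.estimator_weights_[:5])  # unequal classifier weights
```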

  What the two have in common: they use the same type of base classifier.

2. Why is bagging said to reduce variance, while boosting reduces bias?

  (1) Bagging resamples the training set, fits one model to each bootstrap subsample, and finally averages the results. Because the subsamples are similar and the same model class is used, the individual models have approximately equal bias and variance (in fact their distributions are roughly the same, though not independent). Consider the two extreme cases: averaging S fully independent models would cut the variance by a factor of S, while averaging S identical models would leave it unchanged. The sub-models produced by bagging are correlated but not identical, so they sit between these two extremes, and the variance is reduced to a certain extent while the bias is essentially unchanged. (Lower variance, more concentrated predictions.) The relationship can be made explicit, as in the formula below.
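
  A standard identity (not from the original text) makes the intermediate state precise: for S identically distributed predictors with variance sigma^2 and pairwise correlation rho, the variance of their average is

```latex
% Variance of the bagged average of S equicorrelated predictors
\operatorname{Var}\!\left(\frac{1}{S}\sum_{i=1}^{S} f_i(x)\right)
  = \rho\,\sigma^{2} \;+\; \frac{1-\rho}{S}\,\sigma^{2}
```

  With rho = 0 (fully independent models) the variance drops to sigma^2 / S; with rho = 1 (identical models) it stays at sigma^2; bagging's correlated sub-models land in between, which is exactly the claimed partial variance reduction.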

  (2) Boosting, by contrast, minimizes the loss function sequentially (in tandem): each stage fits what the current ensemble still gets wrong, so the bias naturally decreases step by step. However, because of this sequential, adaptive strategy the sub-models are strongly correlated, and summing them cannot significantly reduce the variance. Boosting therefore improves prediction accuracy mainly by reducing bias. (Lower bias, more accurate predictions.) A toy illustration of this sequential fitting follows.
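
  A toy sketch of the sequential bias-reduction idea, using least-squares residual fitting (a gradient-boosting flavour rather than the AdaBoost re-weighting described above); scikit-learn and the synthetic data are assumptions made for illustration.

```python
# Each new learner fits the residual (what the current ensemble still gets wrong),
# so the training error of the summed model shrinks stage by stage.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

prediction = np.zeros_like(y)   # current ensemble output
learning_rate = 0.3
for m in range(50):
    residual = y - prediction                      # what is still unexplained
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * stump.predict(X)
    if m % 10 == 0:
        print(f"round {m:2d}  train MSE = {np.mean((y - prediction) ** 2):.4f}")
```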

  (3) Intuitive explanation 
  Boosting combines many weak classifiers into a strong one. The weak classifiers have high bias and the resulting strong classifier has low bias, so boosting's main effect is to reduce bias; variance is not its primary concern. 
  Bagging averages many strong (even overly strong) classifiers. Each individual classifier already has low bias, and the bias stays low after averaging; but each classifier is strong enough to overfit, i.e. it has high variance, and the averaging operation is what reduces that variance.

3. The possible benefits of combining learners   

  (1) Improved generalization ability; (2) reduced risk of getting stuck in a local optimum; (3) an expanded hypothesis space, which may yield a better approximation.

4. Methods/Strategies for Model Fusion 

  (1) Average method: For numerical regression prediction problems, the commonly used combination strategy is the average method, that is, the output of several weak learners is averaged to obtain the final prediction output.

  (2) Voting method: The simplest voting method is the relative majority voting method, that is, the minority obeys the majority.

  (3) Learning method: stacking (the output of this layer is part of the input data of the next layer)

  With the stacking strategy we do not simply apply logical rules to the weak learners' outputs; instead we add another layer of learner: the weak learners' predictions on the training set are used as input, the training-set labels are used as output, and a new learner is trained on top of them to produce the final result. A minimal sketch of the voting and stacking strategies is given below.
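
  A minimal sketch of strategies (2) voting and (3) stacking, assuming scikit-learn; the particular base learners and settings are illustrative choices, not prescribed by the article. For the averaging method (1) in regression, VotingRegressor plays the analogous role of simply averaging the outputs.

```python
# Sketch of combination strategies with scikit-learn (illustrative setup).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base = [("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000))]

# (2) Voting: majority vote ("hard") or averaged class probabilities ("soft").
vote = VotingClassifier(estimators=base, voting="soft")

# (3) Stacking: the base learners' outputs become the input features of a
#     second-level learner (the final_estimator).
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("voting", vote), ("stacking", stack)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```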

5. Principles of common fusion frameworks; advantages and disadvantages; will fusion definitely improve performance? Why might fusion improve prediction performance? 

  Principle: several learners are better than one; guarantee accuracy while preventing overfitting; the individual weak learners should be clearly distinct from one another; "good and different". 
  Common frameworks: bagging (parallel, reduces variance), boosting (serial, reduces bias), stacking (one layer's output becomes the next layer's input). 
  Fusion is not necessarily better: it improves prediction when the fused models are good and different, i.e. they differ from one another and reflect different expressive abilities.
