XGBoost parameters (reprint)

XGBoost parameters

Before running XGBoost, you must set three types of parameters: general parameters, booster parameters, and task parameters:

General parameters: control the overall boosting process, chiefly which booster is used. XGBoost offers two kinds of boosters: tree-based models (gbtree) and linear models (gblinear).

Booster parameters: depend on which booster is chosen.

Task parameters: control the learning scenario; for example, regression and ranking tasks use different parameters.

In addition to the above, other parameters can be supplied on the command line.

Parameters in R Package

In the R package, you can use a dot (.) in place of an underscore in parameter names; for example, you can use max.depth as max_depth. Parameter names with underscores are also valid in R.

General Parameters

booster [default=gbtree] 

Two boosters can be selected: gbtree and gblinear. gbtree performs boosting with tree-based models, while gblinear performs boosting with a linear model. The default value is gbtree.

silent [default=0] 

Controls whether runtime information is printed. Set to 1 to run in silent mode (no runtime messages); set to 0 to print runtime information. The default value is 0.

nthread [default to maximum number of threads available if not set] 

The number of threads XGBoost uses at runtime. The default is the maximum number of threads available on the current system.

num_pbuffer [set automatically by xgboost, no need to be set by user] 

Size of the prediction buffer, normally set to the number of training instances. The buffer is used to save the prediction results of the last boosting step.

num_feature [set automatically by xgboost, no need to be set by user] 

The feature dimension used during boosting, i.e. the number of features. XGBoost sets this automatically; no manual setting is needed.
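As an illustration, the general parameters above correspond to entries in the parameter dictionary passed to XGBoost's Python API. The values below are placeholders, not recommendations:

```python
# Sketch of the general parameters as a Python parameter dict.
# num_pbuffer and num_feature are omitted: XGBoost sets them automatically.
params = {
    "booster": "gbtree",  # "gbtree" (tree model) or "gblinear" (linear model)
    "silent": 0,          # 0 prints runtime messages, 1 runs silently
    "nthread": 4,         # omit to use all available threads
}

# The dict would then be passed to training, e.g.:
#   bst = xgboost.train(params, dtrain, num_boost_round=10)
```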

Booster Parameters

From xgboost-unity onward, the bst: prefix is no longer needed for booster parameters. Parameters with or without the bst: prefix are equivalent (i.e. both bst:eta and eta are valid parameter settings).

Parameter for Tree Booster

eta [default=0.3] 

Step-size shrinkage used in the update step to prevent overfitting. After each boosting step, the weights of new features can be obtained directly; eta shrinks these feature weights to make the boosting process more conservative. The default value is 0.3.

Range: [0,1]
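To make the shrinkage concrete, here is a minimal pure-Python sketch (with made-up leaf weights) of how eta scales each tree's contribution before it is added to the running prediction:

```python
# Each boosting step adds eta * (new tree's weight) to the prediction,
# so a smaller eta makes each step, and the whole process, more conservative.
eta = 0.3
prediction = 0.5                 # running prediction for one instance
tree_weights = [0.8, -0.2, 0.4]  # hypothetical leaf weights from 3 rounds

for w in tree_weights:
    prediction += eta * w        # shrink each tree's contribution by eta

# prediction is now 0.5 + 0.3 * (0.8 - 0.2 + 0.4)
```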

gamma [default=0] 

Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be.

Range: [0,∞]

max_depth [default=6] 

Maximum depth of a tree. The default value is 6.

Range: [1,∞]

min_child_weight [default=1] 

Minimum sum of instance weights needed in a child node. If the partition step would produce a leaf node whose sum of instance weights is less than min_child_weight, the partitioning process stops. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm.

Range: [0,∞]

max_delta_step [default=0] 

Maximum delta step we allow each tree’s weight estimation to be. If the value is set to 0, there is no constraint. If it is set to a positive value, it can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced. Setting it to a value of 1-10 might help control the update.

Range: [0,∞]

subsample [default=1] 

Subsample ratio of the training instances. Setting it to 0.5 means XGBoost randomly draws 50% of the entire training set to build each tree, which can help prevent overfitting.

Range: (0,1]

colsample_bytree [default=1] 

Subsample ratio of columns (features) when constructing each tree. The default is 1.

Range: (0,1]
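Putting the tree-booster parameters together, an illustrative configuration might look like the dict below. The specific values are examples only, not recommendations:

```python
# Illustrative tree-booster settings; values are placeholders.
tree_params = {
    "booster": "gbtree",
    "eta": 0.1,               # smaller step size, more conservative updates
    "gamma": 1.0,             # minimum loss reduction required to split a leaf
    "max_depth": 6,
    "min_child_weight": 1,
    "max_delta_step": 0,      # 0 = no constraint
    "subsample": 0.8,         # row sampling per boosting round
    "colsample_bytree": 0.8,  # column sampling per tree
}
```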

Parameter for Linear Booster

lambda [default=0] 

L2 regularization penalty coefficient.

alpha [default=0] 

L1 regularization penalty coefficient.

lambda_bias 

L2 regularization term on the bias. The default value is 0. (There is no L1 regularization term on the bias, because the bias is not important under L1.)
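For comparison with the tree booster, here is an illustrative linear-booster configuration; again the values are placeholders:

```python
# Illustrative linear-booster settings; values are placeholders.
linear_params = {
    "booster": "gblinear",
    "lambda": 0.1,       # L2 penalty on weights
    "alpha": 0.05,       # L1 penalty on weights
    "lambda_bias": 0.0,  # L2 penalty on the bias term
}
```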

Task Parameters

objective [ default=reg:linear ] 

Defines the learning task and the corresponding learning objective. The available objective functions are:

"reg:linear" - linear regression.

"reg:logistic" - logistic regression.

"binary:logistic" - logistic regression for binary classification; outputs probabilities.

"binary:logitraw" - logistic regression for binary classification; outputs the raw score wTx before the logistic transformation.

"count:poisson" - Poisson regression for count data; the output is the mean of the Poisson distribution.

In Poisson regression, the default value of max_delta_step is 0.7. (Used to safeguard optimization.)

"multi:softmax" - multiclass classification using the softmax objective; requires setting the parameter num_class (the number of classes).

"multi:softprob" - same as softmax, but the output is a vector of ndata * nclass values, which can be reshaped into an ndata-row, nclass-column matrix. Each row gives the predicted probability of each class for one instance.

"rank:pairwise" - set XGBoost to do a ranking task by minimizing the pairwise loss.
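To illustrate the multi:softprob output layout described above, here is a pure-Python sketch of reshaping the flat ndata * nclass vector into per-instance class probabilities (the numbers are made up):

```python
# Hypothetical flat multi:softprob output for ndata=2 instances, nclass=3 classes.
ndata, nclass = 2, 3
flat = [0.7, 0.2, 0.1,   # instance 0's class probabilities
        0.1, 0.3, 0.6]   # instance 1's class probabilities

# Reshape into ndata rows of nclass columns.
probs = [flat[i * nclass:(i + 1) * nclass] for i in range(ndata)]

# multi:softmax would instead return only the most likely class per instance.
labels = [max(range(nclass), key=lambda c: row[c]) for row in probs]
```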

base_score [ default=0.5 ] 

The initial prediction score of all instances (a global bias).

eval_metric [ default according to objective ] 

Evaluation metrics for validation data. Each objective function has a default metric (rmse for regression, error for classification, mean average precision for ranking).

Users can add multiple evaluation metrics. Python users should pass the metrics as a list of parameter pairs rather than a parameter map, so that later 'eval_metric' entries do not override earlier ones.

The choices are listed below:

“rmse”: root mean square error

“logloss”: negative log-likelihood

“error”: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.

“merror”: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).

“mlogloss”: Multiclass logloss

“auc”: Area under the curve for ranking evaluation.

“ndcg”: Normalized Discounted Cumulative Gain

“map”: Mean average precision

“ndcg@n”, “map@n”: n can be assigned as an integer to cut off the top positions in the lists for evaluation.

“ndcg-”, “map-”, “ndcg@n-”, “map@n-”: In XGBoost, NDCG and MAP evaluate the score of a list without any positive samples as 1. By adding “-” to the evaluation metric, XGBoost will evaluate these scores as 0 instead, to be consistent under some conditions (for example, when training repeatedly).
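As a concrete sketch of two of these metrics, the binary logloss and error rate defined above can be computed by hand from hypothetical predictions:

```python
import math

# Hypothetical predicted probabilities and true 0/1 labels.
preds  = [0.9, 0.3, 0.6, 0.2]
labels = [1,   0,   0,   0]

# "logloss": negative mean log-likelihood.
logloss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(labels, preds)) / len(labels)

# "error": #(wrong cases)/#(all cases), with predictions above 0.5
# counted as positive.
error = sum((p > 0.5) != bool(y) for p, y in zip(preds, labels)) / len(preds)
```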

seed [ default=0 ] 

Random number seed. The default value is 0

Console Parameters

The following parameters are only used in the console version of xgboost 

* use_buffer [ default=1 ] 

- whether to create binary cache files from the input; cache files can speed up computation. The default is 1 

* num_round 

- the number of boosting iterations (rounds) 

* data 

- input data path 

* test:data 

- The path of test data 

* save_period [default=0] 

- The model is saved every save_period iterations; for example, save_period=10 means XGBoost saves the intermediate model every 10 boosting rounds. Setting it to 0 means only the final model is saved. 

* task [default=train] options: train, pred, eval, dump 

- train: train the model 

- pred: make predictions on test data 

- eval: evaluate on data defined by eval[name]=filename 

- dump: save the learned model in text format 

* model_in [default=NULL] 

- path of the input model, used by the pred, eval, and dump tasks. If specified during training, XGBoost will continue training from the input model 

* model_out [default=NULL] 

- path for saving the model after training. If not specified, the output is named like 0003.model, where 0003 indicates the model from the third round of training 

* model_dir [default=models] 

- directory where output models are stored 

* fmap 

- feature map, used for dump model 

* name_dump [default=dump.txt] 

- name of model dump file 

* name_pred [default=pred.txt] 

- name of the prediction output file 

* pred_margin [default=0] 

- whether to output the untransformed margin value instead of the transformed prediction
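For context, here is a hedged sketch of how console parameters like these are typically supplied: a config file of key = value lines, with optional command-line overrides. The file names and values are placeholders:

```
# train.conf (illustrative)
task = train
booster = gbtree
objective = binary:logistic
num_round = 10
save_period = 0
data = "train.txt"
test:data = "test.txt"
model_out = "final.model"
```

Any parameter in the file can typically also be overridden on the command line, e.g. `xgboost train.conf num_round=20`.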

Reproduced from: https://www.jianshu.com/p/419418187e5d


Origin: blog.csdn.net/weixin_34197488/article/details/91314801