[Code Template] Random Forest Hyperparameter Tuning: Approach and Code Template

The basic approach to tuning a random forest

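First, decide a rough range of candidate values for each parameter and collect them into a parameter dictionary.
Use scikit-learn's RandomizedSearchCV to train models on parameter combinations sampled from that dictionary. (This is like rounding up suspects: sampling at random saves time compared with an exhaustive sweep.)
This yields the best parameter combination among the sampled ones.
Take values just above and below each of those best parameters to form a narrower search range and a new parameter dictionary.
Use scikit-learn's GridSearchCV to train models on every combination in the new dictionary. (This is like picking the real culprit out of the suspects: once the rough range is known, an exhaustive sweep over the narrowed range is affordable.)
This gives the final best parameters.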
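To see why the random stage saves time, a minimal sketch can count model fits. The search space below mirrors the first-stage ranges in the template (leaving out `max_features`, which has a single value); it uses scikit-learn's `ParameterGrid` and assumes 3-fold cross-validation and `n_iter=50`, as in the template. The counts exclude the single refit of the best model that both search classes perform at the end.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Ranges mirroring the template's first-stage search space
coarse_space = {
    "n_estimators": [int(x) for x in np.linspace(200, 500, num=10)],
    "max_depth": [10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

cv_folds = 3
n_iter = 50

n_combinations = len(ParameterGrid(coarse_space))
exhaustive_fits = n_combinations * cv_folds  # GridSearchCV would try every combination
random_fits = n_iter * cv_folds              # RandomizedSearchCV samples only n_iter of them

print(n_combinations, exhaustive_fits, random_fits)
```

Here 10 × 3 × 3 × 3 × 2 = 540 combinations would need 1620 fits exhaustively, while the random stage needs 150, which is why the exhaustive sweep is saved for the narrowed range.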

The code template for tuning a random forest follows.
1. Randomized search for a good parameter set

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# First, specify the parameter ranges

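# Range for the number of trees in the forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 500, num = 10)]  # 10 evenly spaced values from 200 to 500 (num is a count, not a step); adjust as needed
# Number of features considered at each split
max_features = [1.0]  # all features, which is what 'auto' meant for regressors; 'auto' was removed in scikit-learn 1.3
# Range for the maximum tree depth
max_depth = [int(x) for x in np.linspace(10, 20, num = 2)]  # 2 evenly spaced values from 10 to 20, i.e. [10, 20]; adjust as needed
max_depth.append(None)  # None means no depth limit
# Minimum number of samples required to split an internal node
min_samples_split = [2, 5, 10]
# Minimum number of samples at a leaf: no split may leave a child node with fewer samples than this
min_samples_leaf = [1, 2, 4]
# Whether each tree is trained on a bootstrap sample of the training data
bootstrap = [True, False]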

# Build the parameter-range dictionary
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

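# Randomly search for a good parameter combination
rfr1 = RandomForestRegressor()  # for classification, use RandomForestClassifier here and a classification scoring metric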

rfr1_random = RandomizedSearchCV(estimator=rfr1, param_distributions=random_grid,
                                 n_iter=50, scoring='neg_mean_absolute_error',
                                 cv=3, verbose=2, random_state=42, n_jobs=-1)
# n_iter is the number of parameter combinations sampled; cv is the number of cross-validation folds; adjust both as needed
rfr1_random.fit(x_train, y_train)
# Print the best parameters
print(rfr1_random.best_params_)
# The model with the best random-search parameters, refit on all of x_train
best_random_model = rfr1_random.best_estimator_
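The step "take values above and below the best parameters" is done by hand in the template. A minimal sketch that automates it for integer parameters follows; the helpers `int_neighbors` and `refine_grid`, the ± step sizes and the lower bounds are hypothetical choices, not part of the original template or of scikit-learn. Non-integer values (booleans, floats, strings, `None`) are simply kept fixed.

```python
def int_neighbors(value, step, lower=1):
    """Hypothetical helper: [value - step, value, value + step], dropping entries below `lower`."""
    return [v for v in (value - step, value, value + step) if v >= lower]


def refine_grid(best_params, steps, lower_bounds=None):
    """Hypothetical helper: build a GridSearchCV-style dict around a best_params_ dict.

    Integer parameters named in `steps` get a small neighbourhood; every other
    value (booleans, floats, strings, None) stays fixed as a one-element list.
    """
    lower_bounds = lower_bounds or {}
    grid = {}
    for name, value in best_params.items():
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if name in steps and is_int:
            grid[name] = int_neighbors(value, steps[name], lower_bounds.get(name, 1))
        else:
            grid[name] = [value]
    return grid


# Illustrative input shaped like best_params_ (made-up values, not a real result)
best = {'n_estimators': 400, 'min_samples_split': 5, 'min_samples_leaf': 2,
        'max_features': 1.0, 'max_depth': 10, 'bootstrap': True}
steps = {'n_estimators': 100, 'max_depth': 2,
         'min_samples_split': 2, 'min_samples_leaf': 1}
lower_bounds = {'min_samples_split': 2}  # scikit-learn requires an integer min_samples_split >= 2

print(refine_grid(best, steps, lower_bounds))
```

With the template's objects, `refine_grid(rfr1_random.best_params_, steps, lower_bounds)` would produce a dictionary that can be passed to GridSearchCV as `param_grid`.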

2. Exhaustive grid search around the best random-search parameters

from sklearn.model_selection import GridSearchCV

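# New parameter dictionary: each range takes values above and below the best random-search parameters.
# Edit the values below to match your own random-search result.
# As written: 3 x 5 x 3 x 4 = 180 combinations, i.e. 540 fits with cv=3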
param_grid = {
    'bootstrap': [True],
    'max_depth': [8, 10, 12],
    'max_features': [1.0],  # replaces the removed 'auto' (all features)
    'min_samples_leaf': [2, 3, 4, 5, 6],
    'min_samples_split': [3, 5, 7],
    'n_estimators': [800, 900, 1000, 1200]
}

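# Exhaustively search every combination in the refined grid
rfr2 = RandomForestRegressor()

rfr2_grid = GridSearchCV(estimator=rfr2, param_grid=param_grid,
                         scoring='neg_mean_absolute_error', cv=3,
                         n_jobs=-1, verbose=2)
# Keep the same scoring as the random search so the results are comparable; n_jobs=-1 runs on all CPU cores (lower it if memory or CPU is limited); verbose controls how much progress output is printed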
rfr2_grid.fit(x_train, y_train)
# Print the best parameters
print(rfr2_grid.best_params_)
# The model with the final best parameters, refit on all of x_train
best_final_model = rfr2_grid.best_estimator_
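The template notes that for classification only the model needs changing; in practice the scoring metric must change too, because `neg_mean_absolute_error` is a regression metric. A minimal sketch of the classification variant follows, using synthetic data from `make_classification` as an assumed stand-in for the reader's own `x_train`/`y_train`, `accuracy` as an assumed metric, and deliberately small ranges so it runs quickly. It also scores the best model on a held-out split, which the template itself does not do.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the reader's own data (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small illustrative grid; widen it for real data
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [4, None],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt"],  # the usual choice for classifiers
}

clf_grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",  # a classification metric instead of neg_mean_absolute_error
    cv=3,
)
clf_grid.fit(x_train, y_train)

print(clf_grid.best_params_)
best_clf = clf_grid.best_estimator_
print("held-out accuracy:", best_clf.score(x_test, y_test))
```

The same pattern applies to the regressor: scoring `best_final_model` on data not used during the search gives a fairer estimate than the cross-validated score that selected it.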

不停下脚步的乌龟
