Basic Concepts

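A decision tree is a non-parametric supervised learning method used for classification and regression. The goal is to build a model that predicts the value of a target variable by learning simple decision rules from the data features. As the name suggests, a decision tree makes decisions using a tree structure: each question asked during the decision process is a "test" on some attribute, as shown in the figure below.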
[Figure]
Its basic procedure follows a simple and intuitive "divide and conquer" strategy.

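As the figure above shows, the key step in decision tree learning is line 8: how to choose the optimal splitting attribute. Different splitting criteria give rise to different decision tree algorithms; common ones include ID3, C4.5, and CART.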
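The divide-and-conquer procedure can be sketched as a short recursive function. This is a minimal sketch, assuming categorical attributes stored in dicts; the names `tree_generate` and `choose` are hypothetical, and `choose` is a placeholder for the attribute-selection step that ID3, C4.5, and CART each fill in differently. Unlike the textbook pseudocode, it only creates branches for attribute values present in the current subset, so no empty branches arise.

```python
from collections import Counter


def majority(labels):
    """Return the most common class label."""
    return Counter(labels).most_common(1)[0][0]


def tree_generate(rows, labels, attrs, choose):
    """Grow a decision tree by divide and conquer (illustrative sketch).

    rows   -- list of dicts mapping attribute name -> categorical value
    labels -- class label of each row
    attrs  -- attribute names still available for splitting
    choose -- hypothetical callback (rows, labels, attrs) -> best attribute;
              this is the "choose the optimal splitting attribute" step
    Returns a class label (leaf) or a nested dict {attr: {value: subtree}}.
    """
    # Stop 1: all samples belong to the same class -> leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # Stop 2: no attributes left, or all samples agree on every remaining
    # attribute -> leaf labelled with the majority class
    if not attrs or all(r[a] == rows[0][a] for r in rows for a in attrs):
        return majority(labels)

    best = choose(rows, labels, attrs)
    remaining = [a for a in attrs if a != best]
    node = {best: {}}
    # Divide: one branch per observed value of the chosen attribute
    for value in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        # Conquer: recurse on the subset that takes this value
        node[best][value] = tree_generate(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, choose
        )
    return node


# Toy data, made up for illustration
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["yes", "no", "yes", "no"]

first = tree_generate(rows, labels, ["outlook", "windy"], lambda r, y, a: a[0])
last = tree_generate(rows, labels, ["outlook", "windy"], lambda r, y, a: a[-1])
print(first)
print(last)
```

Splitting on `windy` first yields a one-level tree, while splitting on `outlook` first yields a deeper tree with the same predictions, which is why the selection criterion matters.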

ID3 Algorithm

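The ID3 decision tree algorithm [Quinlan, 1986] uses information gain as its criterion, splitting on the attribute with the largest information gain.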
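A minimal sketch of the information gain criterion on categorical data, using the standard definitions Ent(D) = -sum_k p_k log2 p_k and Gain(D, a) = Ent(D) - sum_v |D^v|/|D| Ent(D^v). The function names and the toy columns are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy Ent(D) = -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v| / |D| * Ent(D^v).

    values[i] is the value of attribute a for sample i.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def choose_id3(columns, labels):
    """Pick the attribute with the largest information gain."""
    return max(columns, key=lambda a: info_gain(columns[a], labels))


# Toy data, made up for illustration: attribute name -> column of values
labels = ["yes", "yes", "no", "no"]
columns = {
    "A": ["a", "a", "b", "b"],  # separates the classes perfectly
    "B": ["x", "y", "x", "y"],  # carries no information about the class
}
print({a: info_gain(v, labels) for a, v in columns.items()})
print(choose_id3(columns, labels))
```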

C4.5 Algorithm

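The C4.5 decision tree algorithm [Quinlan, 1993] does not use information gain directly, because information gain is biased toward attributes with many possible values. Instead, it uses the "gain ratio" to choose the optimal splitting attribute.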
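The gain ratio divides the information gain by the "intrinsic value" IV(a), which is simply the entropy of the attribute's own value distribution, so many-valued attributes are penalized. C4.5 is usually described as not taking the maximum gain ratio directly: it first keeps attributes whose information gain is above average, then picks the highest gain ratio among them. The sketch below follows that heuristic, using "at least average" so the candidate list is never empty; all names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Ent(D) = -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Gain(D, a) = Ent(D) - sum_v |D^v| / |D| * Ent(D^v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(values, labels):
    """Gain_ratio(D, a) = Gain(D, a) / IV(a).

    IV(a) = -sum_v |D^v| / |D| * log2(|D^v| / |D|) is the entropy of the
    attribute's own value distribution, so entropy(values) computes it.
    """
    iv = entropy(values)
    # A single-valued attribute has IV = 0; treat it as useless for splitting
    return info_gain(values, labels) / iv if iv > 0 else 0.0


def choose_c45(columns, labels):
    """Keep attributes with at least average information gain,
    then pick the one with the highest gain ratio."""
    gains = {a: info_gain(v, labels) for a, v in columns.items()}
    avg = sum(gains.values()) / len(gains)
    candidates = [a for a in columns if gains[a] >= avg - 1e-12]
    return max(candidates, key=lambda a: gain_ratio(columns[a], labels))


labels = ["yes", "yes", "no", "no"]
columns = {
    "A": ["a", "a", "b", "b"],   # two values, separates the classes
    "id": ["1", "2", "3", "4"],  # unique per sample, also "separates" them
}
print({a: gain_ratio(v, labels) for a, v in columns.items()})
print(choose_c45(columns, labels))
```

Both attributes have an information gain of 1 bit, so ID3 cannot tell them apart, but the per-sample `id` column has IV = 2 bits and is penalized by the gain ratio.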

CART Algorithm

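CART decision trees use the "Gini index" to select the splitting attribute, choosing the attribute whose split yields the smallest Gini index.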
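A minimal sketch of the Gini criterion in its multiway textbook form, with hypothetical function names and toy data. Note that CART as originally defined (and scikit-learn's trees) grows binary trees, so in practice the Gini index is evaluated on two-way splits such as `value == v` versus `value != v`, or a threshold on a numeric feature.

```python
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2: the probability that two samples drawn
    at random (with replacement) from D have different class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_index(values, labels):
    """Gini_index(D, a) = sum_v |D^v| / |D| * Gini(D^v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())


def choose_cart(columns, labels):
    """Pick the attribute with the smallest Gini index."""
    return min(columns, key=lambda a: gini_index(columns[a], labels))


labels = ["yes", "yes", "no", "no"]
columns = {"A": ["a", "a", "b", "b"], "B": ["x", "y", "x", "y"]}
print(gini(labels))
print({a: gini_index(v, labels) for a, v in columns.items()})
print(choose_cart(columns, labels))
```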

Pruning

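Pruning is the main means by which decision tree learning algorithms combat overfitting. To classify the training samples as correctly as possible, the node-splitting process is repeated again and again, which sometimes produces a tree with too many branches. The tree may then have learned the training samples "too well", treating peculiarities of the training set as general properties of all data, which leads to overfitting. Actively removing some branches therefore lowers the risk of overfitting.

The basic pruning strategies are "pre-pruning" and "post-pruning". Pre-pruning estimates, during tree generation and before each node is split, whether the split would improve the tree's generalization (typically judged on a held-out validation set); if it would not, splitting stops and the current node is marked as a leaf. Post-pruning first grows a complete tree from the training set, then examines the non-leaf nodes bottom-up; if replacing the subtree rooted at a node with a leaf improves generalization, the subtree is replaced with that leaf.

In general, post-pruned trees carry little risk of underfitting and often generalize better than pre-pruned trees, at the cost of more training time, since the full tree must be grown first and every internal node examined.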
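In sklearn, pre-pruning corresponds to growth limits such as `max_depth` and `min_samples_leaf`. sklearn does not provide the validation-set post-pruning described above; its built-in post-pruning is a different technique, minimal cost-complexity pruning via the `ccp_alpha` parameter (scikit-learn 0.22 or later). The sketch below assumes scikit-learn is installed; the limits `max_depth=3` and `min_samples_leaf=5` and the 70/30 split are arbitrary illustrative choices. Picking `ccp_alpha` by validation accuracy mirrors the "prune only if generalization improves" idea, though an unbiased final estimate would need a separate test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 30% of the data as a validation set for judging generalization
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# No pruning: default settings, no depth limit
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: limit growth while the tree is being built
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning (cost-complexity): candidate alphas for pruning the full tree
path = full.cost_complexity_pruning_path(X_train, y_train)
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding
    model = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    score = model.fit(X_train, y_train).score(X_val, y_val)
    # Strict '>' keeps the smallest alpha among ties, i.e. prune further
    # only when validation accuracy actually improves
    if score > best_score:
        best_alpha, best_score = alpha, score
post = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
post.fit(X_train, y_train)

for name, m in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:<12} leaves={m.get_n_leaves():>3} "
          f"val_acc={m.score(X_val, y_val):.3f}")
```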

Example

Let's look at the official sklearn example: Plot the decision surface of a decision tree on the iris dataset.

"""
================================================================
Plot the decision surface of a decision tree on the iris dataset
================================================================

Plot the decision surface of a decision tree trained on pairs
of features of the iris dataset.

See :ref:`decision tree <tree>` for more information on the estimator.

For each pair of iris features, the decision tree learns decision
boundaries made of combinations of simple thresholding rules inferred from
the training samples.

We also show the tree structure of a model built on all of the features.
"""
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points
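    # Plot the training points of each class in its own color.
    # cmap is omitted: it is ignored when c is a single color.
    for i, color in zip(range(n_classes), plot_colors):
        idx = y == i  # boolean mask selecting the samples of class i
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    edgecolor='black', s=15)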

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")

plt.figure()
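# Train on all four features and draw the tree, labelling nodes
# with feature and class names for readability
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, filled=True, feature_names=iris.feature_names,
          class_names=list(iris.target_names))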
plt.show()

Results

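[Figure: decision surfaces for each pair of iris features, and the structure of the tree trained on all features]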
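Besides plotting, the learned thresholding rules can be printed as text. The sketch below assumes scikit-learn 0.21 or later for `export_text`, and trains depth-limited trees with `criterion="gini"` and `criterion="entropy"`. sklearn's trees are CART-style binary trees in both cases, so the entropy option borrows ID3's splitting measure but not ID3's multiway splits; `max_depth=2` is an arbitrary choice to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Compare the two impurity measures discussed above on the same data
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=2, random_state=0)
    clf.fit(iris.data, iris.target)
    # export_text renders each root-to-leaf path as nested threshold tests
    rules = export_text(clf, feature_names=list(iris.feature_names))
    print(f"criterion={criterion}")
    print(rules)
```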

References

Machine Learning [M]

1. Decision Trees