1.4.2 Supervised Learning Algorithms


A supervised learning algorithm is given a training set of inputs x and corresponding outputs y, and learns how to map inputs to outputs.

Probabilistic supervised learning (logistic regression)

In linear regression, the optimal weights can be found by solving the normal equations. Logistic regression is harder: there is no closed-form solution for its optimal weights. Instead, we search for them by maximizing the log-likelihood, or equivalently by minimizing the negative log-likelihood with gradient descent.
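Before the full example below, here is a minimal sketch of that idea on toy data (the learning rate, iteration count, and toy dataset are illustrative assumptions, not part of the original example):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neg_log_likelihood(theta, X, y):
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_descent(X, y, lr=0.1, n_iters=5000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(X)  # gradient of the negative log-likelihood
        theta -= lr * grad
    return theta

# toy data: a bias column plus one feature
X = np.c_[np.ones(4), [0.0, 1.0, 2.0, 3.0]]
y = np.array([0, 0, 1, 1])
theta = gradient_descent(X, y)
print(theta, neg_log_likelihood(theta, X, y))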

# _*_ coding: utf-8 _*_
"""
Predict whether a student is admitted based on two exam scores.
steps:
1. deal with the data
2. get the hypothesis function
3. get the cost function
4. get the gradient of the cost function
5. compute theta with gradient descent
6. predict with theta and calculate the accuracy
7. plot the decision boundary if possible
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def dealData():
    data = pd.read_csv('ex2data1.txt', names=['exam1', 'exam2', 'admit'])
    print(data.head())
    if 'Ones' not in data.columns:
        data.insert(0, 'Ones', 1)

    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values

    theta = np.zeros(X.shape[1])
    return X, y, theta, data

def sigmoid(z):
    """
    logistic regression hypothesis
    :param z: linear combination theta^T x
    :return: value in (0, 1)
    """
    return 1 / (1 + np.exp(-z))


def cost(theta, X, y):
    """negative log-likelihood averaged over the training set"""
    first = -y * np.log(sigmoid(X @ theta))
    second = (1 - y) * np.log(1 - sigmoid(X @ theta))
    return np.mean(first - second)

def gradient(theta, X, y):
    """
    calculate the gradient
    :param theta:
    :param X:
    :param y:
    :return:
    """
    return (X.T @ (sigmoid(X @ theta) - y)) / len(X)


# Learning the theta
import scipy.optimize as opt
X, y, theta, data = dealData()
result = opt.fmin_tnc(func=cost, x0=theta, fprime=gradient, args=(X, y))
print(result)

# evaluating logistic regression
def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if p >= 0.5 else 0 for p in probability]

final_theta = result[0]
predictions = predict(final_theta, X)
correct = [1 if a==b else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(correct) / len(correct)
print(accuracy)

from sklearn.metrics import classification_report

print(classification_report(y, predictions))  # argument order: (y_true, y_pred)

# Decision boundary, plot the line
# theta0 + x1 * theta1 + x2 * theta2 = 0
x1 = np.arange(130, step=0.1)
x2 = -(final_theta[0] + x1*final_theta[1]) / final_theta[2]

# visualize the original data
positive = data[data.admit == 1]
negative = data[data.admit == 0]
plt.scatter(positive['exam1'], positive['exam2'], c='b', label='Admitted')
plt.scatter(negative['exam1'], negative['exam2'], c='r', label='Not admitted')
plt.plot(x1, x2)
plt.legend()
plt.xlabel('exam1')
plt.ylabel('exam2')
plt.show()
Output:

       exam1      exam2  admit
0  34.623660  78.024693      0
1  30.286711  43.894998      0
2  35.847409  72.902198      0
3  60.182599  86.308552      1
4  79.032736  75.344376      1
(array([-25.16131867,   0.20623159,   0.20147149]), 36, 0)
0.89
             precision    recall  f1-score   support

          0       0.85      0.87      0.86        39
          1       0.92      0.90      0.91        61

avg / total       0.89      0.89      0.89       100

[Figure: decision boundary plotted over the scatter of exam1 vs. exam2 scores]

Support vector machines

From logistic regression to the SVM:

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

[Figure: the sigmoid curve $h_\theta(x)$]

As the curve shows, to get $y = 1$ we want $h_\theta(x) \approx 1$, which requires $\theta^T x \gg 0$.

For a single example, the logistic regression cost function is:

$\text{cost} = -y\log h_\theta(x) - (1-y)\log\left(1-h_\theta(x)\right) = -y\log \frac{1}{1+e^{-\theta^T x}} - (1-y)\log\left(1 - \frac{1}{1+e^{-\theta^T x}}\right)$

This expression is the contribution of a single example to the overall cost.

Consider the two cases $y = 1$ and $y = 0$:

[Figure: the cost curves for $y = 1$ and $y = 0$, together with their piecewise-linear SVM surrogates]

In the two cases, the SVM surrogate costs are written $\text{cost}_1(z)$ and $\text{cost}_0(z)$, respectively.

Optimization objective:

logistic: $\min \frac{1}{m} \sum_{i=1}^{m}\left[ y^{(i)}\left(-\log h_\theta(x^{(i)})\right) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

Abbreviate this as $A + \lambda B$.

After substituting the surrogate costs:

SVM: $\min \frac{1}{m} \sum_{i=1}^{m}\left[ y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

Dropping the factor $\frac{1}{m}$ (this does not change the minimizer):

SVM: $\min \sum_{i=1}^{m}\left[ y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{\lambda}{2}\sum_{j=1}^{n}\theta_j^2$

Finally, rewrite the $A + \lambda B$ form as $CA + B$ (with $C$ playing the role of $\frac{1}{\lambda}$):

Final objective: $\min\; C \sum_{i=1}^{m}\left[ y^{(i)}\,\text{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$
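As a small illustration of this objective, here is a minimal NumPy sketch. The piecewise-linear forms $\text{cost}_1(z) = \max(0, 1-z)$ and $\text{cost}_0(z) = \max(0, 1+z)$ are the standard hinge surrogates and are assumed here, since the post only shows them as plots:

import numpy as np

def cost1(z):
    # surrogate for the y = 1 case: zero once z >= 1
    return np.maximum(0, 1 - z)

def cost0(z):
    # surrogate for the y = 0 case: zero once z <= -1
    return np.maximum(0, 1 + z)

def svm_objective(theta, X, y, C=1.0):
    z = X @ theta
    data_term = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # by convention the bias theta_0 is not regularized
    return C * data_term + reg_term

# toy usage: a bias column plus one feature
X = np.c_[np.ones(3), [-2.0, 0.0, 2.0]]
y = np.array([0, 0, 1])
theta = np.array([0.0, 1.0])
print(svm_objective(theta, X, y))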

Kernel functions

1. Gaussian kernel (the most commonly used), kernel='rbf': $K(x, l) = \exp\left(-\frac{\lVert x - l\rVert^2}{2\sigma^2}\right)$ (a code sketch follows this list)

2. Polynomial kernel, kernel='poly'

3. Linear kernel, kernel='linear'
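A minimal sketch of the Gaussian kernel from item 1 (the helper name gaussian_kernel, the bandwidth sigma=1.0, and the sample points are illustrative assumptions):

import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    # similarity between a sample x and a landmark l: 1 when x == l,
    # decaying toward 0 as the distance between them grows
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
l = np.array([1.5, 2.5])
print(gaussian_kernel(x, l))        # near 1: x is close to the landmark
print(gaussian_kernel(x, l * 10))   # near 0: x is far from the landmark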

Steps for training an SVM

Transforming the features with a kernel

Given training data $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$,

choose the landmarks $l^{(1)} = x^{(1)},\ l^{(2)} = x^{(2)},\ \ldots,\ l^{(m)} = x^{(m)}$.

For a single sample $(x^{(i)}, y^{(i)})$, applying the Gaussian kernel,

the feature vector $x^{(i)}$ becomes:

$f^{(i)}_1 = \text{kernel}(x^{(i)}, l^{(1)})$

$f^{(i)}_2 = \text{kernel}(x^{(i)}, l^{(2)})$

$\vdots$

$f^{(i)}_m = \text{kernel}(x^{(i)}, l^{(m)})$

$f^{(i)} = [f^{(i)}_1, f^{(i)}_2, \ldots, f^{(i)}_m]^T$

The training examples then become $(f^{(i)}, y^{(i)})$.

Training

$\min\; C \sum_{i=1}^{m}\left[ y^{(i)}\,\text{cost}_1(\theta^T f^{(i)}) + (1-y^{(i)})\,\text{cost}_0(\theta^T f^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{m}\theta_j^2$

Minimizing this objective yields the parameters of the support vector machine.
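A minimal sketch of this feature transformation, reusing the hypothetical gaussian_kernel helper from above (the toy dataset is an illustrative assumption):

import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

# toy data: every training sample also serves as a landmark l^(j)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
landmarks = X.copy()

# F[i, j] = kernel(x^(i), l^(j)); row i is the transformed feature vector f^(i)
F = np.array([[gaussian_kernel(x, l) for l in landmarks] for x in X])
print(F.shape)  # (m, m): one kernel feature per landmark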

# _*_ coding: utf-8 _*_
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

def make_meshgrid(x, y, h=.02):
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy


def plot_contours(ax, clf, xx, yy, **params):
    Z = clf.predict(np.c_[xx.ravel(),  yy.ravel()])
    Z = Z.reshape(xx.shape)
    out = ax.contourf(xx, yy, Z, **params)
    return out

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

C = 1.0 # SVC regularization parameter
models = (svm.SVC(kernel='linear', C=C),
          svm.LinearSVC(C=C),
          svm.SVC(kernel='rbf', gamma=0.7, C=C),  # RBF (Radial Basis Function) kernel: a radially symmetric scalar function
          svm.SVC(kernel='poly', degree=3, C=C))
models = (clf.fit(X, y) for clf in models)

# title for the plots
titles = ('SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel')

fig, sub = plt.subplots(2, 2)
X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)

for clf, title, ax in zip(models, titles, sub.flatten()):
    plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)
plt.show()

[Figure: decision boundaries of the four SVM models on the first two iris features]
