Andrew Ng, Assignment 3: Multi-class Classification

[This series is for study purposes only]

This assignment recognizes the digits $0 \sim 9$ by training several logistic regression classifiers.

Data preprocessing

The data consist of $20 \times 20$ pixel images, with $5000$ examples in total.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

The dataset is stored in MATLAB's native format, so to load it in Python we need a SciPy utility ($loadmat$).

data = loadmat('ex3data1.mat')

This loads the data, which is split into $X$ and $y$, with shapes $(5000,400)$ and $(5000,1)$ respectively.
The following statement returns their shapes.

data['X'].shape, data['y'].shape
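
To make the structure returned by $loadmat$ more concrete, here is a small round-trip sketch. It does not touch ex3data1.mat: the file name toy.mat and the tiny placeholder arrays are made up purely for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical stand-in data: 5 "images" of 400 pixels and 5 labels.
toy = {'X': np.zeros((5, 400)), 'y': np.ones((5, 1))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.mat')
    savemat(path, toy)      # write a MATLAB-format file
    loaded = loadmat(path)  # read it back as a Python dict

# Besides the saved variables, the dict also holds metadata
# entries such as '__header__'.
shapes = (loaded['X'].shape, loaded['y'].shape)
print(shapes)
```

The variables are looked up by their exact key, so the case of the key matters.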

Cost function

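We solve the problem with logistic regression.
We first write a function $g$, the commonly used logistic function, also called the sigmoid (S-shaped) function: $g\left( z \right)=\frac{1}{1+{{e}^{-z}}}$
Substituting $z={{\theta }^{T}}x$, we obtain the hypothesis function of the logistic regression model:
${{h}_{\theta }}\left( x \right)=\frac{1}{1+{{e}^{-{{\theta }^{T}}x}}}$
The function $g$: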

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
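
As a quick sanity check on $g$, the sketch below evaluates it at a few hand-picked points (the input values are arbitrary): $g(0)$ should be $0.5$, and $g(z)+g(-z)$ should equal $1$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])  # arbitrary test inputs
g = sigmoid(z)

# g(0) is exactly 0.5, and g(z) + g(-z) = 1 for every z.
symmetric_sum = g + sigmoid(-z)
print(g, symmetric_sum)
```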

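The regularized cost function:
$J\left( \theta \right)=\frac{1}{m}\sum\limits_{i=1}^{m}{[-{{y}^{(i)}}\log \left( {{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1-{{h}_{\theta }}\left( {{x}^{(i)}} \right) \right)]}+\frac{\lambda }{2m}\sum\limits_{j=1}^{n}{\theta _{j}^{2}}$
The formula splits into a left part (the cross-entropy) and a right part (the regularization term). We first convert the inputs to $numpy$ matrices.
The products of $y$ with the log terms are element-wise, so we use $np.multiply$. Here $X$ is $5000 \times 401$ once the intercept column has been added and $theta$ is $1 \times 401$, so the linear term is computed as $X*theta.T$.
Because these are $numpy$ matrices, $*$ directly denotes matrix multiplication.
$np.power(a,k)$ raises every element of $a$ to the $k$-th power, so with $k=2$ it squares every element.
Regularization does not include the constant term $\theta_0$, so the first column of $theta$ is sliced off.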

def cost(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter
    #        (named learningRate here; it plays the role of lambda)
    # OUTPUT: cross-entropy loss at the current parameters

    # STEP 1: convert theta, X, y to numpy matrices
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)

    # STEP 2: compute the loss without regularization
    m = len(X)
    cross_cost = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    cross_cost = cross_cost - np.multiply(1 - y, np.log(1 - sigmoid(X * theta.T)))

    # STEP 3: compute the regularization term
    # theta is a 1 x (n+1) matrix, so slice over columns to skip theta_0
    reg = (learningRate / (2 * m)) * np.sum(np.power(theta[:, 1:], 2))

    # STEP 4: add the two parts to get the full cost
    whole_cost = np.sum(cross_cost) / m + reg
    return whole_cost
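
For comparison, here is a minimal sketch of the same formula written with plain $numpy$ arrays and the $@$ operator instead of $np.matrix$. The names cost_ndarray and lam, and the toy data, are illustrative assumptions and not part of the assignment. With $\theta = 0$ every prediction is $0.5$, so the cost should equal $\log 2$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_ndarray(theta, X, y, lam):
    """Regularized cross-entropy; theta: (n+1,), X: (m, n+1), y: (m,)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cross = -y * np.log(h) - (1 - y) * np.log(1 - h)
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # theta_0 is skipped
    return np.sum(cross) / m + reg

# Made-up toy data: 4 examples, intercept column plus 2 features.
X = np.array([[1, 0.5, 1.0],
              [1, -1.0, 2.0],
              [1, 0.0, -1.0],
              [1, 2.0, 0.5]])
y = np.array([1, 0, 0, 1])

J0 = cost_ndarray(np.zeros(3), X, y, lam=1.0)

# Changing only theta_0 must not change the regularization term.
only_bias = np.array([3.0, 0.0, 0.0])
J_no_reg = cost_ndarray(only_bias, X, y, lam=0.0)
J_reg = cost_ndarray(only_bias, X, y, lam=1.0)
print(J0, J_no_reg, J_reg)
```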

Computing the gradient

(so that the TNC truncated Newton method can be used for the optimization)

If we minimize this cost function with gradient descent, the update rule has two cases, because ${{\theta }_{0}}$ is not regularized:
$\theta_0=\theta_0-a\frac{1}{m}\sum\limits_{i=1}^m[h_{\theta}(x^{(i)})-y^{(i)}]x_0^{(i)}$
$\theta_j=\theta_j-a(\frac{1}{m}\sum\limits_{i=1}^m[h_{\theta}(x^{(i)})-y^{(i)}]x_j^{(i)}+\frac{\lambda}{m}\theta_j)$
Here we simply treat $a$ as $1$, since the optimizer only needs the gradient itself.

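As before, we first convert everything to matrices.
Then we get the number of parameters; the usual approach is to flatten $theta$ (for a $1 \times (n+1)$ $numpy$ matrix, $ravel()$ still returns a $1 \times (n+1)$ matrix, so the count is in $shape[1]$):
$num=int(theta.ravel().shape[1])$
Next we compute $error$ directly, i.e. $(h_{\theta}(x^{(i)})-y^{(i)})$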

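For each $j$ we need $\sum_i [h_{\theta}(x^{(i)})-y^{(i)}]x_j^{(i)}$: column $j$ of $X$ is multiplied element-wise with $error$ and summed, and we need $num$ such results.
This is exactly $X.T*error$: row $j$ of $X.T$ holds $x_j^{(i)}$ for all $i$, and it is multiplied with the whole $error$ vector.
Finally, the $0$-th term is not regularized, so it has to be corrected; we overwrite $grad[0,0]$ directly.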

def gradient(theta, X, y, learningRate):
    # INPUT: parameters theta, data X, labels y, regularization parameter
    #        (named learningRate here; it plays the role of lambda)
    # OUTPUT: gradient at the current parameters

    # STEP 1: convert theta, X, y to numpy matrices
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)

    # STEP 2: flatten theta and count the parameters
    # (ravel() of a 1 x (n+1) matrix is still 1 x (n+1), so use shape[1])
    parameters = int(theta.ravel().shape[1])

    # STEP 3: compute the prediction error
    error = sigmoid(X * theta.T) - y

    # STEP 4: compute the gradient from the formulas above
    grad = ((X.T * error).T + learningRate * theta) / len(X)

    # STEP 5: j = 0 is not regularized, so reset that entry
    grad[0, 0] = np.sum(np.multiply(error, X[:, 0])) / len(X)

    return np.array(grad).ravel()
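
One way to gain confidence in an analytic gradient is to compare it against a finite-difference approximation of the cost. The sketch below does this on random toy data; the helper names (cost_nd, grad_nd, numerical_grad), the array-based rewrite, and the data are assumptions made for illustration, not part of the assignment code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_nd(theta, X, y, lam):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    cross = -y * np.log(h) - (1 - y) * np.log(1 - h)
    return np.sum(cross) / m + lam / (2 * m) * np.sum(theta[1:] ** 2)

def grad_nd(theta, X, y, lam):
    m = X.shape[0]
    error = sigmoid(X @ theta) - y
    grad = X.T @ error / m
    grad[1:] += lam / m * theta[1:]  # theta_0 is not regularized
    return grad

def numerical_grad(f, theta, eps=1e-6):
    # Central differences, one coordinate at a time.
    num = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        num[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return num

rng = np.random.default_rng(0)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 3))])  # intercept + 3 features
y = np.array([0, 1, 1, 0, 1, 0])
theta = rng.normal(size=4)

analytic = grad_nd(theta, X, y, 1.0)
numeric = numerical_grad(lambda t: cost_nd(t, X, y, 1.0), theta)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the two disagree noticeably, the usual suspects are the sign of the error term or the handling of the unregularized $\theta_0$.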

Classifiers

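Since there are $10$ classes, our approach is to run logistic regression $10$ times.
Each classifier decides between "class $i$" and "not class $i$". We wrap the training in a single function that computes the final weights of each of the $10$ classifiers and returns them as a $10 \times (n + 1)$ array, where $n$ is the number of features. Because the classifiers do not affect one another, we simply use $minimize$ to find the best parameters for each of them.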

Here $all\_theta$ stores the best parameters. The conversion below produces, for each class $i$, the binary labels used to train that classifier.

y_i = np.array([1 if label == i else 0 for label in y])
y_i = np.reshape(y_i, (rows, 1))
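
The same binary labels can also be built without a Python loop, by comparing the whole label column with $i$ at once. A small sketch on made-up labels (the values in y are invented for the example):

```python
import numpy as np

y = np.array([[3], [1], [2], [1], [3]])  # made-up labels, shape (5, 1)
rows = y.shape[0]
i = 1

# Loop version, as in the text.
y_loop = np.array([1 if label == i else 0 for label in y])
y_loop = np.reshape(y_loop, (rows, 1))

# Vectorized version: element-wise comparison, then cast to int.
y_vec = (y == i).astype(int)
print(y_vec.ravel())
```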

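Then we run the optimization:
$fmin = minimize(fun=cost, x0=theta, args=(tmpX, y_i, learning_rate), method='TNC', jac=gradient)$
$fun$ is the cost function, $x0$ is the initial value, $args$ is (training data, training labels, regularization parameter), $method='TNC'$ selects the truncated Newton method, and $jac$ is the gradient function.
$fmin.x$ then gives the resulting parameters.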
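
To see how $fun$, $x0$, $args$ and $jac$ fit together, here is a minimal sketch that calls $minimize$ with the same arguments on a toy quadratic instead of the logistic cost. The functions f and f_grad and the target point center are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

def f(x, center):
    # Toy objective: squared distance to `center`.
    return np.sum((x - center) ** 2)

def f_grad(x, center):
    # Gradient of the toy objective; receives the same extra args as f.
    return 2 * (x - center)

center = np.array([1.0, -2.0])
res = minimize(fun=f, x0=np.zeros(2), args=(center,), method='TNC', jac=f_grad)
print(res.x)
```

Note that $args$ is passed to both $fun$ and $jac$, which is why $cost$ and $gradient$ share the same signature.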

The complete classifier function is

from scipy.optimize import minimize

def one_vs_all(X, y, num_labels, learning_rate):
    rows = X.shape[0]
    params = X.shape[1]
    
    # k X (n + 1) array for the parameters of each of the k classifiers
    all_theta = np.zeros((num_labels, params + 1))
    
    # insert a column of ones at the beginning for the intercept term
    tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # labels are 1-indexed instead of 0-indexed
    for i in range(1, num_labels + 1):
        theta = np.zeros(params + 1)
        y_i = np.array([1 if label == i else 0 for label in y])
        y_i = np.reshape(y_i, (rows, 1))
        
        # minimize the objective function
        fmin = minimize(fun=cost, x0=theta, args=(tmpX, y_i, learning_rate), method='TNC', jac=gradient)
        all_theta[i-1,:] = fmin.x
    
    return all_theta

Here a column is inserted with $tmpX=np.insert(X,0,values=np.ones(rows),axis=1)$,
which means: along the column axis, at column $0$, add a column of all $1$s.
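
A tiny sketch of what this call does, using a made-up $3 \times 2$ matrix. Stacking a column of ones with $np.hstack$ is shown as an equivalent alternative.

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 7.0]])  # made-up data: 3 rows, 2 features
rows = X.shape[0]

# Insert a column of ones before column 0 (axis=1 means columns).
tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)

# Equivalent alternative: stack a ones column on the left.
tmpX_alt = np.hstack([np.ones((rows, 1)), X])
print(tmpX)
```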

Checking the results

The following shows which distinct labels occur, so the same approach extends to problems with more classes.

np.unique(data['y'])  # check which labels there are

Finally, we make the predictions.

def predict_all(X, all_theta):
    # INPUT: parameters all_theta, data X
    # OUTPUT: predicted labels

    # STEP 1: get the dimensions
    rows = X.shape[0]
    params = X.shape[1]
    num_labels = all_theta.shape[0]

    # STEP 2: add a column of ones to X
    tmpX = np.insert(X, 0, values=np.ones(rows), axis=1)

    # STEP 3: convert X and all_theta to numpy matrices
    tmpX = np.matrix(tmpX)
    all_theta = np.matrix(all_theta)

    # STEP 4: compute the probability of each sample belonging to each class
    h = sigmoid(tmpX * all_theta.T)

    # STEP 5: for each sample, find the index of the largest probability
    h_argmax = np.argmax(h, axis=1)

    # STEP 6: the array is zero-indexed, so add 1 to get the true label
    h_argmax = h_argmax + 1

    return h_argmax

Here $np.argmax(h,axis=1)$ returns, for each row, the index of the column that holds the maximum.
To take it over all rows for each column instead, use $axis=0$
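
The difference between the two axes is easy to see on a small made-up probability matrix (2 samples, 3 classes):

```python
import numpy as np

h = np.array([[0.1, 0.7, 0.2],
              [0.8, 0.6, 0.1]])  # made-up probabilities

per_row = np.argmax(h, axis=1)  # best column for each row
per_col = np.argmax(h, axis=0)  # best row for each column
labels = per_row + 1            # shift to 1-based labels
print(per_row, per_col, labels)
```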

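Validity

We plug the data back in and compare the predictions with the labels (this assumes $all\_theta$ has already been obtained from $one\_vs\_all$).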

y_pred = predict_all(data['X'], all_theta)
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print ('accuracy = {0}%'.format(accuracy * 100))

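Here $correct$ is built like the earlier list comprehension, and $zip$ iterates over the two sequences in parallel.
$map(l,r)$ applies the function $l$ to every element of $r$; in Python 3 it returns an iterator rather than a list.
We count the correct predictions and divide by the total number.
The $\{0\}$ is Python's format placeholder; the number inside gives the position of the argument.
The resulting accuracy should be 94.46%.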
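
The same accuracy can also be computed in one line with $numpy$, by averaging an element-wise comparison. A sketch on made-up predictions (the arrays below are invented and are not the assignment's results):

```python
import numpy as np

y_pred = np.array([[1], [2], [3], [3]])  # made-up predictions
y_true = np.array([[1], [2], [3], [1]])  # made-up labels

# Flatten both columns, compare element-wise, and average the matches.
accuracy = np.mean(y_pred.ravel() == y_true.ravel())
print('accuracy = {0}%'.format(accuracy * 100))
```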

mxYlulu
