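I. k-Nearest Neighbors
1. Computing the distance matrix with two loops
Each row of the training data X_train and of the test data X is one sample point. Each row of the distance matrix dists holds the distances from one point in X to every point in X_train.
The compute_distances_two_loops() function in the k_nearest_neighbor file:
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in tqdm(range(num_test)):
        for j in range(num_train):
            # Euclidean distance between test point i and training point j
            dists[i, j] = np.sqrt(np.sum((X[i] - self.X_train[j])**2))
    return dists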
2. Implementing the classification function
Use numpy.argsort() to find the k nearest neighbors, and numpy.bincount() to tally their votes.
The predict_labels() function in the k_nearest_neighbor file:
def predict_labels(self, dists, k=1):
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test)
    for i in range(num_test):
        # Labels of the k training points closest to test point i
        closest_y = self.y_train[np.argsort(dists[i])[:k]]
        # Majority vote; np.argmax breaks ties in favor of the smaller label
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred
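The voting step can be traced by hand on a tiny input. The sketch below is illustrative only: the distances and labels are made-up values, and it assumes integer class labels, which np.bincount() requires.

```python
import numpy as np

# Hypothetical distances from one test point to six training points
dists_row = np.array([0.9, 0.2, 0.5, 0.1, 0.7, 0.3])
y_train = np.array([2, 1, 2, 1, 0, 2])  # integer class labels
k = 3

nearest_idx = np.argsort(dists_row)[:k]   # indices of the k smallest distances
closest_y = y_train[nearest_idx]          # labels of those neighbors
votes = np.bincount(closest_y)            # votes[c] = number of neighbors with label c
predicted = int(np.argmax(votes))         # most frequent label; ties go to the smaller label
```

Here the three nearest neighbors carry labels 1, 1 and 2, so label 1 wins the vote.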
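3. Computing the distance matrix with one loop
Take one point from X at a time and compute its distance to every point in X_train.
This is where storing one sample per row pays off: indexing a row yields a vector that automatically satisfies numpy's broadcasting rules against the other matrix.
The compute_distances_one_loop() function in the k_nearest_neighbor file:
def compute_distances_one_loop(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in tqdm(range(num_test)):
        # X[i] has shape (D,) and is broadcast against every row of X_train
        dists[i, :] = np.sqrt(np.sum((X[i] - self.X_train)**2, axis=1))
    return dists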
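The broadcasting behavior that the one-loop version relies on can be checked in isolation. This is a minimal sketch with toy array sizes chosen for illustration; the variable names are not taken from the assignment code.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.standard_normal((5, 4))   # 5 training points, 4 features (toy sizes)
x = rng.standard_normal(4)              # one test point, shape (4,)

diff = x - X_train                      # (4,) is broadcast against (5, 4) -> (5, 4)
row = np.sqrt(np.sum(diff**2, axis=1))  # one distance per training point, shape (5,)

# Same result computed point by point
expected = np.array([np.linalg.norm(x - t) for t in X_train])
```

Because the trailing dimensions match, numpy treats the single row as if it were repeated once per training point, with no explicit copy needed.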
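4. Computing the distance matrix with vectorization (no loops)
Represent points as row vectors, and first consider the squared distance between two points $x$ and $y$:
$$\|x - y\|^2 = \|x\|^2 - 2\,x y^T + \|y\|^2$$
Now stack the two point sets row by row into matrices:
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \qquad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
Then $\|x\|^2$ and $\|y\|^2$ can be replaced by the row-wise squared norms of $X$ and $Y$, and $x y^T$ by $X Y^T$. When the distance matrix is assembled, the term that depends on $X$ is broadcast along the column direction and the term that depends on $Y$ along the row direction.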
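The compute_distances_no_loops() function in the k_nearest_neighbor file:
def compute_distances_no_loops(self, X):
    # Cross terms for every (test, train) pair: shape (num_test, num_train)
    XY = np.dot(X, self.X_train.T)
    # Squared row norms, kept as column vectors: (num_test, 1) and (num_train, 1)
    X_norm2 = np.sum(np.square(X), axis=1, keepdims=True)
    Y_norm2 = np.sum(np.square(self.X_train), axis=1, keepdims=True)
    # Clamp at 0: rounding can make a squared distance slightly negative
    dists = np.sqrt(np.maximum(X_norm2 - 2*XY + Y_norm2.T, 0))
    return dists
The keepdims parameter applies to numpy's reduction functions (functions such as np.sum() that reduce some of the values in an array to a single value): it keeps the number of dimensions of the result the same as that of the original array.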
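To see what keepdims changes and to confirm that the expanded formula matches a direct computation, the following sketch uses small random arrays. The sizes and names are illustrative assumptions; the clamp with np.maximum guards against tiny negative values caused by rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))   # 3 test points (toy sizes)
Y = rng.standard_normal((5, 4))   # 5 training points

x_sq = np.sum(X**2, axis=1, keepdims=True)   # shape (3, 1); without keepdims it would be (3,)
y_sq = np.sum(Y**2, axis=1, keepdims=True)   # shape (5, 1)

# (3, 1) - (3, 5) + (1, 5) broadcasts to (3, 5)
sq = x_sq - 2 * X @ Y.T + y_sq.T
dists = np.sqrt(np.maximum(sq, 0))           # clamp tiny negatives caused by rounding

# Reference: explicit pairwise differences
brute = np.sqrt(((X[:, None, :] - Y[None, :, :])**2).sum(axis=2))
```

The brute-force reference builds a (3, 5, 4) intermediate array, which is exactly the memory cost that the expanded formula avoids.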
5. Choosing hyperparameters with k-fold cross-validation
- Split the data into training and validation folds.
Use np.array_split(), which splits along rows (axis 0) by default.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
- Search over the hyperparameter k.
For each value of k, train on num_folds-1 folds and validate on the remaining fold, which yields num_folds accuracy values for the model.
for k in k_choices:
    if k in k_to_accuracies:
        continue
    k_to_accuracies[k] = []
    for foldIndx in range(num_folds):
        # Train on every fold except foldIndx
        X_train_cv = np.vstack(X_train_folds[0:foldIndx] + X_train_folds[foldIndx+1:])
        y_train_cv = np.hstack(y_train_folds[0:foldIndx] + y_train_folds[foldIndx+1:])
        classifier.train(X_train_cv, y_train_cv)
        # Validate on the held-out fold
        dists = classifier.compute_distances_no_loops(X_train_folds[foldIndx])
        y_pred = classifier.predict_labels(dists, k)
        # Fraction of correctly predicted examples
        num_correct = np.sum(y_pred == y_train_folds[foldIndx])
        accuracy = float(num_correct) / y_train_folds[foldIndx].shape[0]
        k_to_accuracies[k].append(accuracy)
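The fold bookkeeping is easier to follow on a toy array. The sketch below uses made-up data and deliberately picks a sample count that does not divide evenly, to show that np.array_split() returns a plain list of arrays that can be sliced and concatenated.

```python
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
y = np.arange(10)
num_folds = 3                      # 10 is not divisible by 3 on purpose

X_folds = np.array_split(X, num_folds)   # list of arrays, split along axis 0
y_folds = np.array_split(y, num_folds)
fold_sizes = [len(f) for f in X_folds]   # array_split tolerates uneven splits

held_out = 1
# Lists concatenate with +, so this gathers every fold except held_out
X_cv = np.vstack(X_folds[:held_out] + X_folds[held_out + 1:])
y_cv = np.hstack(y_folds[:held_out] + y_folds[held_out + 1:])
```

np.split() would raise an error for this uneven case, which is one reason to prefer np.array_split() here.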
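The resulting accuracies are shown in the figure below.
Within k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100], k=10 gives the highest accuracy. With this value, the k-nearest-neighbor model reaches an accuracy of about 28.2% on the test set.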
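One simple way to pick k from the collected fold accuracies is to compare their means. This is a sketch of that selection step only; the numbers in the dictionary are made-up placeholders, not results from this write-up.

```python
import numpy as np

# Made-up placeholder accuracies, NOT results from this write-up
k_to_accuracies = {
    1: [0.20, 0.30, 0.25],
    5: [0.30, 0.30, 0.30],
    10: [0.25, 0.35, 0.36],
}

mean_acc = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)   # k with the highest mean validation accuracy
```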
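II. Support Vector Machine
For a single sample $x_i$ with label $y_i$, let the scores the model assigns to each class be $s = x_i W$ (a row vector). The loss function is
$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)$$
and its gradient with respect to the scores is
$$\frac{\partial L_i}{\partial s_j} = \mathbb{1}[s_j - s_{y_i} + 1 > 0] \quad (j \neq y_i), \qquad \frac{\partial L_i}{\partial s_{y_i}} = -\sum_{j \neq y_i} \mathbb{1}[s_j - s_{y_i} + 1 > 0]$$
so that $\partial L_i / \partial W = x_i^T \, (\partial L_i / \partial s)$.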
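A small worked example makes the hinge loss and its gradient with respect to the scores concrete. The scores below are hypothetical values chosen so that exactly one class violates the margin; the margin of 1 matches the delta used in the code of this section.

```python
import numpy as np

s = np.array([3.0, 1.0, 2.5])   # hypothetical scores for 3 classes
y = 0                           # index of the correct class

margins = s - s[y] + 1.0        # s_j - s_y + 1 for every j
margins[y] = 0.0                # the correct class does not contribute
loss = np.sum(np.maximum(0.0, margins))

ds = (margins > 0).astype(float)   # +1 for each class that violates the margin
ds[y] = -np.sum(ds)                # correct class: minus the number of violations
```

Only class 2 is within the margin of the correct class, so the loss is 0.5 and the gradient is -1 for the correct class, 0 for class 1 and +1 for class 2.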
1. Computing the loss and gradient naively (with loops)
The svm_loss_naive() function in the linear_svm file:
def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # Each violated margin adds x_i to column j and subtracts it
                # from the correct class's column
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    # Right now the loss and gradient are sums over all training examples, but
    # we want averages instead, so we divide by num_train.
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss and its gradient.
    loss += reg * np.sum(W * W)
    dW += reg * 2 * W
    return loss, dW
2. Computing the loss and gradient with vectorization
In practice, implementing the naive version first helps a great deal in working out the vectorized one.
The svm_loss_vectorized() function in the linear_svm file:
def svm_loss_vectorized(W, X, y, reg):
    num_train = X.shape[0]
    scores = X.dot(W)
    # Score of the correct class for each sample, as a column vector
    correct_scores = scores[np.arange(num_train), y][:, np.newaxis]
    # Margins s_j - s_{y_i} + 1, with the correct class zeroed out
    scores = scores - correct_scores + 1
    scores[range(num_train), y] = 0
    scores = np.maximum(0, scores)
    loss = np.sum(scores) / num_train
    loss += reg * np.sum(W * W)
    # Turn the margins into d(loss)/d(scores): 1 where the margin is violated,
    # minus the number of violations in the correct class's column
    scores[scores > 0] = 1
    scores[range(num_train), y] -= np.sum(scores, axis=1)
    dW = X.T.dot(scores)
    dW /= num_train
    dW += reg * 2 * W
    return loss, dW
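A common way to validate an analytic gradient like this one is to compare it with a finite-difference estimate. The sketch below is a self-contained re-implementation for illustration, not the assignment's own checker: hinge_loss and numerical_grad are hypothetical helper names, and the data are small random arrays.

```python
import numpy as np

def hinge_loss(W, X, y, reg):
    """Vectorized multiclass hinge loss (delta = 1) and its gradient."""
    n = X.shape[0]
    scores = X @ W
    margins = np.maximum(0, scores - scores[np.arange(n), y][:, None] + 1)
    margins[np.arange(n), y] = 0
    loss = margins.sum() / n + reg * np.sum(W * W)
    ds = (margins > 0).astype(float)
    ds[np.arange(n), y] = -ds.sum(axis=1)
    dW = X.T @ ds / n + 2 * reg * W
    return loss, dW

def numerical_grad(f, W, h=1e-5):
    """Central-difference estimate of df/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        fp = f(W)
        W[idx] = old - h
        fm = f(W)
        W[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # toy data: 6 samples, 4 features
y = rng.integers(0, 3, size=6)           # 3 classes
W = 0.01 * rng.standard_normal((4, 3))

loss, dW = hinge_loss(W, X, y, reg=0.1)
dW_num = numerical_grad(lambda w: hinge_loss(w, X, y, 0.1)[0], W)
max_err = np.max(np.abs(dW - dW_num))
```

The weights are kept small so that every margin sits close to 1, well away from the kink of max(0, ·) where a finite-difference check can disagree with the analytic gradient.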
3. Stochastic gradient descent
The LinearClassifier.train() function in the linear_classifier file:
def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    num_train, dim = X.shape
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
        # lazily initialize W
        self.W = 0.001 * np.random.randn(dim, num_classes)
    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        # Sample batch_size elements (with replacement, the np.random.choice default)
        idx = np.random.choice(num_train, batch_size)
        X_batch = X[idx, :]
        y_batch = y[idx]
        # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)
        # perform parameter update
        self.W -= learning_rate * grad
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return loss_history
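The mini-batch sampling step can be examined on its own. This sketch uses numpy's Generator API (np.random.default_rng) rather than the legacy np.random.choice call in the code above; both sample with replacement by default, and the toy data are an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, batch_size = 10, 4            # toy sizes
X = np.arange(num_train * 2).reshape(num_train, 2)
y = np.arange(num_train)

idx = rng.choice(num_train, batch_size)                        # with replacement (default)
idx_unique = rng.choice(num_train, batch_size, replace=False)  # no repeated rows

X_batch, y_batch = X[idx], y[idx]        # integer-array indexing keeps rows and labels aligned
```

Sampling with replacement is slightly cheaper and is usually harmless when batch_size is much smaller than num_train; passing replace=False guarantees distinct rows within a batch.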
4. Prediction
The LinearClassifier.predict() function in the linear_classifier file uses numpy.argmax() to get the index of the highest score.
def predict(self, X):
    y_pred = np.argmax(X.dot(self.W), axis=1)
    return y_pred
5. Choosing hyperparameters
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, learning_rate=lr, reg=rs,
                              num_iters=1000, verbose=False)
        y_val_pred = svm.predict(X_val)
        y_train_pred = svm.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (train_acc, val_acc)
        # Keep the model with the best validation accuracy
        if val_acc > best_val:
            best_val = val_acc
            best_svm = svm
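III. Softmax Classifier
The classifier outputs the probability that a sample belongs to each class:
$$p_j = \frac{e^{s_j}}{\sum_k e^{s_k}}$$
For a single sample $x_i$ with label $y_i$, let the probabilities the model assigns to each class be $p$ (a row vector), computed from the scores $s = x_i W$. The loss function is
$$L_i = -\log p_{y_i}$$
and its gradient with respect to the scores is
$$\frac{\partial L_i}{\partial s_j} = p_j - \mathbb{1}[j = y_i]$$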
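The softmax probabilities, loss, and gradient can be worked through for one sample. The sketch below uses hypothetical scores that are large on purpose, and it subtracts the maximum score before exponentiating; that shift is a standard stability trick added here as an assumption, not something the code in this section does.

```python
import numpy as np

s = np.array([1000.0, 1001.0, 1002.0])   # hypothetical scores, large enough to overflow np.exp
y = 2                                    # index of the correct class

shifted = s - np.max(s)                  # softmax is unchanged by subtracting a constant
p = np.exp(shifted) / np.sum(np.exp(shifted))
loss = -np.log(p[y])

ds = p.copy()
ds[y] -= 1.0                             # gradient w.r.t. the scores: p - one_hot(y)
```

Because the probabilities sum to 1, the entries of the gradient with respect to the scores always sum to 0.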
1. Computing the loss and gradient naively (with loops)
The softmax_loss_naive() function in the softmax file:
def softmax_loss_naive(W, X, y, reg):
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_class = W.shape[1]
    for i in range(num_train):
        probs = np.exp(X[i].dot(W))
        probs = probs / np.sum(probs)
        loss -= np.log(probs[y[i]])
        for j in range(num_class):
            # Gradient w.r.t. score j is p_j - 1 for the correct class, p_j otherwise
            if j == y[i]:
                dW[:, j] += (probs[j] - 1) * X[i]
            else:
                dW[:, j] += probs[j] * X[i]
    # Right now the loss and gradient are sums over all training examples, but
    # we want averages instead, so we divide by num_train.
    loss /= num_train
    dW /= num_train
    # Add regularization to the loss and its gradient.
    loss += reg * np.sum(W * W)
    dW += reg * 2 * W
    return loss, dW
2. Computing the loss and gradient with vectorization
The softmax_loss_vectorized() function in the softmax file:
def softmax_loss_vectorized(W, X, y, reg):
    num_train = X.shape[0]
    # Class probabilities for every sample, one row per sample
    probs = np.exp(X.dot(W))
    probs = probs / np.sum(probs, axis=1)[:, np.newaxis]
    loss = -np.sum(np.log(probs[range(num_train), y]))
    loss /= num_train
    loss += reg * np.sum(W * W)
    # d(loss)/d(scores) = probs - one_hot(y)
    probs[range(num_train), y] = probs[range(num_train), y] - 1
    dW = X.T.dot(probs)
    dW /= num_train
    dW += reg * 2 * W
    return loss, dW
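Two quick checks are useful for a softmax implementation: the loss at zero weights should equal the log of the number of classes, and the analytic gradient should match a finite-difference estimate. The sketch below is a self-contained illustration with a hypothetical softmax_loss helper and random toy data; unlike the function above, it shifts the scores by their row maximum for numerical stability.

```python
import numpy as np

def softmax_loss(W, X, y, reg):
    """Vectorized softmax loss and gradient, with a max-shift for numerical stability."""
    n = X.shape[0]
    scores = X @ W
    scores = scores - scores.max(axis=1, keepdims=True)   # does not change the probabilities
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(n), y])) + reg * np.sum(W * W)
    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1
    dW = X.T @ dscores / n + 2 * reg * W
    return loss, dW

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))     # toy data: 8 samples, 5 features
y = rng.integers(0, 4, size=8)      # 4 classes

# Sanity check: with W = 0 every class gets probability 1/4, so the loss is log(4)
loss0, _ = softmax_loss(np.zeros((5, 4)), X, y, reg=0.0)

# Finite-difference check of one gradient entry
W = 0.1 * rng.standard_normal((5, 4))
_, dW = softmax_loss(W, X, y, reg=0.05)
h = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[2, 1] += h
Wm[2, 1] -= h
num = (softmax_loss(Wp, X, y, 0.05)[0] - softmax_loss(Wm, X, y, 0.05)[0]) / (2 * h)
```

The same log(C) check applies to a freshly initialized classifier whose weights are close to zero, which makes it a cheap first test before training.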
3. Choosing hyperparameters
for lr in learning_rates:
    for rs in regularization_strengths:
        softmaxClassfier = Softmax()
        loss_hist = softmaxClassfier.train(X_train, y_train, learning_rate=lr, reg=rs,
                                           num_iters=1000, verbose=False)
        y_val_pred = softmaxClassfier.predict(X_val)
        y_train_pred = softmaxClassfier.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        val_acc = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (train_acc, val_acc)
        # Keep the model with the best validation accuracy
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmaxClassfier
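IV. Two-Layer Neural Network
1. Forward and backward propagation
The network uses the same loss function as the Softmax classifier, so the forward pass is straightforward. The backward pass follows directly from the derivation of the backpropagation (BP) algorithm.
The loss() function in the neural_net file: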
def loss(self, X, y=None, reg=0.0):
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape
    # Forward pass: hidden layer with ReLU, then class scores
    h1 = np.maximum(0, X.dot(W1) + b1)
    scores = h1.dot(W2) + b2
    # If the targets are not given then jump out, we're done
    if y is None:
        return scores
    # Softmax loss plus L2 regularization on both weight matrices
    probs = np.exp(scores)
    probs = probs / np.sum(probs, axis=1, keepdims=True)
    # eps is not defined in this snippet; it is assumed to be a small positive
    # constant defined elsewhere in the file to guard against log(0)
    loss = -np.sum(np.log(probs[range(N), y] + eps))
    loss /= N
    loss += reg * (np.sum(W1*W1) + np.sum(W2*W2))
    # Backward pass: compute gradients
    grads = {}
    # d(loss)/d(scores) = probs - one_hot(y)
    probs[range(N), y] = probs[range(N), y] - 1
    gradW2 = h1.T.dot(probs)
    gradW2 /= N
    gradW2 += reg * 2 * W2
    gradB2 = np.sum(probs, axis=0)
    gradB2 /= N
    # Backpropagate into the hidden layer; ReLU passes gradient only where h1 > 0
    gradH1 = probs.dot(W2.T)
    gradW1 = X.T.dot((h1>0)*gradH1)
    gradW1 /= N
    gradW1 += reg * 2 * W1
    gradB1 = np.sum((h1>0)*gradH1, axis=0)
    gradB1 /= N
    grads["W1"] = gradW1
    grads["W2"] = gradW2
    grads["b1"] = gradB1
    grads["b2"] = gradB2
    return loss, grads
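The backward pass can be checked against finite differences in the same way as the linear classifiers. The following is a stand-alone sketch, not the assignment's class: two_layer_loss is a hypothetical function that mirrors the affine, ReLU, affine, softmax structure, with toy sizes, random parameters, and a max-shift added for numerical stability.

```python
import numpy as np

def two_layer_loss(params, X, y, reg):
    """Softmax loss and gradients for affine -> ReLU -> affine."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    n = X.shape[0]
    h1 = np.maximum(0, X @ W1 + b1)
    scores = h1 @ W2 + b2
    scores = scores - scores.max(axis=1, keepdims=True)   # stability shift
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(n), y]))
    loss += reg * (np.sum(W1 * W1) + np.sum(W2 * W2))

    dscores = probs.copy()
    dscores[np.arange(n), y] -= 1
    dscores /= n
    dh1 = (dscores @ W2.T) * (h1 > 0)                     # ReLU passes gradient only where active
    grads = {
        "W2": h1.T @ dscores + 2 * reg * W2,
        "b2": dscores.sum(axis=0),
        "W1": X.T @ dh1 + 2 * reg * W1,
        "b1": dh1.sum(axis=0),
    }
    return loss, grads

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))                           # toy sizes: 5 samples, 4 inputs
y = rng.integers(0, 3, size=5)                            # 3 classes
params = {
    "W1": 0.5 * rng.standard_normal((4, 6)), "b1": 0.1 * rng.standard_normal(6),
    "W2": 0.5 * rng.standard_normal((6, 3)), "b2": 0.1 * rng.standard_normal(3),
}

_, grads = two_layer_loss(params, X, y, reg=0.05)

# Central-difference check of every parameter entry
h = 1e-5
max_err = 0.0
for name, P in params.items():
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + h
        lp = two_layer_loss(params, X, y, 0.05)[0]
        P[idx] = old - h
        lm = two_layer_loss(params, X, y, 0.05)[0]
        P[idx] = old
        max_err = max(max_err, abs((lp - lm) / (2 * h) - grads[name][idx]))
```

A large discrepancy in only W1 or b1 usually points at the ReLU mask, since those are the only gradients that pass through it.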
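2. Training, prediction, and hyperparameter selection
These are similar to the other classifiers.
V. Feature Engineering
With a two-layer neural network, the following parameters give an accuracy of 60% on the test set: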
learning_rate = 1.5
regularization_strength = 0.001
num_iters = 5000
batch_size = 500