Mathematical Derivation of the Gradient-Descent Algorithm for a Linear SVM with Hinge Loss

Copyright notice: this is an original article by the author and may not be reproduced without permission. https://blog.csdn.net/sunlanchang/article/details/88952015

A traditional SVM is optimized by solving a convex quadratic program so that the loss function converges. Professor Hung-yi Lee's machine learning course gives a very simple and clear derivation of a gradient-descent optimization for the SVM, so I record it here, and then work through Siraj Raval's example with gradient descent to understand it more deeply.
[Figure: mathematical derivation of the gradient-descent update for the hinge-loss linear SVM]
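Since the derivation survives here only as a figure, the following is a minimal sketch of it, reconstructed from the update rule used in the code below (λ is the regularization strength, which the code sets to 1/epoch, and η is the learning rate eta):

L(w) = \sum_i \max\bigl(0,\ 1 - y_i\, w \cdot x_i\bigr) + \lambda \lVert w \rVert^2

For a single sample (x_i, y_i), the per-sample (sub)gradient is

\frac{\partial L_i}{\partial w} =
\begin{cases}
-\,y_i x_i + 2\lambda w, & \text{if } y_i\, w \cdot x_i < 1 \\
2\lambda w, & \text{otherwise}
\end{cases}

and stochastic gradient descent updates

w \leftarrow w - \eta\, \frac{\partial L_i}{\partial w}.

The condition check and this update formula are the "last two steps" referred to in the comments of the training loop below.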

Example

Generating training data for the SVM

#To help us perform math operations
import numpy as np
#to plot our data and model visually
from matplotlib import pyplot as plt
%matplotlib inline

#Step 1 - Define our data

#Input data - Of the form [X value, Y value, Bias term]
X = np.array([
    [-2,4,-1],
    [4,1,-1],
    [1, 6, -1],
    [2, 4, -1],
    [6, 2, -1],
])

#Associated output labels - First 2 examples are labeled '-1' and last 3 are labeled '+1'
y = np.array([-1,-1,1,1,1])

#let's plot these examples on a 2D graph!
#for each example
for d, sample in enumerate(X):
    # Plot the negative samples (the first 2)
    if d < 2:
        plt.scatter(sample[0], sample[1], s=120, marker='_', linewidths=2)
    # Plot the positive samples (the last 3)
    else:
        plt.scatter(sample[0], sample[1], s=120, marker='+', linewidths=2)

# Plot a possible hyperplane that separates the two classes.
#we'll take two points and draw the line between them (naive guess)
plt.plot([-2,6],[6,0.5])

The generated data is shown in the figure below; the hyperplane in the middle is the one the SVM needs to find from the training data. Note that each sample carries a constant -1 as its third component, which acts as the bias term, so the weight vector learned below has three values.
[Figure: scatter plot of the training samples with the naive guess of a separating line]

Defining the SVM model's hyperparameters

Note that the loss function here includes an L2 regularization term on the parameters, so when updating the parameters we also subtract the gradient of ||w||^2.
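To make the role of the regularizer explicit before the full training loop, here is a minimal sketch of a single stochastic (sub)gradient step written as a standalone helper; sgd_step and lam are illustrative names that are not part of the original code, and the loop below effectively uses lam = 1/epoch:

import numpy as np

def sgd_step(w, x, y, eta, lam):
    # one (sub)gradient step on the L2-regularized hinge loss for a single sample (x, y)
    if y * np.dot(x, w) < 1:
        # the sample violates the margin: hinge term and regularizer both contribute
        grad = -y * x + 2 * lam * w
    else:
        # the sample is classified with margin >= 1: only the regularizer contributes
        grad = 2 * lam * w
    return w - eta * grad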

#let's perform stochastic gradient descent to learn the separating hyperplane between the two classes

def svm_sgd_plot(X, Y):
    #Initialize our SVM's weight vector with zeros (3 values)
    w = np.zeros(len(X[0]))
    #The learning rate
    eta = 1
    #how many iterations to train for
    epochs = 100000
    #record whether any sample was misclassified in each epoch so we can plot how that changes over time
    errors = []

    #training part, gradient descent part
    for epoch in range(1,epochs):
        error = 0
        for i, x in enumerate(X):
            # Core of the algorithm: update the parameters by gradient descent.
            # The condition check and the update formulas below are the last two
            # steps of the derivation shown above.
            # when the sample is misclassified (or violates the margin)
            if (Y[i]*np.dot(X[i], w)) < 1:
                # gradient step on the hinge term plus the L2 regularizer on w
                # (regularization prevents overfitting; the 1/epoch factor makes it
                # weaker and weaker as the epochs increase)
                w = w - eta * ( -(X[i] * Y[i]) + (2 *(1/epoch)* w) )
                error = 1
            else:
                # correctly classified with enough margin: only the regularizer contributes
                w = w - eta * (2 *(1/epoch)* w)
        errors.append(error)
        

    #let's plot the classification errors during training for our SVM
    plt.plot(errors, '|')
    plt.ylim(0.5,1.5)
    plt.gca().set_yticklabels([])
    plt.xlabel('Epoch')
    plt.ylabel('Misclassified')
    plt.show()
    
    return w

Training the model

w = svm_sgd_plot(X,y)
#they decrease over time! Our SVM is learning the optimal hyperplane

Making predictions

for d, sample in enumerate(X):
    # Plot the negative samples
    if d < 2:
        plt.scatter(sample[0], sample[1], s=120, marker='_', linewidths=2)
    # Plot the positive samples
    else:
        plt.scatter(sample[0], sample[1], s=120, marker='+', linewidths=2)

# Add our test samples
plt.scatter(2,2, s=120, marker='_', linewidths=2, color='yellow')
plt.scatter(4,3, s=120, marker='+', linewidths=2, color='blue')

# Plot the hyperplane calculated by svm_sgd_plot()
# Each row is (x, y, u, v): an arrow starting at (w[0], w[1]) pointing along
# +/-(-w[1], w[0]), i.e. perpendicular to the weight vector, which traces the
# direction of the separating hyperplane.
x2 = [w[0], w[1], -w[1], w[0]]
x3 = [w[0], w[1], w[1], -w[0]]

x2x3 = np.array([x2, x3])
# unpack into new names so we don't overwrite the training data X
Xq, Yq, Uq, Vq = zip(*x2x3)
ax = plt.gca()
ax.quiver(Xq, Yq, Uq, Vq, scale=1, color='blue')
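
The plot above only marks where the two test points lie; to actually classify them with the learned weights, take the sign of w·x after appending the same -1 bias term used in the training data. A minimal sketch, reusing the numpy import from above and the w returned by svm_sgd_plot (predict is an illustrative helper, not part of the original code):

def predict(point, w):
    # append the bias term (-1), as in the training data, then take the sign of the score
    return np.sign(np.dot(np.append(point, -1), w))

print(predict([2, 2], w))  # the point plotted with '_'; expected to come out as -1
print(predict([4, 3], w))  # the point plotted with '+'; expected to come out as +1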

The hyperplane found by the SVM after training for 100,000 epochs:
[Figure: training samples, the two test samples, and the hyperplane direction found by the SVM]

Reference: https://youtu.be/QSEPStBgwRQ
Siraj:https://youtu.be/g8D5YL6cOSE
Code:https://github.com/llSourcell/Classifying_Data_Using_a_Support_Vector_Machine
