Gradient Descent: Class Notes

What is gradient descent?
It is a search method, widely used in machine learning, for finding the minimum of an objective function.

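For some functions, the solution we find may be only a local optimum. What should we do? A common remedy is to run the search several times from different starting points and keep the best result.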
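As a rough sketch of how restarts help with local optima: the function f below is a hypothetical example (not from the notes) with two local minima. Running plain gradient descent from several random starting points and keeping the lowest result should land on the global one. The step size, iteration count, and number of restarts are all assumptions chosen for this toy function.

```python
import numpy as np

def f(x):
    # Hypothetical non-convex function with two local minima
    return x**4 - 3 * x**2 + x

def df(x):
    # Derivative of f
    return 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, n_iters=2000):
    # Plain gradient descent with a fixed number of steps
    for _ in range(n_iters):
        x = x - eta * df(x)
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=20)   # random starting points
candidates = [descend(x0) for x0 in starts]
best_x = min(candidates, key=f)            # keep the lowest minimum found
print(f"best x = {best_x:.4f}, f(best x) = {f(best_x):.4f}")
```

A single run started to the right of the origin would settle in the shallower minimum near x = 1.13; comparing several runs is what makes the difference.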

Below we simulate the gradient descent process in code:

import numpy as np
import matplotlib.pyplot as plt
plot_x = np.linspace(-1,6,141)
plot_y = (plot_x-2.5)**2 -1
plt.plot(plot_x, plot_y)
plt.show()

This plots a parabola whose minimum lies at x = 2.5.
Gradient descent:

def dJ(theta):
    # Derivative of J with respect to theta
    return 2*(theta-2.5)
def J(theta):
    return (theta-2.5)**2-1
eta = 0.1  # learning rate
theta = 0.0  # starting point
theta_history = [theta]
epsilon = 1e-8  # stop once J changes by less than this
while True:
    gradient = dJ(theta)
    last_theta = theta
    theta = theta - eta*gradient
    theta_history.append(theta)
    if (abs(J(theta)-J(last_theta))<epsilon):
        break
plt.plot(plot_x,J(plot_x))
plt.plot(np.array(theta_history),J(np.array(theta_history)),color='r',marker='+')
plt.show()

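The red markers in the plot trace theta as it steps down the curve toward the minimum at 2.5.
If we increase the learning rate, each step gets larger. For this function the update still converges while eta stays below 1, but once eta exceeds 1 every step overshoots further than the last and theta diverges.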
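To see the effect of the learning rate concretely, here is a minimal sketch that reruns the same loop for several values of eta. The helper name run_gradient_descent and the cap of 100 iterations are assumptions added here so that a diverging run still terminates.

```python
def J(theta):
    return (theta - 2.5) ** 2 - 1

def dJ(theta):
    return 2 * (theta - 2.5)

def run_gradient_descent(eta, initial_theta=0.0, n_iters=100, epsilon=1e-8):
    # Same update rule as above, but with an iteration cap
    theta = initial_theta
    history = [theta]
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta)
        history.append(theta)
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return history

for eta in (0.01, 0.1, 0.8, 1.1):
    history = run_gradient_descent(eta)
    print(f"eta={eta}: {len(history) - 1} steps, final theta={history[-1]:.4g}")
```

With a very small eta the run uses up all its iterations before reaching the minimum, moderate values converge quickly, and eta = 1.1 moves away from 2.5 on every step.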

Gradient descent in linear regression:
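For reference, the loss and gradient that the code in these notes implements can be written as follows. The notation is an assumption made here: m is the number of samples, X_b is the data matrix with a leading column of ones, and X_b^{(i)} is its i-th row.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - X_b^{(i)} \theta \right)^2

\frac{\partial J}{\partial \theta_j}
  = \frac{2}{m} \sum_{i=1}^{m} \left( X_b^{(i)} \theta - y^{(i)} \right) X_{b,j}^{(i)},
  \qquad j = 0, 1, \dots, n, \quad X_{b,0}^{(i)} = 1
```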

The code below demonstrates fitting a linear model with gradient descent:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(66)
x = 2*np.random.random(size=100)
y = x*3.+4.+np.random.normal(size=100)
plt.scatter(x,y)
plt.show()

The scatter plot shows the points spread around the line y = 3x + 4.

Loss function:

def J(theta, X_b, y):
    try:
        return np.sum((y-X_b.dot(theta))**2)/len(X_b)
    except Exception:
        # Treat a failed computation (e.g. overflow) as an infinite loss
        return float('inf')

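Gradient:

def dJ(theta,X_b,y):
    res = np.empty(len(theta))
    # Component for the intercept (the first column of X_b is all ones)
    res[0] = np.sum(X_b.dot(theta)-y)
    # Components for the remaining features
    for i in range(1,len(theta)):
        res[i] = (X_b.dot(theta)-y).dot(X_b[:,i])
    return res*2/len(X_b)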

Gradient descent:

def gradient_descent(X_b,y,initial_theta,eta,n_iters=1e4,epsilon=1e-8):
    theta=initial_theta
    i_iter=0
    # Stop after n_iters steps or once the loss changes by less than epsilon
    while i_iter<n_iters:
        gradient=dJ(theta,X_b,y)
        last_theta = theta
        theta=theta-eta*gradient
        if(abs(J(theta,X_b,y)-J(last_theta,X_b,y))<epsilon):
            break
        i_iter+=1
    return theta
X = x.reshape(-1,1)
# Prepend a column of ones so that theta[0] acts as the intercept
X_b = np.hstack([np.ones((len(X),1)),X])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b,y,initial_theta,eta)

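Vectorized gradient descent:
Vector form of the gradient: grad J(theta) = (2/m) * X_b^T * (X_b * theta - y), where m is the number of samples.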

def dJ(theta,X_b,y):
    return X_b.T.dot(X_b.dot(theta)-y) *2. / len(X_b)
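As a sanity check on the vectorized gradient, the sketch below fits the same kind of synthetic data and compares the result with NumPy's least-squares solver. The comparison against np.linalg.lstsq is an addition here, not part of the original notes, and the random generator differs from the one used earlier, so the exact numbers will not match.

```python
import numpy as np

def J(theta, X_b, y):
    return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)

def dJ(theta, X_b, y):
    # Vectorized gradient: one matrix product instead of a Python loop
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)

def gradient_descent(X_b, y, initial_theta, eta, n_iters=10000, epsilon=1e-8):
    theta = initial_theta
    for _ in range(n_iters):
        last_theta = theta
        theta = theta - eta * dJ(theta, X_b, y)
        if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
            break
    return theta

rng = np.random.default_rng(66)
x = 2 * rng.random(size=100)
y = 3.0 * x + 4.0 + rng.normal(size=100)
X_b = np.column_stack([np.ones(len(x)), x])

theta_gd = gradient_descent(X_b, y, np.zeros(X_b.shape[1]), eta=0.01)
theta_ls = np.linalg.lstsq(X_b, y, rcond=None)[0]  # closed-form reference
print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

The two estimates should agree closely; the small remaining gap comes from stopping once the loss changes by less than epsilon.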

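SGD in scikit-learn

from sklearn.linear_model import SGDRegressor
# max_iter replaces the n_iter argument used by older scikit-learn versions
sgd_reg = SGDRegressor(max_iter=100)
%time sgd_reg.fit(x_train_standard,y_train)
sgd_reg.score(x_test_standard,y_test)

%time is an IPython/Jupyter magic command that displays the elapsed time; x_train_standard and x_test_standard are the training and test features after standardization.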
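The notes do not show how the standardized arrays are produced. A minimal sketch, assuming synthetic data similar to the earlier example and scikit-learn's train_test_split and StandardScaler, might look like this. The data, sample size, noise level, and random_state values are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): y = 3x + 4 plus noise
rng = np.random.default_rng(66)
X = 2 * rng.random(size=(1000, 1))
y = 3.0 * X[:, 0] + 4.0 + rng.normal(scale=0.5, size=1000)

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=66)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
x_train_standard = scaler.fit_transform(x_train)
x_test_standard = scaler.transform(x_test)

sgd_reg = SGDRegressor(max_iter=1000, random_state=66)
sgd_reg.fit(x_train_standard, y_train)
score = sgd_reg.score(x_test_standard, y_test)  # R^2 on the test set
print(f"R^2 on the test set: {score:.3f}")
```

Fitting the scaler on the training data alone keeps information from the test set out of the model.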

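When the derivative is not known in closed form, we can take two points close to the point where the gradient is needed. When they are close enough, the slope of the line between them approximates the gradient, so this method can be used to debug a gradient implementation.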
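The two-point idea can be sketched as a central difference applied to each component of theta. The names dJ_debug and dJ_math, the step size 1e-4, and the test data are assumptions made for this example.

```python
import numpy as np

def J(theta, X_b, y):
    return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)

def dJ_math(theta, X_b, y):
    # Analytic gradient, the one we want to verify
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)

def dJ_debug(theta, X_b, y, epsilon=1e-4):
    # Numerical gradient: slope between two nearby points, one axis at a time
    res = np.empty(len(theta))
    for i in range(len(theta)):
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        res[i] = (J(theta_plus, X_b, y) - J(theta_minus, X_b, y)) / (2 * epsilon)
    return res

rng = np.random.default_rng(66)
x = 2 * rng.random(size=100)
y = 3.0 * x + 4.0 + rng.normal(size=100)
X_b = np.column_stack([np.ones(len(x)), x])
theta = np.array([1.0, 1.0])

grad_numeric = dJ_debug(theta, X_b, y)
grad_analytic = dJ_math(theta, X_b, y)
print(grad_numeric)
print(grad_analytic)
```

The numerical version needs two evaluations of J per parameter, so it is far slower than the analytic gradient and is best kept for checking rather than training.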

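In practice, we often use stochastic gradient descent or mini-batch gradient descent, which compute each update from a single sample or a small batch of samples instead of the full dataset.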
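A minimal sketch of mini-batch gradient descent for the linear model, under some assumptions: a constant learning rate, a fixed number of epochs, and reshuffling at the start of each epoch. The function name and default values are illustrative rather than taken from the notes. Setting batch_size=1 gives stochastic gradient descent, and batch_size=len(X_b) recovers the full-batch method.

```python
import numpy as np

def minibatch_gradient_descent(X_b, y, initial_theta, eta=0.05,
                               n_epochs=50, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    theta = initial_theta.copy()
    m = len(X_b)
    for _ in range(n_epochs):
        indices = rng.permutation(m)  # visit the samples in a new order each epoch
        for start in range(0, m, batch_size):
            batch = indices[start:start + batch_size]
            X_batch, y_batch = X_b[batch], y[batch]
            # Gradient estimated from the current batch only
            gradient = X_batch.T.dot(X_batch.dot(theta) - y_batch) * 2.0 / len(batch)
            theta = theta - eta * gradient
    return theta

rng = np.random.default_rng(66)
x = 2 * rng.random(size=1000)
y = 3.0 * x + 4.0 + rng.normal(size=1000)
X_b = np.column_stack([np.ones(len(x)), x])

theta = minibatch_gradient_descent(X_b, y, np.zeros(2))
print(theta)
```

Because each step sees only part of the data, the result fluctuates around the least-squares solution instead of settling on it exactly; a decaying learning rate is one common way to reduce that noise.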
