梯度下降法原理
梯度下降法(Gradient descent)是一个一阶最优化算法。 根据微积分理论,在一个实值函数中某点的梯度方向上升速度最快。要使用梯度下降法找到一个函数的局部极小值,必须向函数上当前点对应梯度(或者是近似梯度)的反方向的规定步长距离点进行迭代搜索。如果相反地向梯度正方向迭代进行搜索,则会接近函数的局部极大值点;这个过程则被称为梯度上升法。
梯度下降方法基于以下的观察:如果实值函数 F ( x ⃗ ) F(\vec{x} ) F(x)在点 a ⃗ \vec{a} a处可微且有定义,其中 x ⃗ = ( x 1 , x 2 , . . . , x n ) \vec{x}=(x_{1},x_{2},...,x_{n} ) x=(x1,x2,...,xn)为自变量构成的向量,那么函数 F ( x ) F(x) F(x)在点 a ⃗ \vec{a} a沿着梯度相反的方向下降最快。
因而,如果
b ⃗ = a ⃗ − α ▽ F ( a ⃗ ) \vec{b}=\vec{a} -\alpha \bigtriangledown F(\vec{a} ) b=a−α▽F(a)
对于为一个够小数值 α > 0 \alpha>0 α>0时成立,那么 F ( a ⃗ ) > F ( b ⃗ ) F(\vec{a} )>F(\vec{b} ) F(a)>F(b)
这个式子表示的是在 a ⃗ \vec{a} a处沿梯度方向的相反方向行走足够一小段距离 α ▽ F ( a ⃗ ) \alpha \bigtriangledown F(\vec{a} ) α▽F(a),那么该函数值必然减小。
考虑到这一点,我们可以从函数F的局部极小值的初始估计出发,并考虑如下序列使得
x n + 1 ⃗ = x n ⃗ − α n ▽ F ( x n ⃗ ) \vec{x_{n+1} }=\vec{x_{n}} -\alpha_{n} \bigtriangledown F(\vec{x_{n}} ) xn+1=xn−αn▽F(xn)
因此可得到
F ( x 0 ⃗ ) ≥ F ( x 1 ⃗ ) ≥ F ( x 2 ⃗ ) ≥ . . . F(\vec{x_{0}} )\ge F(\vec{x_{1}} )\ge F(\vec{x_{2}} )\ge... F(x0)≥F(x1)≥F(x2)≥...
如果顺利的话序列 x n ⃗ \vec{x_{n}} xn收敛到期望的极小值(注意每次迭代步长可以改变)。
对于梯度上升法只需要改变 α \alpha α符号,迭代公式变成:
x n + 1 ⃗ = x n ⃗ + α n ▽ F ( x n ⃗ ) \vec{x_{n+1} }=\vec{x_{n}} +\alpha_{n} \bigtriangledown F(\vec{x_{n}} ) xn+1=xn+αn▽F(xn)
便可收敛到极大值点。
下面我们用梯度下降法求出线性回归方程。
对于给定点集 ( x i , y i ) (x_{i},y_{i} ) (xi,yi)我们希望找到一条拟合直线(或其他曲线)来描述之。例如采用回归直线 y ^ = w 1 x + w 0 \hat{y}=w_{1} x+w_{0} y^=w1x+w0来描述。
该方法的由来是保证样本点到回归直线的度量和最小。度量可以是坐标差,欧氏距离等等,这些度量的极值点是一样的。但是对于非线性的处理可能无法直接得到度量极值点的解析解,因此需要梯度下降的数值解法来求解
梯度下降法迭代过程:定义损失函数(用方差来代替)
S ( w 1 , w 0 ) = 1 n ∑ i = 1 n ( y i − ( w 1 x i + w 0 ) ) 2 S (w_{1},w_{0}) =\frac{1}{n} \sum_{i=1}^{n} (y_{i} -(w_{1}x_{i} +w_{0}))^{2} S(w1,w0)=n1i=1∑n(yi−(w1xi+w0))2
我们要求 S ( w 1 , w 0 ) S (w_{1},w_{0}) S(w1,w0)的最小值。
∂ S ∂ w 1 = 2 n ∑ i = 1 n ( y i − ( w 1 x i + w 0 ) ) x i \frac{\partial S}{\partial w_{1} } =\frac{2}{n} \sum_{i=1}^{n} (y_{i} -(w_{1}x_{i} +w_{0}))x_{i} ∂w1∂S=n2∑i=1n(yi−(w1xi+w0))xi
∂ S ∂ w 0 = 2 n ∑ i = 1 n ( y i − ( w 1 x i + w 0 ) ) \frac{\partial S}{\partial w_{0} } =\frac{2}{n} \sum_{i=1}^{n} (y_{i} -(w_{1}x_{i} +w_{0})) ∂w0∂S=n2∑i=1n(yi−(w1xi+w0))
令 ∂ S ∂ w 1 = 0 \frac{\partial S}{\partial w_{1} } =0 ∂w1∂S=0和 ∂ S ∂ w 0 = 0 \frac{\partial S}{\partial w_{0} } =0 ∂w0∂S=0得到最小二乘法公式
w 1 = ∑ i = 1 n x i y i − x ˉ y ˉ ∑ i = 1 n x i 2 − n x ˉ 2 , w 0 = y ˉ − w 1 x ˉ w_{1} =\frac{\sum_{i=1}^{n}x_{i}y_{i}-\bar{x}\bar{y} }{\sum_{i=1}^{n}x_{i}^{2}-n\bar{x} ^{2} },w_{0} =\bar{y} -w_{1} \bar{x} w1=∑i=1nxi2−nxˉ2∑i=1nxiyi−xˉyˉ,w0=yˉ−w1xˉ
故有 ▽ S = 2 ( ∑ i = 1 n ( y i − ( w 1 x i + w 0 ) ) x i , ∑ i = 1 n ( y i − ( w 1 x i + w 0 ) ) ) \bigtriangledown S=2(\sum_{i=1}^{n} (y_{i} -(w_{1}x_{i} +w_{0}))x_{i},\sum_{i=1}^{n} (y_{i} -(w_{1}x_{i} +w_{0}))) ▽S=2(∑i=1n(yi−(w1xi+w0))xi,∑i=1n(yi−(w1xi+w0)))
设迭代次数为 k k k,
( w 1 ( k + 1 ) , w 0 ( k + 1 ) ) = ( w 1 ( k ) , w 0 ( k ) ) − α ▽ S (w_{1}^{(k+1)},w_{0}^{(k+1)})=(w_{1}^{(k)},w_{0}^{(k)})-\alpha\bigtriangledown S (w1(k+1),w0(k+1))=(w1(k),w0(k))−α▽S
对应系数可以得到如下结果
w 1 ( k + 1 ) = w 1 ( k ) − α ∑ i = 1 n ( y i − ( w 1 ( k ) x i + w 0 ( k ) ) ) x i w_{1}^{(k+1)} =w_{1}^{(k)}-\alpha \sum_{i=1}^{n} (y_{i} -(w_{1}^{(k)}x_{i} +w_{0}^{(k)}))x_{i} w1(k+1)=w1(k)−α∑i=1n(yi−(w1(k)xi+w0(k)))xi
w 0 ( k + 1 ) = w 0 ( k ) − α ∑ i = 1 n ( y i − ( w 1 ( k ) x i + w 0 ( k ) ) ) w_{0}^{(k+1)} =w_{0}^{(k)}-\alpha \sum_{i=1}^{n} (y_{i} -(w_{1}^{(k)}x_{i} +w_{0}^{(k)})) w0(k+1)=w0(k)−α∑i=1n(yi−(w1(k)xi+w0(k)))
S ( k ) = 1 n ∑ i = 1 n ( y i − ( w 1 ( k ) x i + w 0 ( k ) ) ) 2 S ^{(k)} =\frac{1}{n} \sum_{i=1}^{n} (y_{i} -(w_{1}^{(k)}x_{i} +w_{0}^{(k)}))^{2} S(k)=n1∑i=1n(yi−(w1(k)xi+w0(k)))2
式中 k k k为迭代次数, S S S为度量和,我们用方差来表示, α \alpha α为学习参数。初始的 w 1 w_{1} w1和 w 0 w_{0} w0可以任意选取。在不断迭代的过程中,方差 S S S会逐步收敛到极小值点(在本例中极小值点有且仅有一个)。理论上对学习参数 α \alpha α有取值范围限制,我们需要合理调整 α \alpha α值,防止计算机运行中数据溢出。不同的 α \alpha α会使得收敛速度有一定的区别,有时会使方差 S S S在最小值周围游离不定,因此 α \alpha α的选取十分重要。当 S S S到达极小值时, w 1 w_{1} w1和 w 0 w_{0} w0也收敛,这就是我们需要寻找的直线。判断收敛的方法与柯西极限理论一致,即对任给的 ε > 0 \varepsilon>0 ε>0总存在 k k k使得 ∣ S ( k + 1 ) − S ( k ) ∣ < ε |S ^{(k+1)} -S ^{(k)}|<\varepsilon ∣S(k+1)−S(k)∣<ε成立。
在理论上可以得到
lim k → ∞ S ( k ) = 1 n ∑ i = 1 n ( ( y i − y ˉ ) − ∑ i = 1 n x i y i − x ˉ y ˉ ∑ i = 1 n x i 2 − n x ˉ 2 ( x i − x ˉ ) ) 2 \lim_{k \to \infty} S ^{(k)}=\frac{1}{n} \sum_{i=1}^{n} ((y_{i}-\bar{y}) -\frac{\sum_{i=1}^{n}x_{i}y_{i}-\bar{x}\bar{y} }{\sum_{i=1}^{n}x_{i}^{2}-n\bar{x} ^{2} }(x_{i}-\bar{x}) )^{2} k→∞limS(k)=n1i=1∑n((yi−yˉ)−∑i=1nxi2−nxˉ2∑i=1nxiyi−xˉyˉ(xi−xˉ))2
数据
本例采用的线性回归样本点数据如下(从百度网盘提取):
百度网盘数据分享,提取码:o5s6
代码
这部分代码是用Pycharm写的
注释写得很详细喇!仔细看就能明白的
# 天津大学电气自动化与信息工程学院
# 开发时间:2022/3/26 13:05
import xlrd as xd
import numpy as np
import matplotlib.pyplot as plt
# matplotlib画图中中文显示会有问题,需要这两行设置默认字体可以显示中文
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# # 函数定义
# 求方差函数:∑(yi-hw(x))^2/n
def VarSolve (x, # 自变量列表
y, # 因变量列表
w1, # 上一次迭代的斜率
w0): # 上一次迭代的截距
Variance = 0 # 定义初始方差为0
for i in range(len(x)):
Variance = Variance + (y[i] - (w1 * x[i] + w0))**2 / len(x)
return Variance
# 求w1增量函数:∑(yi-hw(x))xi/n
def Del_w1 (x, # 自变量列表
y, # 因变量列表
w1, # 上一次迭代的斜率
w0): # 上一次迭代的截距
SumDel = 0
for i in range(len(x)):
SumDel = SumDel + (y[i] - (w1 * x[i] + w0)) * x[i] / len(x)
return SumDel
# 求w0增量函数:∑(yi-hw(x))/n
def Del_w0 (x, # 自变量列表
y, # 因变量列表
w1, # 上一次迭代的斜率
w0): # 上一次迭代的截距
SumDel = 0
for i in range(len(x)):
SumDel = SumDel + (y[i] - (w1 * x[i] + w0)) / len(x)
return SumDel
# 最小二乘法线性回归函数LSM(Least Square Method),
# 返回回归方程的斜率和截距
def LSM (x, y): # x, y分别为自变量列表和因变量列表
Avr_x, Avr_y, Avr_xy, Avr_x2 = 0, 0, 0, 0
# 容错,若输入自变量与因变量长度不相同,直接退出计算
if len(x) != len(y) | len(x) <= 0 | len(y) <= 0:
return
else:
for i in range(len(x)):
Avr_x = Avr_x + x[i] / len(x)
Avr_y = Avr_y + y[i] / len(y)
Avr_xy = Avr_xy + x[i] * y[i] / len(x)
Avr_x2 = Avr_x2 + x[i] ** 2 / len(x)
w1Thry = (Avr_xy - Avr_x * Avr_y) / (Avr_x2 - Avr_x ** 2)
w0Thry = Avr_y - w1Thry * Avr_x
return w1Thry, w0Thry
# 梯度下降法GDM(Gradient Descent Method),
# 返回回归方程的斜率和截距(及其列表)、收敛后的方差列表
def GDM (x, # 自变量列表
y, # 因变量列表
Alpha = 0.01, # 学习参数,默认0.01
err = 1e-8): # 误差,默认1e-8
# 容错,若输入自变量与因变量长度不相同,直接退出计算
if len(x) != len(y) | len(x) <= 0 | len(y) <= 0:
return
else:
w1, w0 = Del_w1(x, y, 0, 0), Del_w0(x, y, 0, 0)
# 方差、w1, w0 列表,每次迭代后的方差放置在列表Variance中
Variance = []
W1list = []
W0list = []
# 列表中先添加初始条件的第一个元素
Variance.append(VarSolve(x, y, w1, w0))
W1list.append(w1)
W0list.append(w0)
IterTime = 0 # 迭代次数
# 梯度下降法的实现原理
while 1:
# 每次迭代,w1增加α∑(yi-hw(x))xi/n, w0增加α∑(yi-hw(x))/n
w1, w0 = w1 + Alpha * Del_w1(x, y, w1, w0), w0 + Alpha * Del_w0(x, y, w1, w0)
Variance.append(VarSolve(x, y, w1, w0))
W1list.append(w1)
W0list.append(w0)
IterTime = IterTime + 1
# 用方差Variance是否收敛到误差范围内来判断是否结束循环
if abs(Variance[IterTime] - Variance[IterTime - 1]) < err:
break
return w1, w0, Variance, W1list, W0list
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# # 读取数据
# 打开excel表所在路径
data = xd.open_workbook('Sta2.xls')
# 读取数据,以excel表名来打开
sheet = data.sheet_by_name('Sheet1')
x_with_y = []
x = []
y = []
# 将表中数据按行逐步添加到列表中,最后转换为list结构
for r in range(sheet.nrows):
data1 = []
for c in range(sheet.ncols):
data1.append(sheet.cell_value(r, c))
x_with_y.append(list(data1))
x.append(data1[0])
y.append(data1[1])
# print(x)
# print(y)
# 将列表中的数据存储到两个list x和y 中
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# # 数据处理
# 计算收敛方差和回归直线
Alpha = 0.01 # 学习参数
err = 1e-8 # 误差限
# w1,w0使用梯度下降法求出的直线的斜率和截距
w1, w0, Variance, W1list, W0list = GDM(x, y, Alpha, err)
# 最后收敛后的方差
VarInter = Variance[len(Variance) - 1]
# 绘制收敛方差直线的列表
VarIList = []
for i in range(len(Variance)):
VarIList.append(VarInter)
# 绘制收敛方差w1的列表
W1IList = []
for i in range(len(W1list)):
W1IList.append(w1)
# 绘制收敛方差w0的列表
W0IList = []
for i in range(len(W0list)):
W0IList.append(w0)
# 将列表x, y, VarIList 等转化为矩阵向量,以便matplotlib绘图
X = np.c_[x]
Y = np.c_[y]
V = np.c_[VarIList]
W1 = np.c_[W1list]
W0 = np.c_[W0list]
Y1 = w1 * X + w0
# w1, w0 的理论值(最小二乘法计算)
# w1Thry, w0Thry = 1.19303, -3.89578
w1Thry, w0Thry = LSM(x, y)
Y1Thry = w1Thry * X + w0Thry
# -----------------------------------------------------------------------------------
# -----------------------------------------------------------------------------------
# -----------------------------------------------------------------------------------
# # 绘图
plt.figure('梯度下降法拟合直线')
# 绘制回归方程
strl1 = 'y=' + str(round(w1, 3)) + 'x' + str(round(w0, 3))
strl2 = 'y=' + str(round(w1Thry, 3)) + 'x' + str(round(w0Thry, 3))
plt.subplot(1, 2, 1)
plt.title('数据散点与拟合直线')
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(X, Y, c='r', marker='+')
plt.plot(X, Y1, 'k', label='梯度下降法拟合直线'+strl1)
plt.plot(X, Y1Thry, c='g', label='最小二乘法拟合的理论直线'+strl2)
plt.legend()
# 绘制误差函数收敛过程函数
str0 = '方差收敛到' + str(round(VarInter, 5)) + ',误差不超过'+str(err)
plt.subplot(2, 2, 2)
plt.title('方差收敛过程,学习参数α='+str(Alpha))
plt.xlabel('迭代次数')
plt.ylabel('方差')
plt.loglog(Variance, color='k')
plt.plot(V, '--', label=str0, c='k')
plt.legend()
# 绘制w1, w0收敛过程函数
strw1 = '斜率收敛到'+str(round(w1, 3))+',与理论值的误差为'+str(round(100 * abs((w1-w1Thry) / w1Thry), 3))+'%'
strw0 = '斜率收敛到'+str(round(w0, 3))+',与理论值的误差为'+str(round(100 * abs((w0-w0Thry) / w0Thry), 3))+'%'
plt.subplot(2, 2, 4)
plt.title('w1和w0收敛过程')
plt.semilogx(W1, color='r', label='斜率w1变化曲线')
plt.semilogx(W0, color='b', label='截距w0变化曲线')
plt.plot(W1IList, '--', label=strw1, c='r')
plt.plot(W0IList, '--', label=strw0, c='b')
plt.ylim((-10, 20))
plt.xlabel('迭代次数')
plt.ylabel('w1 或 w0')
plt.legend()
plt.show()
结果如下图:
二次回归模拟
如果采用二次函数 y ^ = w 2 x 2 + w 1 x + w 0 \hat{y}=w_{2} x^{2}+w_{1}x+w_{0} y^=w2x2+w1x+w0对该样本数据进行回归模拟,定义损失函数
S ( w 2 , w 1 , w 0 ) = 1 n ∑ i = 1 n ( y i − ( w 2 x i 2 + w 1 x i + w 0 ) ) 2 S (w_{2},w_{1},w_{0}) =\frac{1}{n} \sum_{i=1}^{n} (y_{i} -(w_{2} x^{2}_{i}+w_{1}x_{i}+w_{0}))^{2} S(w2,w1,w0)=n1i=1∑n(yi−(w2xi2+w1xi+w0))2
∂ S ∂ w 2 = 2 n ∑ i = 1 n ( y i − ( w 2 x i 2 + w 1 x i + w 0 ) x i 2 \frac{\partial S}{\partial w_{2} } =\frac{2}{n} \sum_{i=1}^{n} (y_{i} -(w_{2} x^{2}_{i}+w_{1}x_{i}+w_{0})x_{i}^{2} ∂w2∂S=n2∑i=1n(yi−(w2xi2+w1xi+w0)xi2
∂ S ∂ w 1 = 2 n ∑ i = 1 n ( y i − ( w 2 x i 2 + w 1 x i + w 0 ) ) x i \frac{\partial S}{\partial w_{1} } =\frac{2}{n} \sum_{i=1}^{n} (y_{i} -(w_{2} x^{2}_{i}+w_{1}x_{i}+w_{0}))x_{i} ∂w1∂S=n2∑i=1n(yi−(w2xi2+w1xi+w0))xi
∂ S ∂ w 0 = 2 n ∑ i = 1 n ( y i − ( w 2 x i 2 + w 1 x i + w 0 ) ) \frac{\partial S}{\partial w_{0} } =\frac{2}{n} \sum_{i=1}^{n} (y_{i} -(w_{2}x^{2}_{i}+w_{1}x_{i}+w_{0})) ∂w0∂S=n2∑i=1n(yi−(w2xi2+w1xi+w0))
令 ∂ S ∂ w 2 = 0 \frac{\partial S}{\partial w_{2} }=0 ∂w2∂S=0, ∂ S ∂ w 1 = 0 \frac{\partial S}{\partial w_{1} }=0 ∂w1∂S=0, ∂ S ∂ w 0 = 0 \frac{\partial S}{\partial w_{0} }=0 ∂w0∂S=0解得
w 2 = ∣ ∑ i = 1 n x i 2 y i ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i y i ∑ i = 1 n x i 2 ∑ i = 1 n x i ∑ i = 1 n y i ∑ i = 1 n x i 1 ∣ ∣ ∑ i = 1 n x i 4 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i ∑ i = 1 n x i 2 ∑ i = 1 n x i 1 ∣ w_{2} =\frac{\begin{vmatrix} \sum_{i=1}^{n}x_{i}^{2}y_{i} & \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2}\\ \sum_{i=1}^{n}x_{i}y_{i} & \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}y_{i} & \sum_{i=1}^{n}x_{i} & 1 \end{vmatrix}}{\begin{vmatrix} \sum_{i=1}^{n}x_{i}^{4} & \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2}\\ \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i} & 1 \end{vmatrix}} w2=∣∣∣∣∣∣∑i=1nxi4∑i=1nxi3∑i=1nxi2∑i=1nxi3∑i=1nxi2∑i=1nxi∑i=1nxi2∑i=1nxi1∣∣∣∣∣∣∣∣∣∣∣∣∑i=1nxi2yi∑i=1nxiyi∑i=1nyi∑i=1nxi3∑i=1nxi2∑i=1nxi∑i=1nxi2∑i=1nxi1∣∣∣∣∣∣
w 1 = ∣ ∑ i = 1 n x i 4 ∑ i = 1 n x i 2 y i ∑ i = 1 n x i 2 ∑ i = 1 n x i 3 ∑ i = 1 n x i y i ∑ i = 1 n x i ∑ i = 1 n x i 2 ∑ i = 1 n y i 1 ∣ ∣ ∑ i = 1 n x i 4 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i ∑ i = 1 n x i 2 ∑ i = 1 n x i 1 ∣ w_{1} =\frac{ \begin{vmatrix} \sum_{i=1}^{n}x_{i}^{4} & \sum_{i=1}^{n}x_{i}^{2}y_{i} & \sum_{i=1}^{n}x_{i}^{2}\\ \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}y_{i} & \sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}y_{i} & 1 \end{vmatrix} }{\begin{vmatrix} \sum_{i=1}^{n}x_{i}^{4} & \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2}\\ \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i} & 1 \end{vmatrix}} w1=∣∣∣∣∣∣∑i=1nxi4∑i=1nxi3∑i=1nxi2∑i=1nxi3∑i=1nxi2∑i=1nxi∑i=1nxi2∑i=1nxi1∣∣∣∣∣∣∣∣∣∣∣∣∑i=1nxi4∑i=1nxi3∑i=1nxi2∑i=1nxi2yi∑i=1nxiyi∑i=1nyi∑i=1nxi2∑i=1nxi1∣∣∣∣∣∣
w 0 = ∣ ∑ i = 1 n x i 4 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 y i ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i y i ∑ i = 1 n x i 2 ∑ i = 1 n x i ∑ i = 1 n y i ∣ ∣ ∑ i = 1 n x i 4 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i 3 ∑ i = 1 n x i 2 ∑ i = 1 n x i ∑ i = 1 n x i 2 ∑ i = 1 n x i 1 ∣ w_{0} =\frac{ \begin{vmatrix} \sum_{i=1}^{n}x_{i}^{4} & \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2}y_{i}\\ \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i}y_{i}\\ \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i} & \sum_{i=1}^{n}y_{i} \end{vmatrix} }{\begin{vmatrix} \sum_{i=1}^{n}x_{i}^{4} & \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2}\\ \sum_{i=1}^{n}x_{i}^{3} & \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}x_{i}^{2} & \sum_{i=1}^{n}x_{i} & 1 \end{vmatrix}} w0=∣∣∣∣∣∣∑i=1nxi4∑i=1nxi3∑i=1nxi2∑i=1nxi3∑i=1nxi2∑i=1nxi∑i=1nxi2∑i=1nxi1∣∣∣∣∣∣∣∣∣∣∣∣∑i=1nxi4∑i=1nxi3∑i=1nxi2∑i=1nxi3∑i=1nxi2∑i=1nxi∑i=1nxi2yi∑i=1nxiyi∑i=1nyi∣∣∣∣∣∣
迭代公式变成
w 2 ( k + 1 ) = w 2 ( k ) + α ∑ i = 1 n ( y i − ( w 2 ( k ) x i 2 + w 1 ( k ) x i + w 0 ( k ) ) ) x i 2 w_{2}^{(k+1)} =w_{2}^{(k)}+\alpha \sum_{i=1}^{n} (y_{i} -(w_{2} ^{(k)}x^{2}_{i}+w_{1}^{(k)}x_{i} +w_{0}^{(k)}))x_{i}^{2} w2(k+1)=w2(k)+α∑i=1n(yi−(w2(k)xi2+w1(k)xi+w0(k)))xi2
w 1 ( k + 1 ) = w 1 ( k ) + α ∑ i = 1 n ( y i − ( w 2 ( k ) x i 2 + w 1 ( k ) x i + w 0 ( k ) ) ) x i w_{1}^{(k+1)} =w_{1}^{(k)}+\alpha \sum_{i=1}^{n} (y_{i} -(w_{2} ^{(k)}x^{2}_{i}+w_{1}^{(k)}x_{i} +w_{0}^{(k)}))x_{i} w1(k+1)=w1(k)+α∑i=1n(yi−(w2(k)xi2+w1(k)xi+w0(k)))xi
w 0 ( k + 1 ) = w 0 ( k ) + α ∑ i = 1 n ( y i − ( w 2 ( k ) x i 2 + w 1 ( k ) x i + w 0 ( k ) ) ) w_{0}^{(k+1)} =w_{0}^{(k)}+\alpha \sum_{i=1}^{n} (y_{i} -(w_{2}^{(k)} x^{2}_{i}+w_{1}^{(k)}x_{i} +w_{0}^{(k)})) w0(k+1)=w0(k)+α∑i=1n(yi−(w2(k)xi2+w1(k)xi+w0(k)))
S ( k ) = 1 n ∑ i = 1 n ( y i − ( w 2 ( k ) x i 2 + w 1 ( k ) x i + w 0 ( k ) ) ) 2 S ^{(k)} =\frac{1}{n} \sum_{i=1}^{n} (y_{i} -(w_{2}^{(k)}x_{i}^{2}+w_{1}^{(k)}x_{i} +w_{0}^{(k)}))^{2} S(k)=n1∑i=1n(yi−(w2(k)xi2+w1(k)xi+w0(k)))2
用以上样本点进行二次函数的非线性回归得到
线性回归方程 y = 1.193 x − 3.896 y=1.193x-3.896 y=1.193x−3.896,相关系数 R 2 = 0.702 R^{2}=0.702 R2=0.702
二次回归方程 y = 3.791 × 1 0 − 3 x 2 + 1.101 x − 3.452 y=3.791\times10^{-3}x^{2}+1.101x-3.452 y=3.791×10−3x2+1.101x−3.452,相关系数 R 2 = 0.702 R^{2}=0.702 R2=0.702
其二次项系数非常小,故线性特征较强