Exploring Deep Learning / 01 Regression Problems / 2. Linear Regression in Practice
1. Implementing Linear Regression with Gradient Descent
Linear Regression aims to fit the best-possible linear function $y = wx + b$ to a set of sample points Points.
For any point $P(x_i, y_i)$, we want the residual $(wx_i + b - y_i)$ to be as small as possible;
over the whole point set, we want to minimize the total error. A plain sum of residuals, $\Sigma(wx_i + b - y_i)$, won't do, because positive and negative residuals cancel; to keep every term non-negative and to make the minimum easy to find (i.e., easy to differentiate), we square the residuals instead. Writing $y^* = wx + b$ for the prediction, the per-point loss is $(y^* - y)^2$, and averaging over all $N$ points gives the loss function
$$Loss = \frac{1}{N}\Sigma(wx_i + b - y_i)^2$$
which is just the mean squared error. We then minimize it with gradient descent.
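As a quick worked example (made-up numbers): for the two points $(1, 2)$ and $(2, 3)$ with $w = 1$, $b = 0$, the residuals are $1 + 0 - 2 = -1$ and $2 + 0 - 3 = -1$, so $Loss = \frac{(-1)^2 + (-1)^2}{2} = 1$.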
2. Implementation 01: Computing the Mean Squared Error $Error = \frac{1}{N}\Sigma(wx_i + b - y_i)^2$
Given the point set points, compute the average error:
import numpy as np

# points is an N x 2 NumPy array: points[i, 0] is x_i, points[i, 1] is y_i
def compute_error_for_line_given_points(b, w, points):
    totalError = 0  # accumulated squared error
    # iterate over the point set; len(points) is the number of points
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += ((w * x + b) - y) ** 2
    # return the average error = total error / number of points
    return totalError / float(len(points))
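As a quick sanity check (toy data, not from the original post), the function can be called on a small hand-made array; with $w = 1$, $b = 0$ both residuals from the worked example above are $-1$, so the mean squared error is $1.0$:

points = np.array([[1.0, 2.0], [2.0, 3.0]])
print(compute_error_for_line_given_points(0, 1, points))  # 1.0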
3. Implementation 02: One Gradient Descent Step
For a convex function such as $Loss = \frac{1}{N}\Sigma(wx_i + b - y_i)^2$, the point where the partial derivatives with respect to both variables ($w$ and $b$) vanish simultaneously is the global minimum of the whole function, as the bowl-shaped loss surface over $(w, b)$ illustrates.
The gradient descent update rules are:
$w^* = w - learningRate \cdot \frac{\partial Loss}{\partial w}$, where $\frac{\partial Loss}{\partial w} = \frac{1}{N}\Sigma\,2(wx_i + b - y_i)\,x_i$
$b^* = b - learningRate \cdot \frac{\partial Loss}{\partial b}$, where $\frac{\partial Loss}{\partial b} = \frac{1}{N}\Sigma\,2(wx_i + b - y_i)$
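Both partial derivatives are just the chain rule applied to the squared residual; spelled out for $w$ (the $b$ case is identical except the inner derivative is $1$):
$$\frac{\partial Loss}{\partial w} = \frac{1}{N}\Sigma\,\frac{\partial}{\partial w}(wx_i + b - y_i)^2 = \frac{1}{N}\Sigma\,2(wx_i + b - y_i)\cdot\frac{\partial(wx_i + b - y_i)}{\partial w} = \frac{1}{N}\Sigma\,2(wx_i + b - y_i)\,x_i$$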
# one gradient descent step for w and b
def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # accumulate the averaged partial derivatives: sum, then divide by N
        w_gradient += (2 * (w_current * x + b_current - y) * x) / N
        b_gradient += (2 * (w_current * x + b_current - y)) / N
    # apply the update rule to get the new w and b after this step
    new_w = w_current - (learningRate * w_gradient)
    new_b = b_current - (learningRate * b_gradient)
    return [new_b, new_w]
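For reference, the same step can be written without the Python loop. A minimal vectorized sketch, assuming points is an N×2 NumPy array (the helper name step_gradient_vectorized is mine, not from the original):

def step_gradient_vectorized(b_current, w_current, points, learningRate):
    x, y = points[:, 0], points[:, 1]
    residual = w_current * x + b_current - y  # (w*x + b - y) for every point at once
    w_gradient = (2 * residual * x).mean()    # averaged dLoss/dw
    b_gradient = (2 * residual).mean()        # averaged dLoss/db
    return [b_current - learningRate * b_gradient,
            w_current - learningRate * w_gradient]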
4. Implementation 03: Looping over Gradient Updates
We cap the number of iterations in advance (e.g., 100) and use the w and b obtained after the final iteration as a good-enough solution.
# loop over the gradient updates to obtain the final w and b
def gradient_descent_runner(points, starting_b, starting_w,
                            learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    for i in range(num_iterations):  # num_iterations is the number of gradient steps
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]
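To watch the loss fall during training, a small variant (my addition, not part of the original code) can report the error every few hundred steps:

def gradient_descent_runner_verbose(points, starting_b, starting_w,
                                    learning_rate, num_iterations):
    b, w = starting_b, starting_w
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
        if (i + 1) % 200 == 0:  # report progress every 200 steps
            print("iter {0}: error = {1}".format(
                i + 1, compute_error_for_line_given_points(b, w, points)))
    return [b, w]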
5. Implementation 04: Setting Parameters and Running
# run everything
def run():
    # load the point set from a CSV file with numpy's built-in genfromtxt
    points = np.genfromtxt("data.csv", delimiter=",")
    # set the initial parameters
    learning_rate = 0.0001  # keep the learning rate small so the updates don't overshoot
    initial_b = 0
    initial_w = 0
    num_iterations = 1000  # run 1000 iterations
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w,
                  compute_error_for_line_given_points(initial_b, initial_w, points)
                  )
          )
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}".
          format(num_iterations, b, w,
                 compute_error_for_line_given_points(b, w, points))
          )

if __name__ == '__main__':
    run()
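The original data.csv is not reproduced here. If you just want to exercise the code end to end, a made-up substitute (my sketch; the ground truth w = 1.5, b = 4 and the noise level are arbitrary) can be generated like this:

import numpy as np

# write 100 noisy points around y = 1.5x + 4 to data.csv
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=100)
y = 1.5 * x + 4 + rng.normal(0, 10, size=100)
np.savetxt("data.csv", np.column_stack([x, y]), delimiter=",")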