【Machine Learning】Linear Regression

Linear Regression

  • We can measure how well a line fits a set of points by computing its loss.
  • The goal of linear regression is to minimize this loss.
  • To find the line of best fit, we look for the b value (intercept) and the m value (slope) that minimize the loss.
  • Convergence is when the parameters stop changing (or change only negligibly) from one iteration to the next.
  • The learning rate is how much the parameters change on each iteration.
  • We can use Scikit-learn's LinearRegression() model to perform linear regression on a set of points.

The Scikit-Learn Library

line_fitter = LinearRegression()                  # create the model
line_fitter.fit(temperature, sales)               # fit it to the training data
sales_predict = line_fitter.predict(temperature)  # predict y values from the fitted line

import codecademylib3_seaborn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import numpy as np

temperature = np.array(range(60, 100, 2))
temperature = temperature.reshape(-1, 1)  # sklearn expects a 2D array of features
sales = [65, 58, 46, 45, 44, 42, 40, 40, 36, 38, 38, 28, 30, 22, 27, 25, 25, 20, 15, 5]

line_fitter = LinearRegression()
line_fitter.fit(temperature, sales)
sales_predict = line_fitter.predict(temperature)

plt.plot(temperature, sales, 'o')     # scatter of the actual data
plt.plot(temperature, sales_predict)  # the fitted regression line

plt.show()
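After fitting, the slope and intercept that the model learned are available as coef_ and intercept_. A minimal sketch of inspecting them and predicting at a new temperature (the value 85 is just an illustrative input), reusing line_fitter from above:

print(line_fitter.coef_)       # learned slope (one coefficient per feature)
print(line_fitter.intercept_)  # learned y-intercept

# predict() expects a 2D input, hence the nested list
print(line_fitter.predict([[85]]))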

 

How It Works

We start by guessing a line; each data point then carries some loss relative to that line.

import codecademylib3_seaborn
import matplotlib.pyplot as plt
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# slope:
m = 12
# intercept:
b = 35

plt.plot(months, revenue, "o")

y = [m*month + b for month in months]

plt.plot(months,y)

plt.show()

 

LOSS

When computing the loss, we use the squared distance from each point to the line. For example, a point A that sits 3 units from the line contributes a loss of 9 (3²), and a point B that sits 1 unit away contributes a loss of 1 (1²).

The total loss is then 10. If we find a line whose total loss is less than 10, that line is a better fit for the data.

total_loss = 0
for i in range(len(y)):
    total_loss += (y_predicted[i] - y[i]) ** 2
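A self-contained sketch of this loss computation, reusing the m = 12, b = 35 guess from the plotting example above:

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

m, b = 12, 35
y_predicted = [m * month + b for month in months]

# Sum the squared distances between predictions and actual values.
total_loss = 0
for i in range(len(revenue)):
    total_loss += (y_predicted[i] - revenue[i]) ** 2

print(total_loss)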


Reducing Loss

Gradient Descent

At the current parameter values, the direction in which the loss curve slopes downward tells us how to reduce the loss, so we step the parameters gradually downhill, against the gradient.

Formula

The gradient of the loss with respect to the intercept b is:

    b_gradient = -(2/N) * Σ (y_i - (m * x_i + b))

where:

  • N is the number of points we have in our dataset
  • m is the current guess for the slope
  • b is the current guess for the intercept

A function that computes the gradient of the loss at the current intercept b:

def get_gradient_at_b(x, y, m, b):
    diff = 0
    for i in range(len(x)):
        diff += (y[i] - (m * x[i] + b))
    b_gradient = -(2 / len(x)) * diff
    return b_gradient
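A quick usage sketch, assuming the months and revenue lists from the example above and a starting guess of m = 0, b = 0:

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# The sign of the gradient tells us which way to move b to reduce the loss.
print(get_gradient_at_b(months, revenue, 0, 0))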


Formula

The gradient of the loss with respect to the slope m is:

    m_gradient = -(2/N) * Σ x_i * (y_i - (m * x_i + b))

where:

  • N is the number of points we have in our dataset
  • m is the current guess for the slope
  • b is the current guess for the intercept

 

A function that computes the gradient of the loss at the current slope m:

def get_gradient_at_m(x, y, m, b):
    diff = 0
    N = len(x)
    for i in range(N):
        diff += x[i] * (y[i] - (m * x[i] + b))
    m_gradient = -(2 / N) * diff
    return m_gradient

The consolidated gradient functions, plus a function that takes one gradient step (note that the argument order here changes to (x, y, b, m)):

def get_gradient_at_b(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += (y_val - ((m * x_val) + b))
  b_gradient = -(2/N) * diff  
  return b_gradient

def get_gradient_at_m(x, y, b, m):
  N = len(x)
  diff = 0
  for i in range(N):
    x_val = x[i]
    y_val = y[i]
    diff += x_val * (y_val - ((m * x_val) + b))
  m_gradient = -(2/N) * diff
  return m_gradient

# step_gradient: compute both gradients and move b and m one step downhill
def step_gradient(x, y, b_current, m_current):
    b_gradient = get_gradient_at_b(x, y, b_current, m_current)
    m_gradient = get_gradient_at_m(x, y, b_current, m_current)
    # 0.01 is the learning rate: how far we step along each gradient
    b = b_current - (0.01 * b_gradient)
    m = m_current - (0.01 * m_gradient)
    return [b, m]

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [52, 74, 79, 95, 115, 110, 129, 126, 147, 146, 156, 184]

# current intercept guess:
b = 0
# current slope guess:
m = 0

b, m = step_gradient(months, revenue, b, m)
print(b, m)
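A single step only nudges b and m. To actually fit the line, the step is repeated many times until the parameters converge. A minimal sketch of the full loop; the name gradient_descent, the learning rate of 0.01, and the iteration count of 1000 are illustrative choices, not from the original exercise:

def gradient_descent(x, y, learning_rate, num_iterations):
    b, m = 0, 0  # arbitrary starting guess
    for _ in range(num_iterations):
        b_gradient = get_gradient_at_b(x, y, b, m)
        m_gradient = get_gradient_at_m(x, y, b, m)
        # Step against each gradient, scaled by the learning rate.
        b = b - learning_rate * b_gradient
        m = m - learning_rate * m_gradient
    return b, m

b, m = gradient_descent(months, revenue, 0.01, 1000)
print(b, m)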


Reposted from blog.csdn.net/yt627306293/article/details/84950951