Machine Learning 11: Regression with K-Nearest Neighbors

Model Introduction

K近邻模型只是借助周围K个最近训练样本的目标是数值,对待测样本的回归值进行决策。自然,也衍生出衡量待测样本回归值的不同方式,即到底是对K个近邻目标数值使用普通的算术平均算法,还是同时考虑距离的差异进行加权平均。
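The two averaging strategies can be sketched with a toy example. The neighbor targets and distances below are made up for illustration; the inverse-distance weights mirror scikit-learn's `weights='distance'` scheme:

```python
import numpy as np

# Hypothetical example: 3 nearest neighbors with their target values and distances
neighbor_targets = np.array([20.0, 24.0, 30.0])
neighbor_dists = np.array([1.0, 2.0, 4.0])

# weights='uniform': plain arithmetic mean of the K neighbor targets
uniform_pred = neighbor_targets.mean()

# weights='distance': inverse-distance weighted average,
# so closer neighbors contribute more to the prediction
w = 1.0 / neighbor_dists
distance_pred = np.sum(w * neighbor_targets) / np.sum(w)

print(uniform_pred)   # mean of 20, 24, 30
print(distance_pred)  # pulled toward the closest neighbor's target (20.0)
```

Because the closest neighbor here has the lowest target, the distance-weighted prediction comes out below the uniform one.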

Code

Use two differently configured KNN regression models to predict and evaluate Boston house prices.

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this script requires scikit-learn < 1.2
boston = load_boston()
# print(boston.DESCR)
X = boston.data
y = boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)

# Inspect the spread of the regression target
print("The max target value is ", np.max(boston.target))
print("The min target value is ", np.min(boston.target))
print("The average value is ", np.mean(boston.target))

# Initialize separate standardizers for the features and the target values
ss_X = StandardScaler()
ss_y = StandardScaler()

# Standardize the data
X_train = ss_X.fit_transform(X_train)
X_test = ss_X.transform(X_test)

# StandardScaler expects 2-D input, so reshape the targets into column vectors
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
y_train = ss_y.fit_transform(y_train)
y_test = ss_y.transform(y_test)

# Use two differently configured KNN regression models to predict Boston house prices

# Initialize a KNN regressor configured to predict with a plain average: weights='uniform'
uni_knr = KNeighborsRegressor(weights='uniform')
uni_knr.fit(X_train, y_train)
uni_knr_y_predict = uni_knr.predict(X_test)

# Initialize a KNN regressor configured to predict with distance-weighted averaging: weights='distance'
dis_knr = KNeighborsRegressor(weights='distance')
dis_knr.fit(X_train, y_train)
dis_knr_y_predict = dis_knr.predict(X_test)

# Evaluate the predictive performance of both KNN regression configurations on the Boston housing data
print('R-squared value of uniform-weighted KNeighborsRegressor:', uni_knr.score(X_test, y_test))
print('The mean squared error of uniform-weighted KNeighborsRegressor:', mean_squared_error(ss_y.inverse_transform(y_test), ss_y.inverse_transform(uni_knr_y_predict)))
print('The mean absolute error of uniform-weighted KNeighborsRegressor:', mean_absolute_error(ss_y.inverse_transform(y_test), ss_y.inverse_transform(uni_knr_y_predict)))

print('R-squared value of distance-weighted KNeighborsRegressor:', dis_knr.score(X_test, y_test))
print('The mean squared error of distance-weighted KNeighborsRegressor:', mean_squared_error(ss_y.inverse_transform(y_test), ss_y.inverse_transform(dis_knr_y_predict)))
print('The mean absolute error of distance-weighted KNeighborsRegressor:', mean_absolute_error(ss_y.inverse_transform(y_test), ss_y.inverse_transform(dis_knr_y_predict)))


# Output:
# The max target value is  50.0
# The min target value is  5.0
# The average value is  22.532806324110677
# R-squared value of uniform-weighted KNeighborsRegressor: 0.6903454564606561
# The mean squared error of uniform-weighted KNeighborsRegressor: 24.01101417322835
# The mean absolute error of uniform-weighted KNeighborsRegressor: 2.9680314960629928
# R-squared value of distance-weighted KNeighborsRegressor: 0.7197589970156353
# The mean squared error of distance-weighted KNeighborsRegressor: 21.730250160926044
# The mean absolute error of distance-weighted KNeighborsRegressor: 2.8050568785108005

Characteristics

  Like KNN classification, KNN regression is a nonparametric model: it has no parameter-training phase, yet its prediction rule is computed in a very direct, intuitive way.
  The results above confirm that, on this dataset, the distance-weighted averaging strategy achieves better model performance than plain averaging.
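As a minimal illustration of this nonparametric nature, here is a brute-force KNN regression sketch in pure NumPy. The `knn_regress` helper and the toy data are made up for illustration; "training" amounts to nothing more than keeping the data around:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    # No fitted parameters: prediction works directly from the stored data.
    # Euclidean distance from the query point to every training sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Uniform-weighted prediction: plain mean of the neighbors' targets
    return y_train[nearest].mean()

# Toy 1-D data following y = 2x, so a query near x = 2.5 should predict near 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
pred = knn_regress(X, y, np.array([2.5]), k=2)
print(pred)  # mean of the targets at x=2 and x=3
```

Everything the model "knows" lives in the training arrays themselves, which is also why prediction cost grows with the size of the training set.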

Reposted from blog.csdn.net/qq_38195197/article/details/81232760