Section I: Code Bundle and Result Analyses
Code
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['figure.dpi']=200
plt.rcParams['savefig.dpi']=200
font = {'family': 'Times New Roman',
        'weight': 'light'}
plt.rc("font", **font)
#Section 1: Load data and split it into Train/Test dataset
#Note: load_boston() was removed in scikit-learn 1.2, so this line needs an earlier version
price=datasets.load_boston()
X=price.data
y=price.target
X_train,X_test,y_train,y_test=train_test_split(X,y,
                                               test_size=0.3)
#Section 2: Feed train dataset into LinearRegression model
slr=LinearRegression()
slr.fit(X_train,y_train)
y_train_pred=slr.predict(X_train)
y_test_pred=slr.predict(X_test)
#Section 3: Visualize vertical distances between the actual and predicted values
plt.scatter(y_train_pred,y_train_pred-y_train,
            c='blue',marker='o',edgecolors='white',
            label='Training Data')
plt.scatter(y_test_pred,y_test_pred-y_test,
            c='limegreen',marker='s',edgecolors='white',
            label='Test Data')
plt.xlabel("Predicted Values")
plt.ylabel("The Residuals")
plt.legend(loc='upper left')
plt.hlines(y=0,xmin=-10,xmax=50,color='black',lw=2)
plt.xlim([-10,50])
plt.savefig('./fig1.png')
plt.show()
#Section 4: Evaluate model performance via MSE and R2 scores
from sklearn.metrics import mean_squared_error,r2_score
print("MSE Train: %.3f, Test: %.3f" %
      (mean_squared_error(y_train,y_train_pred),
       mean_squared_error(y_test,y_test_pred)))
print("R^2 Train: %.3f, Test: %.3f" %
      (r2_score(y_train,y_train_pred),
       r2_score(y_test,y_test_pred)))
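For reference, the two scores printed in Section 4 follow the standard definitions sketched below. The symbols are introduced here for illustration and do not appear in the code: y_i is an actual target value, \hat{y}_i the corresponding prediction, \bar{y} the mean of the actual values, and n the number of samples.

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\]
```

A lower MSE and an R^2 closer to 1 both indicate a closer fit. A large gap between the training and test scores would suggest overfitting.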
Results
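The residual plot is drawn with a baseline, that is, a horizontal line at y = 0 where the residual is zero.
In addition, note how the errors fluctuate above and below this baseline: if their spread is small and shows no regular pattern, the model fits well.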
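The reading of the plot described above can also be checked numerically. The following is a minimal sketch in plain Python, assuming a handful of toy values invented for illustration (they are not taken from the housing data); the helper names residuals, mse and r2 are hypothetical and are not part of scikit-learn.

```python
# Toy values invented for illustration; they are not from the housing data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]


def residuals(y_true, y_pred):
    """Predicted minus actual, the same sign convention as the residual plot."""
    return [p - t for t, p in zip(y_true, y_pred)]


def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    res = residuals(y_true, y_pred)
    return sum(r * r for r in res) / len(res)


def r2(y_true, y_pred):
    """One minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum(r * r for r in residuals(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


res = residuals(y_true, y_pred)
mean_residual = sum(res) / len(res)

# Residuals that stay close to zero on both sides of the baseline
# give a small mean residual, a small MSE and an R^2 near 1.
print("Residuals:", res)
print("Mean residual: %.3f" % mean_residual)
print("MSE: %.3f, R^2: %.3f" % (mse(y_true, y_pred), r2(y_true, y_pred)))
```

With the real data, the same checks could be applied to y_train and y_train_pred (or the test pair) from the script above. A mean residual far from zero, or residuals that grow with the predicted value, would point to a pattern the linear model has not captured.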
References
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning, Second Edition (Python机器学习第二版). Nanjing: Southeast University Press, 2018.