Multiple Linear Regression in Python

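Purpose of the Analysis

Analyze the relationship between the concentrations of the main air pollutants and the Air Quality Index (AQI).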

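Data

The dataset contains air pollutant concentrations for Beijing from October 28, 2013 to January 31, 2016, scraped from the Tianqihoubao (天气后报) website. It includes the air quality grade, the AQI index, and the day's AQI ranking, alongside the pollutant concentrations (PM2.5, PM10, SO2, NO2, CO, O3) used as predictors below.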
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib inline
import statsmodels.api as sm

Linear Regression

1. Data preprocessing

data = pd.read_csv("beijing.csv",index_col = 0)
data.head()
  AQI指数 当天AQI排名 PM25 PM10 So2 No2 Co O3
1 306 106 255 277 30 105 2.60 15
2 62 22 39 62 10 46 0.91 27
3 99 61 71 101 11 72 1.18 14
4 176 98 135 162 10 96 1.62 2
5 231 102 181 202 14 100 1.89 0
X = data.iloc[:,2:8]
X = sm.add_constant(X)
y = data.iloc[:,0]
print(X.head())
   const  PM25  PM10  So2  No2    Co  O3
1    1.0   255   277   30  105  2.60  15
2    1.0    39    62   10   46  0.91  27
3    1.0    71   101   11   72  1.18  14
4    1.0   135   162   10   96  1.62   2
5    1.0   181   202   14  100  1.89   0
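
The `sm.add_constant(X)` call above prepends a column of ones so that the fitted model includes an intercept term. A minimal, library-free sketch of that effect follows; the names `rows` and `design` are illustrative, and the values are the five rows printed by `data.head()`.

```python
# First five predictor rows, copied from the data.head() output above.
# Column order: PM25, PM10, So2, No2, Co, O3.
rows = [
    [255, 277, 30, 105, 2.60, 15],
    [39, 62, 10, 46, 0.91, 27],
    [71, 101, 11, 72, 1.18, 14],
    [135, 162, 10, 96, 1.62, 2],
    [181, 202, 14, 100, 1.89, 0],
]

# Prepend 1.0 to every row: this is the "const" column of the design matrix.
design = [[1.0] + [float(value) for value in row] for row in rows]

for row in design:
    print(row)
```

Without this column, `sm.OLS` would fit a regression forced through the origin, because it does not add an intercept on its own.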

2. Building the model

model1 = sm.OLS(y, X)  # build the model
result = model1.fit()  # fit the model
print(result.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  AQI指数   R-squared:                       0.963
Model:                            OLS   Adj. R-squared:                  0.963
Method:                 Least Squares   F-statistic:                     3549.
Date:                Thu, 02 Apr 2020   Prob (F-statistic):               0.00
Time:                        20:43:20   Log-Likelihood:                -3378.3
No. Observations:                 822   AIC:                             6771.
Df Residuals:                     815   BIC:                             6804.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         26.4656      2.099     12.610      0.000      22.346      30.585
PM25           0.9506      0.019     50.834      0.000       0.914       0.987
PM10           0.2412      0.015     15.691      0.000       0.211       0.271
So2           -0.0212      0.038     -0.555      0.579      -0.096       0.054
No2           -0.2624      0.047     -5.601      0.000      -0.354      -0.170
Co            -1.5038      1.109     -1.356      0.175      -3.680       0.672
O3             0.0468      0.018      2.621      0.009       0.012       0.082
==============================================================================
Omnibus:                      351.197   Durbin-Watson:                   1.782
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5876.885
Skew:                           1.489   Prob(JB):                         0.00
Kurtosis:                      15.756   Cond. No.                         733.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
result.f_pvalue  # p-value of the F-test for overall significance of the regression
0.0
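
As a rough consistency check on the summary above, R-squared and adjusted R-squared can be recovered from the reported F-statistic and the degrees of freedom using the standard OLS identities. The helper names below are illustrative, and the inputs are the rounded values printed in the summary, so the results agree only to the printed precision.

```python
def r2_from_f(f_stat, df_model, df_resid):
    """Recover R-squared from the overall F-statistic of an OLS fit with an intercept."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r2(r2, n_obs, df_model):
    """Adjusted R-squared penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - df_model - 1)


# Values read from the first summary table.
n_obs, df_model, df_resid = 822, 6, 815
f_stat = 3549.0

r2 = r2_from_f(f_stat, df_model, df_resid)
adj_r2 = adjusted_r2(r2, n_obs, df_model)
print(round(r2, 3), round(adj_r2, 3))
```

Both values round to 0.963, in line with the R-squared and Adj. R-squared entries of the summary.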
result.params  # regression coefficients
const    26.465624
PM25      0.950583
PM10      0.241180
So2      -0.021246
No2      -0.262374
Co       -1.503839
O3        0.046783
dtype: float64
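
To see what these coefficients mean in practice, the fitted equation can be applied by hand to the first row of the data. This is a minimal sketch that copies the coefficients from `result.params` and the first row from `data.head()`; with statsmodels itself, `result.predict(X)` would give the fitted values for all rows.

```python
# Coefficients copied from result.params above.
params = {
    "const": 26.465624,
    "PM25": 0.950583,
    "PM10": 0.241180,
    "So2": -0.021246,
    "No2": -0.262374,
    "Co": -1.503839,
    "O3": 0.046783,
}

# First observation from data.head(); const is the intercept column of ones.
first_row = {"const": 1.0, "PM25": 255, "PM10": 277, "So2": 30,
             "No2": 105, "Co": 2.60, "O3": 15}
observed_aqi = 306

# Fitted value = sum of coefficient * predictor over all terms.
predicted_aqi = sum(params[name] * first_row[name] for name in params)
residual = observed_aqi - predicted_aqi
print(round(predicted_aqi, 2), round(residual, 2))
```

The fitted AQI for this day is about 304.3 against an observed 306, a residual of roughly 1.7.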

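Improving the Model

The p-values of So2 and Co (0.579 and 0.175) are greater than 0.05, so these two variables are dropped and the model is rebuilt.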
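
The selection rule can be written down explicitly. The sketch below copies the p-values from the first summary table into a plain dictionary; the names `p_values`, `alpha`, `kept`, and `dropped` are illustrative. With a fitted statsmodels result, the same numbers are available as `result.pvalues`.

```python
# P>|t| values of the predictors, copied from the first summary table
# (the intercept is left out because it is always kept).
p_values = {
    "PM25": 0.000,
    "PM10": 0.000,
    "So2": 0.579,
    "No2": 0.000,
    "Co": 0.175,
    "O3": 0.009,
}
alpha = 0.05  # significance threshold used in the text

kept = [name for name, p in p_values.items() if p < alpha]
dropped = [name for name, p in p_values.items() if p >= alpha]
print("kept:", kept)
print("dropped:", dropped)
```

The kept variables are PM25, PM10, No2, and O3, which correspond to column positions 2, 3, 5, and 7 of `data`.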

data = pd.read_csv("beijing.csv",index_col = 0)
data.head()
  AQI指数 当天AQI排名 PM25 PM10 So2 No2 Co O3
1 306 106 255 277 30 105 2.60 15
2 62 22 39 62 10 46 0.91 27
3 99 61 71 101 11 72 1.18 14
4 176 98 135 162 10 96 1.62 2
5 231 102 181 202 14 100 1.89 0
X = data.iloc[:,[2,3,5,7]]
X = sm.add_constant(X)
y = data.iloc[:,0]
print(X.head())
   const  PM25  PM10  No2  O3
1    1.0   255   277  105  15
2    1.0    39    62   46  27
3    1.0    71   101   72  14
4    1.0   135   162   96   2
5    1.0   181   202  100   0
model2 = sm.OLS(y, X)  # build the model
result = model2.fit()  # fit the model
print(result.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  AQI指数   R-squared:                       0.963
Model:                            OLS   Adj. R-squared:                  0.963
Method:                 Least Squares   F-statistic:                     5318.
Date:                Thu, 02 Apr 2020   Prob (F-statistic):               0.00
Time:                        21:35:18   Log-Likelihood:                -3379.7
No. Observations:                 822   AIC:                             6769.
Df Residuals:                     817   BIC:                             6793.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         25.9959      2.064     12.598      0.000      21.945      30.046
PM25           0.9378      0.016     58.347      0.000       0.906       0.969
PM10           0.2417      0.015     15.864      0.000       0.212       0.272
No2           -0.2891      0.044     -6.613      0.000      -0.375      -0.203
O3             0.0560      0.017      3.297      0.001       0.023       0.089
==============================================================================
Omnibus:                      337.402   Durbin-Watson:                   1.783
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5783.530
Skew:                           1.401   Prob(JB):                         0.00
Kurtosis:                      15.689   Cond. No.                         711.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
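
One way to compare the two fits is through the information criteria printed in each summary. As a sketch, AIC and BIC can be recomputed from the reported log-likelihoods with the usual definitions, where the parameter count includes the intercept. The function and dictionary names are illustrative, and the log-likelihoods are the rounded values from the summaries.

```python
import math


def aic(llf, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * llf


def bic(llf, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2 * llf


n_obs = 822
# Log-likelihoods from the two summaries; n_params = Df Model + 1 (intercept).
models = {
    "model1": {"llf": -3378.3, "n_params": 7},
    "model2": {"llf": -3379.7, "n_params": 5},
}

scores = {
    name: (round(aic(m["llf"], m["n_params"])),
           round(bic(m["llf"], m["n_params"], n_obs)))
    for name, m in models.items()
}
for name, (aic_value, bic_value) in scores.items():
    print(name, "AIC:", aic_value, "BIC:", bic_value)
```

Dropping So2 and Co leaves the adjusted R-squared at 0.963 while AIC falls from 6771 to 6769 and BIC from 6804 to 6793, so the smaller model fits about as well with fewer parameters.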


Reposted from www.cnblogs.com/jiaxinwei/p/12623207.html