Notes on time series autoregressive prediction: AR & AutoReg

Background

Recently, our company produced data forecasts for customers, but the customers did not know which kinds of forecasts would suit them best. After discussing it with the business side, we narrowed the focus to two kinds of data:

1. Engineering data: evaluating engineering data and raising early warnings would be useful. However, the data in this area is incomplete and its accuracy varies, so we dropped this option.

2. Financial data: this is a very promising direction. The financial data is accurate, and its patterns are fairly clear.

In the end, I chose to approach the problem from the financial data side.

Note: all data in this article is made up. The financial data mentioned above only illustrates how to analyze this kind of business problem.

Introduction

With that background, here is an outline of the prediction work:

I mainly work with Java and Spark (Scala) and use Python relatively little, so if you spot any mistakes, corrections are welcome. This article shows developers who are new to the field how to implement AR (autoregressive) prediction models in Python. The AR model is commonly used in time series forecasting: it predicts future values of a series from that series' own past observations.
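To make "predicts future values from past observations" concrete: an AR(p) model writes each value as a constant plus a weighted sum of the previous p values, y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t. The sketch below estimates c and the phi coefficients by ordinary least squares using only NumPy. It is a teaching illustration under that assumption, not the statsmodels implementation used later; fit_ar and predict_next are hypothetical helper names.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} by ordinary least squares.

    Returns (c, phi), where phi[k-1] is the coefficient on lag k.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2 * p + 1:
        raise ValueError("series too short to estimate this many lags")
    # Row j of X (for t = p + j) is [1, y_{t-1}, y_{t-2}, ..., y_{t-p}]
    X = np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[0], coef[1:]

def predict_next(y, c, phi):
    """One-step-ahead prediction from the last len(phi) observed values."""
    recent = np.asarray(y, dtype=float)[-len(phi):][::-1]  # y_{t-1}, ..., y_{t-p}
    return c + float(np.dot(phi, recent))

# Usage: a noise-free AR(2) series, so OLS should recover c = 0.5, phi = [0.6, -0.2]
series = [1.0, 2.0]
for _ in range(30):
    series.append(0.5 + 0.6 * series[-1] - 0.2 * series[-2])
c_hat, phi_hat = fit_ar(series, p=2)
next_value = predict_next(series, c_hat, phi_hat)
```

With real data there is noise, so the estimates only approximate the true coefficients, and the number of lags p has to be chosen.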

We will follow these steps:

Step  Description
1     Import the required libraries
2     Load the time series data
3     Split the data set into a training set and a test set
4     Train the AR model
5     Predict future values with the AR model
6     Evaluate the model's performance
7     Visualize the prediction results

Code:

Import required libraries

First, import the necessary libraries: pandas for data processing, statsmodels for building the AR models, and matplotlib for plotting.

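import pandas as pd
import matplotlib.pyplot as plt
# The older AR class (from statsmodels.tsa.ar_model import AR) was deprecated in
# statsmodels 0.11 and removed in 0.13; AutoReg is its replacement
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf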

Load time series data

def data_process():
    # Read the CSV data; change the path to where you saved the file
    df = pd.read_csv(r"C:\Users\123\Downloads\funsbymonth.csv")

    fans = df['fans'].values
    data = pd.Series(fans)

    # Use the parsed dates as the index of the series
    df['date'] = pd.to_datetime(df['date'])
    data_index = df['date'].values
    data.index = pd.Index(data_index)

    # Uncomment to take a quick look at the raw series
    #data.plot(figsize=(12, 8))
    #plt.show()

    return data, fans

# Data processing
data, fans = data_process()

I wrapped the loading step in a function; treat it as a reference only.

The data set is included below for anyone who wants to follow along:

date,fans
2021-6-30,12
2021-7-31,52
2021-8-31,58
2021-9-30,82
2021-10-31,65
2021-11-30,66
2021-12-31,16
2022-1-31,23
2022-2-28,54
2022-3-31,61
2022-4-30,78
2022-5-31,64
2022-6-30,56
2022-7-31,18
2022-8-31,16
2022-9-30,60
2022-10-31,75
2022-11-30,90
2022-12-31,63
2023-1-31,69
2023-2-28,15
2023-3-31,10
2023-4-30,60
2023-5-31,62
2023-6-30,78
2023-7-31,71
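If you would rather not save the CSV file first, the data set above can be loaded straight from a string. The sketch below is a variant of data_process, not the author's code: load_series and CSV_TEXT are hypothetical names, and it converts the month-end dates into a monthly PeriodIndex so the series carries an explicit frequency. statsmodels can use that frequency to label out-of-sample predictions with dates; with a plain date index that has no frequency set, it tries to infer one and issues a warning.

```python
import io
import pandas as pd

# The data set from this article, embedded as text so the snippet runs without a file
CSV_TEXT = """date,fans
2021-6-30,12
2021-7-31,52
2021-8-31,58
2021-9-30,82
2021-10-31,65
2021-11-30,66
2021-12-31,16
2022-1-31,23
2022-2-28,54
2022-3-31,61
2022-4-30,78
2022-5-31,64
2022-6-30,56
2022-7-31,18
2022-8-31,16
2022-9-30,60
2022-10-31,75
2022-11-30,90
2022-12-31,63
2023-1-31,69
2023-2-28,15
2023-3-31,10
2023-4-30,60
2023-5-31,62
2023-6-30,78
2023-7-31,71
"""

def load_series(text):
    df = pd.read_csv(io.StringIO(text))
    # Parse "2021-6-30"-style strings (month and day may lack a leading zero)
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    df = df.sort_values("date")
    # Month-end dates -> monthly periods, giving the index an explicit frequency
    index = pd.PeriodIndex(df["date"].dt.to_period("M"))
    return pd.Series(df["fans"].to_numpy(), index=index, name="fans")

data = load_series(CSV_TEXT)
```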

Split the data set

Before building the AR model, split the data into a training set and a test set: most of the data is used to train the model, and a small part is held back to test its predictions. Because this is a time series, the split must keep time order: the first 80% of the points (here 20 of the 26 months) are used for training and the last 20% (6 months) for testing.

train_data = data.iloc[:int(0.8*len(data))]
test_data = data.iloc[int(0.8*len(data)):]
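A random, shuffled split (common for other kinds of data) would let the model train on months that come after the months it is tested on. A minimal sketch of an order-preserving split, with a check that every training period precedes every test period; time_split and the demo series are hypothetical and only mirror the 26-month length of the data set:

```python
import pandas as pd

def time_split(series, train_frac=0.8):
    """Split a time-ordered series into (train, test) without shuffling."""
    cut = int(train_frac * len(series))
    return series.iloc[:cut], series.iloc[cut:]

# Hypothetical demo series: 26 monthly points, 2021-06 through 2023-07
demo = pd.Series(range(26), index=pd.period_range("2021-06", periods=26, freq="M"))
train, test = time_split(demo)
```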

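Train the AR model and predict

Next, we fit an AR model with statsmodels and use it to predict. Note that model_fit3 fits on whatever series it receives; the call below passes the full series data rather than train_data, so the plot shows in-sample predictions from position 10 onward plus 4 forecasts past the last month.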

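from statsmodels.tsa.ar_model import AutoReg

def model_fit3(data, start, end, starTime, lags=9):
    # Fit an AR model with `lags` lags on the series passed in
    # (lags=9 is an assumption, matching the order used in the AutoReg example below)
    ar = AutoReg(data, lags=lags).fit()
    # Predict from position `start` to `end`; with dynamic=False, in-sample
    # predictions use the actual observed lagged values
    arpredict_y3 = ar.predict(start=start, end=end, dynamic=False)
    fig, ax = plt.subplots(figsize=(12, 8))
    # .ix was removed in pandas 1.0; .loc selects the observed data from starTime onward
    ax = data.loc[starTime:].plot(ax=ax)
    arpredict_y3.plot(ax=ax)
    plt.show()
    return arpredict_y3

start = 10
end = len(fans) + 3
starTime = '2022-1-31'
arpredict_y = model_fit3(data, start, end, starTime)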
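Because end = len(fans) + 3 runs past the last observation, the last 4 predicted values have no observed data for their most recent lags. For those steps an AR model feeds its own earlier predictions back in as inputs (iterated one-step forecasting); dynamic=False only affects the in-sample part, where the actual observed lags are used. A minimal NumPy sketch of that recursion, assuming the constant c and coefficients phi are already known; forecast_recursive and the coefficient values are hypothetical, and this is an illustration rather than statsmodels' internal code:

```python
import numpy as np

def forecast_recursive(history, c, phi, steps):
    """Iterated multi-step AR forecast: each prediction is appended to the
    history and becomes an input for the next step."""
    buf = [float(v) for v in history]
    out = []
    for _ in range(steps):
        recent = np.array(buf[-len(phi):][::-1])  # y_{t-1}, ..., y_{t-p}
        nxt = c + float(np.dot(phi, recent))
        out.append(nxt)
        buf.append(nxt)
    return out

# Hypothetical AR(1): y_t = 10 + 0.5 * y_{t-1}, last observed value 40
fc = forecast_recursive([40.0], c=10.0, phi=np.array([0.5]), steps=3)
# fc == [30.0, 25.0, 22.5]
```

Each step moves the forecast toward the model's long-run mean (here 10 / (1 - 0.5) = 20), which is why AR forecasts tend to flatten out as the horizon grows.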

Visualize results


Next, another model: the autoregressive model AutoReg.

Here is the code; it uses the same data set as above.

import pandas as pd
from statsmodels.tsa.ar_model import AutoReg
import matplotlib.pyplot as plt

def data_process():
    # Read the CSV data; change the path to where you saved the file
    df = pd.read_csv(r"C:\Users\allen_sun\Downloads\funsbymonth.csv")

    fans = df['fans'].values
    data = pd.Series(fans)

    # Use the parsed dates as the index of the series
    df['date'] = pd.to_datetime(df['date'])
    data_index = df['date'].values
    data.index = pd.Index(data_index)

    # Uncomment to take a quick look at the raw series
    #data.plot(figsize=(12, 8))
    #plt.show()

    return data, fans

# Data processing
data, fans = data_process()

train_data = data.iloc[:int(0.8*len(data))]
test_data = data.iloc[int(0.8*len(data)):]

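# Model training
order = 9  # order of the AR model, i.e. the number of lags
model = AutoReg(train_data, lags=order)
model_fit = model.fit()

# Model prediction: forecast the test period, i.e. the positions right after the training data
predictions = model_fit.predict(start=len(train_data), end=len(data)-1)

# Model evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error
# Mean squared error (MSE): smaller is better
mse = mean_squared_error(test_data, predictions)
# Mean absolute error (MAE): smaller is better
mae = mean_absolute_error(test_data, predictions)
print("MSE:", mse)
print("MAE:", mae)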

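#print(predictions)


# First position of the plotted predictions
start = 10
# Last position: the data has len(fans) points (positions 0 to len(fans)-1),
# so end = len(fans)+3 predicts 4 periods past the last observation
end = len(fans)+3

# Reuse model_fit from above (order 9, fitted on train_data); refitting the same
# model is unnecessary. Positions from len(train_data) onward are out-of-sample
# forecasts for this model.
arpredict_y3 = model_fit.predict(start=start, end=end, dynamic=False)
fig, ax = plt.subplots(figsize=(12, 8))
# Start date of the plotted observed series
starTime = '2022-1-31'
# .ix was removed in pandas 1.0; use label-based .loc instead
ax = data.loc[starTime:].plot(ax=ax)
arpredict_y3.plot(ax=ax)
plt.show()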
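The script fixes order = 9 without explaining the choice. With only 20 training points, 9 lags leave 11 usable observations to estimate 10 parameters (a constant plus 9 coefficients), so smaller orders may be worth checking. One standard approach is to compare candidate orders with an information criterion such as BIC, fitting every candidate on the same (common) sample; statsmodels provides ar_select_order in statsmodels.tsa.ar_model for this. The NumPy sketch below only illustrates the idea: lag_matrix and select_order_bic are hypothetical helper names, and the simulated AR(2) series is made-up data.

```python
import numpy as np

def lag_matrix(y, p, maxlag):
    """Rows [1, y_{t-1}, ..., y_{t-p}] for t = maxlag..n-1.

    Every candidate order uses the same rows (a common sample), so the
    BIC values are comparable across orders.
    """
    n = len(y)
    cols = [np.ones(n - maxlag)] + [y[maxlag - k:n - k] for k in range(1, p + 1)]
    return np.column_stack(cols)

def select_order_bic(y, maxlag):
    """Return (best_p, {p: BIC}) for AR orders 1..maxlag, using OLS fits."""
    y = np.asarray(y, dtype=float)
    target = y[maxlag:]
    nobs = len(target)
    scores = {}
    for p in range(1, maxlag + 1):
        X = lag_matrix(y, p, maxlag)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        sigma2 = float(resid @ resid) / nobs
        # Gaussian BIC up to an additive constant: fit term + penalty for p + 1 parameters
        scores[p] = nobs * np.log(sigma2) + (p + 1) * np.log(nobs)
    best_p = min(scores, key=scores.get)
    return best_p, scores

# Usage on simulated data from an AR(2) process with Gaussian noise
rng = np.random.default_rng(0)
sim = np.zeros(200)
for t in range(2, 200):
    sim[t] = 0.6 * sim[t - 1] - 0.3 * sim[t - 2] + rng.normal()
best_p, bic = select_order_bic(sim, maxlag=6)
```

On a series as short as this article's 26 months, any such criterion is noisy, so treat the selected order as a starting point rather than a final answer.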

Evaluation metrics:

1. Mean squared error (MSE): smaller is better.

2. Root mean squared error (RMSE): smaller is better.

3. Mean absolute error (MAE): smaller is better.

4. Mean absolute percentage error (MAPE): smaller is better.
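The script above computes only MSE and MAE with scikit-learn. All four metrics are averages over the prediction errors (actual minus predicted): MSE averages the squared errors, RMSE is its square root (so it is in the same units as the data), MAE averages the absolute errors, and MAPE averages the absolute errors relative to the actual values. A small NumPy sketch; regression_metrics is a hypothetical helper name, and it reports MAPE as a percentage, whereas scikit-learn's mean_absolute_percentage_error (available since version 0.24) returns a fraction:

```python
import numpy as np

def regression_metrics(actual, predicted):
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE in percent; undefined if any actual value is 0
        "MAPE": float(np.mean(np.abs(err / a)) * 100),
    }

# Errors are [-2, 2, 0]: MSE = 8/3, MAE = 4/3, MAPE = (20% + 10% + 0%) / 3 = 10%
m = regression_metrics([10, 20, 40], [12, 18, 40])
```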

Result: not bad.


Source: blog.csdn.net/Alex_81D/article/details/132710764