Introduction to the pmdarima library

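Pmdarima is a statistical library designed to fill the void in Python's time series analysis capabilities. It uses statsmodels under the hood, but its interface is designed to feel familiar to users coming from a scikit-learn background.
Pmdarima (originally released as pyramid-arima) is a Python library for automated ARIMA model fitting. ARIMA (Autoregressive Integrated Moving Average) is a widely used model for analyzing and forecasting time series data. Pmdarima automates the choice of the ARIMA parameters, namely the AR (autoregressive) order, the MA (moving average) order, and the order of differencing, which removes much of the tedious manual work of using ARIMA. It searches over candidate models and keeps the one with the lowest information criterion (AIC by default), which balances goodness of fit against model complexity. By default the search is stepwise rather than an exhaustive grid search over every possible model.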
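To make the idea of "pick the candidate with the lowest information criterion" concrete, here is a minimal, library-free sketch. It is not pmdarima's algorithm: it compares only two hand-written candidates (a constant-mean model and an AR(1) model fitted by least squares) on synthetic data, and it uses the Gaussian AIC up to an additive constant. All function names below are hypothetical helpers defined just for this illustration.

```python
import math
import random


def gaussian_aic(residuals, n_params):
    """AIC for Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


def fit_mean_only(y):
    """Candidate A: constant mean, no autoregression (1 parameter)."""
    target = y[1:]  # drop the first point so both candidates use the same sample
    mu = sum(target) / len(target)
    return [v - mu for v in target], 1


def fit_ar1(y):
    """Candidate B: y[t] = c + phi * y[t-1] + e[t], by least squares (2 parameters)."""
    x, target = y[:-1], y[1:]
    n = len(target)
    mx, my = sum(x) / n, sum(target) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, target))
    phi = sxy / sxx
    c = my - phi * mx
    return [b - (c + phi * a) for a, b in zip(x, target)], 2


# Synthetic, strongly autocorrelated series (illustrative data only).
random.seed(0)
y = [0.0]
for _ in range(199):
    y.append(0.8 * y[-1] + random.gauss(0, 1))

# Score every candidate and keep the one with the lowest AIC.
scores = {}
for name, fit in [("mean-only", fit_mean_only), ("AR(1)", fit_ar1)]:
    residuals, k = fit(y)
    scores[name] = gaussian_aic(residuals, k)

best = min(scores, key=scores.get)
print(best, scores)
```

The extra parameter of the AR(1) model costs it a penalty of 2, so it only wins when it reduces the residual error enough to pay for that penalty. An automated ARIMA search applies the same trade-off over a much larger set of candidate orders.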
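Pmdarima includes the following features:
>> The equivalent of R's auto.arima functionality
>> A collection of statistical tests of stationarity and seasonality
>> Time series utilities, such as differencing and inverse differencing
>> Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
>> Seasonal time series decompositions
>> Cross-validation utilities
>> A rich collection of built-in time series datasets for prototyping and examples
>> Scikit-learn-style pipelines to consolidate estimators and promote productionization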
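As a rough illustration of what differencing and inverse differencing mean (the "I" part of ARIMA), the sketch below implements both in plain Python. The helpers `diff` and `diff_inv` here are local functions written for this example, not the library's own utilities, and their signatures are assumptions made for the sketch.

```python
def diff(series, lag=1):
    """Lagged difference: out[i] = series[i + lag] - series[i]."""
    return [series[i + lag] - series[i] for i in range(len(series) - lag)]


def diff_inv(differenced, initial, lag=1):
    """Undo `diff`, given the first `lag` values of the original series."""
    restored = list(initial)
    for i, d in enumerate(differenced):
        # series[i + lag] = series[i] + differenced[i]
        restored.append(restored[i] + d)
    return restored


y = [10, 12, 15, 19, 24, 30]

dy = diff(y)        # first difference
d2y = diff(dy)      # second difference: constant for a quadratic trend
restored = diff_inv(dy, initial=y[:1])

# A seasonal-style difference with lag 2, and its inverse.
dy_lag2 = diff(y, lag=2)
restored_lag2 = diff_inv(dy_lag2, initial=y[:2], lag=2)

print(dy, d2y, restored, restored_lag2)
```

Differencing discards the starting level of the series, which is why the inverse needs the first `lag` original values to rebuild it.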
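Beyond automated ARIMA parameter selection, Pmdarima also offers other useful capabilities, such as:
>> Fitting ARIMA models with exogenous variables.
>> Data transformers for preprocessing and feature engineering.
>> Cross-validation and hyperparameter optimization.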
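Cross-validation for time series has to respect time order: each fold trains on the past and tests on the observations that follow. The sketch below shows one common scheme, an expanding training window with a fixed forecast horizon. It is a library-free illustration; `rolling_origin_splits` and its parameters are hypothetical names chosen for this example, not pmdarima's API.

```python
def rolling_origin_splits(n_samples, initial, horizon, step=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    end = initial
    while end + horizon <= n_samples:
        # Train on everything before `end`, test on the next `horizon` points.
        yield list(range(end)), list(range(end, end + horizon))
        end += step


splits = list(rolling_origin_splits(n_samples=10, initial=6, horizon=2, step=2))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Unlike the shuffled k-fold splits used for independent samples, no test index here ever precedes a training index, so a model is never scored on data older than what it was trained on.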

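In short, Pmdarima is a convenient, easy-to-use Python library for analyzing and forecasting time series data. It simplifies working with ARIMA models through automated parameter selection and the supporting tools listed above.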

GitHub link:
GitHub - alkaline-ml/pmdarima: A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

Installing the pmdarima library

pip install pmdarima



pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pmdarima

Using the pmdarima library

1. Basic usage

(1) Fitting a simple auto-ARIMA model on the wineind dataset

import pmdarima as pm
from pmdarima.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load/split your data
y = pm.datasets.load_wineind()
train, test = train_test_split(y, train_size=150)

# Fit your model
model = pm.auto_arima(train, seasonal=True, m=12)

# make your forecasts
forecasts = model.predict(test.shape[0])  # predict N steps into the future

# Visualize the forecasts (blue=train, green=forecasts)
x = np.arange(y.shape[0])
plt.plot(x[:150], train, c='blue')
plt.plot(x[150:], forecasts, c='green')
plt.show()
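A plot gives a visual check, but a numeric score on the held-out split makes models easier to compare. The sketch below computes two common error measures in plain Python. The numbers are made up for illustration and stand in for the `test` and `forecasts` arrays from the example above; `rmse` and `smape` are local helper functions, not library calls.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 is perfect)."""
    terms = [
        0.0 if a == p else 2 * abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    return 100 * sum(terms) / len(terms)


# Illustrative numbers only.
actual = [100.0, 110.0, 120.0]
predicted = [90.0, 110.0, 130.0]

print(rmse(actual, predicted), smape(actual, predicted))
```

Both helpers accept any pair of equal-length sequences, so in principle they can be called as `rmse(test, forecasts)` after the forecasting step.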

(2) Fitting a more complex pipeline on the sunspots dataset, serializing it, and loading it from disk to make predictions

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.pipeline import Pipeline
from pmdarima.preprocessing import BoxCoxEndogTransformer
import pickle

# Load/split your data
y = pm.datasets.load_sunspots()
train, test = train_test_split(y, train_size=2700)

# Define and fit your pipeline
pipeline = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
    ('arima', pm.AutoARIMA(seasonal=True, m=12,
                           suppress_warnings=True,
                           trace=True))
])

pipeline.fit(train)

# Serialize your model just like you would in scikit:
with open('model.pkl', 'wb') as pkl:
    pickle.dump(pipeline, pkl)
    
# Load it and make predictions seamlessly:
with open('model.pkl', 'rb') as pkl:
    mod = pickle.load(pkl)
    print(mod.predict(15))
# [25.20580375 25.05573898 24.4263037  23.56766793 22.67463049 21.82231043
# 21.04061069 20.33693017 19.70906027 19.1509862  18.6555793  18.21577243
# 17.8250318  17.47750614 17.16803394]
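To show why an offset such as `lmbda2` matters, here is a minimal plain-Python sketch of a Box-Cox transform with a shift, plus its inverse. It assumes the offset is simply added to the data before the transform, and it fixes `lmbda` by hand. It is an illustration of the formula, not a copy of pmdarima's implementation, and `boxcox` and `inv_boxcox` are local helpers.

```python
import math


def boxcox(y, lmbda, lmbda2=0.0):
    """Box-Cox transform of (y + lmbda2); lmbda == 0 reduces to a log transform."""
    shifted = [v + lmbda2 for v in y]
    if any(v <= 0 for v in shifted):
        raise ValueError("Box-Cox needs strictly positive values after the shift")
    if lmbda == 0:
        return [math.log(v) for v in shifted]
    return [(v ** lmbda - 1) / lmbda for v in shifted]


def inv_boxcox(z, lmbda, lmbda2=0.0):
    """Map transformed values back to the original scale."""
    if lmbda == 0:
        return [math.exp(v) - lmbda2 for v in z]
    return [(lmbda * v + 1) ** (1 / lmbda) - lmbda2 for v in z]


y = [0.0, 5.0, 40.0, 120.0]  # contains a zero, which the unshifted transform rejects

z = boxcox(y, lmbda=0.5, lmbda2=1e-6)
back = inv_boxcox(z, lmbda=0.5, lmbda2=1e-6)
print(z, back)
```

Without the small shift, the zero in `y` would raise an error. With it, the series can be transformed, modeled on the transformed scale, and mapped back to the original scale for the final forecasts.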

2. Advanced usage

(1) Loading the lynx dataset, fitting an ARIMA model with auto_arima, and forecasting the next 10 time steps


import pmdarima as pm
from pmdarima import model_selection
import matplotlib.pyplot as plt
import numpy as np
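# Load the data and split it into train and test parts
data = pm.datasets.load_lynx()
train, test = model_selection.train_test_split(data, train_size=100)
# Fit, holding out the last 10 training samples for validation (out_of_sample_size)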
arima = pm.auto_arima(train, start_p=1, start_q=1, d=0, max_p=5, max_q=5,
                      out_of_sample_size=10, suppress_warnings=True,
                      stepwise=True, error_action='ignore')

# Now plot the test set against the forecasts
preds, conf_int = arima.predict(n_periods=test.shape[0],
                                return_conf_int=True)

fig, axes = plt.subplots(2, 1, figsize=(12, 8))
x_axis = np.arange(train.shape[0] + preds.shape[0])
axes[0].plot(x_axis[:train.shape[0]], train, alpha=0.75)
axes[0].scatter(x_axis[train.shape[0]:], preds, alpha=0.4, marker='o')
axes[0].scatter(x_axis[train.shape[0]:], test, alpha=0.4, marker='x')
axes[0].fill_between(x_axis[-preds.shape[0]:], conf_int[:, 0], conf_int[:, 1],
                     alpha=0.1, color='b')

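# Title the first panel (the shaded band above is the forecast confidence interval)
axes[0].set_title("Train samples & forecasted test samples")

# Now add the actual test samples to the model and create NEW forecasts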
arima.update(test)
new_preds, new_conf_int = arima.predict(n_periods=10, return_conf_int=True)
new_x_axis = np.arange(data.shape[0] + 10)

axes[1].plot(new_x_axis[:data.shape[0]], data, alpha=0.75)
axes[1].scatter(new_x_axis[data.shape[0]:], new_preds, alpha=0.4, marker='o')
axes[1].fill_between(new_x_axis[-new_preds.shape[0]:],
                     new_conf_int[:, 0], new_conf_int[:, 1],
                     alpha=0.1, color='g')
axes[1].set_title("Added new observed values with new forecasts")
plt.show()
