Backtesting a Forecasting Strategy for the S&P500 in Python with pandas
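This article makes heavy use of software we have already developed in earlier articles, including the object-oriented backtesting engine and the forecasting signal generator. The nature of object-oriented programming means that the code we write here can be kept short, because the "heavy lifting" is carried out in classes we have already developed.
Mature Python libraries such as matplotlib, pandas and scikit-learn also reduce the need to write boilerplate code or to implement well-known algorithms ourselves.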
The Forecasting Strategy
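The forecasting strategy itself is based on a machine learning technique known as a Quadratic Discriminant Analyser, which is closely related to a Linear Discriminant Analyser. Both of these models are described in detail in the article on forecasting financial time series.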
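For reference, the standard textbook form of the discriminant function that a Quadratic Discriminant Analyser evaluates for each class k is sketched below. The notation is introduced here and is not taken from the earlier article: x is the vector of predictors, \mu_k and \Sigma_k are the mean and covariance matrix of class k, and \pi_k is the prior probability of class k.

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
              - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
              + \log\pi_k
```

An observation is assigned to the class with the largest \delta_k(x). Linear discriminant analysis is the special case in which all classes share a single covariance matrix \Sigma, which makes the decision boundary linear in x rather than quadratic.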
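The forecaster uses the previous two daily returns as a set of factors to predict today's direction of the stock market. If the probability of the day being "up" exceeds 50%, the strategy buys 500 shares of the SPY ETF and sells them at the close. If the probability of a down day exceeds 50%, the strategy sells 500 shares of the SPY ETF and buys them back at the close. Thus it is our first example of an intraday trading strategy.
Note that this is not a particularly realistic trading strategy! We are unlikely ever to achieve the opening or closing price, owing to factors such as excessive opening volatility, order routing by the brokerage and potential liquidity issues around the open and close. In addition, we have not included transaction costs. Because a round-trip trade is carried out every day, these costs would likely be a substantial percentage of the returns. The forecaster therefore needs to be relatively accurate at predicting daily returns, otherwise transaction costs will eat all of our trading returns.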
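To get a feel for how quickly a daily round trip accumulates costs, the arithmetic can be sketched as follows. Only the 500-share trade size comes from the strategy description; the per-share commission, the number of trading days and the gross profit figure are hypothetical values chosen purely for illustration.

```python
def annual_round_trip_cost(shares, cost_per_share, trading_days):
    """Total cost of one round trip (open + close) per day over a year."""
    # Two fills per day: one to open the position, one to close it
    return shares * cost_per_share * 2 * trading_days


# 500 shares per trade comes from the strategy description.
# The remaining figures are hypothetical assumptions.
SHARES = 500
COST_PER_SHARE = 0.005   # assumed commission in dollars per share, per fill
TRADING_DAYS = 252       # assumed number of trading days in a year

total_cost = annual_round_trip_cost(SHARES, COST_PER_SHARE, TRADING_DAYS)

# Hypothetical gross profit in dollars, used only to give a sense of scale
gross_profit = 4000.0
cost_share_of_profit = total_cost / gross_profit

print("Total cost: %.2f" % total_cost)
print("Share of gross profit: %.1f%%" % (100 * cost_share_of_profit))
```

Under these assumed figures even a small per-share commission consumes a large fraction of the gross profit, which is why the accuracy of the forecaster matters so much.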
Implementation
As with the other Python/pandas tutorials, I have used the following libraries:
- Python - 2.7.3
- NumPy - 1.8.0
- pandas - 0.12.0
- matplotlib - 1.1.0
- scikit-learn - 0.14.1
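The implementation of snp_forecast.py below requires backtest.py from the previous tutorial. In addition, forecast.py (which mainly contains the function create_lagged_series) was created in an earlier tutorial. The first step is to import the necessary modules and objects: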
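# snp_forecast.py
import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn

# These two import paths match the library versions listed above.
# Later releases moved them: DataReader to the separate pandas-datareader
# package and QDA to sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.
from pandas.io.data import DataReader
from sklearn.qda import QDA

from backtest import Strategy, Portfolio
from forecast import create_lagged_series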
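Once all the relevant libraries and modules have been imported, it is time to subclass the Strategy abstract base class, as we have done in previous tutorials. SNPForecastingStrategy is designed to fit a Quadratic Discriminant Analyser to the S&P500 stock index in order to predict its future direction. The model is fitted in the fit_model method below, while the actual signals are generated by the generate_signals method. This matches the interface of the Strategy class.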
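How a Quadratic Discriminant Analyser works, along with the details of the Python implementation below, is described in the previous article on forecasting financial time series. The comments in the source code below discuss what the program does: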
# snp_forecast.py
class SNPForecastingStrategy(Strategy):
    """
    Requires:
    symbol - A stock symbol on which to form a strategy.
    bars - A DataFrame of bars for the above symbol."""

    def __init__(self, symbol, bars):
        self.symbol = symbol
        self.bars = bars
        self.create_periods()
        self.fit_model()

    def create_periods(self):
        """Create training/test periods."""
        self.start_train = datetime.datetime(2001,1,10)
        self.start_test = datetime.datetime(2005,1,1)
        self.end_period = datetime.datetime(2005,12,31)

    def fit_model(self):
        """Fits a Quadratic Discriminant Analyser to the
        US stock market index (^GSPC in Yahoo)."""
        # Create a lagged series of the S&P500 US stock market index
        snpret = create_lagged_series(self.symbol, self.start_train,
                                      self.end_period, lags=5)

        # Use the prior two days of returns as
        # predictor values, with direction as the response
        X = snpret[["Lag1","Lag2"]]
        y = snpret["Direction"]

        # Create the training set from data before the test period
        X_train = X[X.index < self.start_test]
        y_train = y[y.index < self.start_test]

        # Create the predicting factors for use
        # in direction forecasting
        self.predictors = X[X.index >= self.start_test]

        # Create the Quadratic Discriminant Analysis model
        # and the forecasting strategy
        self.model = QDA()
        self.model.fit(X_train, y_train)

    def generate_signals(self):
        """Returns the DataFrame of symbols containing the signals
        to go long, short or hold (1, -1 or 0)."""
        signals = pd.DataFrame(index=self.bars.index)
        signals['signal'] = 0.0

        # Predict the subsequent period with the QDA model
        signals['signal'] = self.model.predict(self.predictors)

        # Zero the first five signal entries to eliminate
        # NaN issues with the signals DataFrame
        # (.loc avoids chained assignment)
        signals.loc[signals.index[0:5], 'signal'] = 0.0
        signals['positions'] = signals['signal'].diff()
        return signals
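The predictors used in fit_model are lagged daily returns, split by date into a training and a test period. The real create_lagged_series lives in forecast.py and is not reproduced in this article, so the following is only a minimal sketch of the idea on synthetic prices. The helper name make_lagged_returns and all of the price data are hypothetical.

```python
import numpy as np
import pandas as pd


def make_lagged_returns(close, lags=2):
    """Build a frame of percentage returns, their lags and the direction.

    Hypothetical stand-in for create_lagged_series; `close` is a Series
    of closing prices indexed by date.
    """
    out = pd.DataFrame(index=close.index)
    out["Today"] = close.pct_change() * 100.0
    for i in range(1, lags + 1):
        # Lag i holds the return observed i trading days earlier
        out["Lag%d" % i] = out["Today"].shift(i)
    # Direction is +1 for an up day and -1 for a down (or flat) day
    out["Direction"] = np.where(out["Today"] > 0, 1.0, -1.0)
    # Drop the leading rows whose returns or lags are undefined
    return out.dropna()


# Synthetic closing prices, for illustration only
dates = pd.date_range("2004-12-27", periods=8, freq="B")
close = pd.Series([100.0, 101.0, 100.0, 102.0, 103.0, 102.0, 104.0, 105.0],
                  index=dates)

lagged = make_lagged_returns(close, lags=2)
X = lagged[["Lag1", "Lag2"]]
y = lagged["Direction"]

# Split by date so that the model is never fitted on test-period data
start_test = pd.Timestamp("2005-01-01")
X_train, y_train = X[X.index < start_test], y[y.index < start_test]
X_test = X[X.index >= start_test]
```

Splitting on the index date, rather than shuffling rows, keeps the test period strictly after the training period, which is what a backtest needs to avoid look-ahead bias.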
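Now that the forecasting engine has produced the signals, we can create a MarketIntradayPortfolio. This portfolio object differs from the example given in the Moving Average Crossover backtest article because it operates on an intraday basis.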
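If the signal states that an up day will occur, the portfolio goes long (buys) 500 shares of SPY at the opening price and then sells them at the closing price. Conversely, if the signal states that a down day will occur, the portfolio goes short (sells) 500 shares of SPY at the opening price and then closes out the position at the closing price.
To achieve this, the price difference between the market open and the market close is determined for each day, which gives the daily profit on the 500 shares bought or sold. Accumulating the profit or loss for each day then leads naturally to an equity curve. This approach has the added benefit of letting us calculate profit and loss statistics for each day.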
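The calculation just described can be sketched on a handful of synthetic bars. The prices and signals below are made up for illustration; only the 500-share size and the 100,000 USD starting capital come from the article.

```python
import pandas as pd

# Synthetic open/close prices and signals, for illustration only
bars = pd.DataFrame(
    {"Open": [100.0, 101.0, 102.0, 101.0],
     "Close": [101.0, 100.5, 103.0, 100.0]},
    index=pd.date_range("2005-01-03", periods=4, freq="B"),
)
signal = pd.Series([1.0, -1.0, 1.0, 1.0], index=bars.index)

initial_capital = 100000.0
shares = 500

# Signed position: +500 when long, -500 when short
position = shares * signal

# Profit per day is the position times the open-to-close price move
profit = position * (bars["Close"] - bars["Open"])

# The equity curve is the initial capital plus the cumulative profit
equity = initial_capital + profit.cumsum()
print(equity)
```

Note that a short position on a falling day produces a positive profit, because the negative position multiplies a negative price difference.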
The listing for MarketIntradayPortfolio is given below:
# snp_forecast.py
class MarketIntradayPortfolio(Portfolio):
    """Buys or sells 500 shares of an asset at the opening price of
    every bar, depending upon the direction of the forecast, closing
    out the trade at the close of the bar.

    Requires:
    symbol - A stock symbol which forms the basis of the portfolio.
    bars - A DataFrame of bars for a symbol set.
    signals - A pandas DataFrame of signals (1, 0, -1) for each symbol.
    initial_capital - The amount in cash at the start of the portfolio."""

    def __init__(self, symbol, bars, signals, initial_capital=100000.0):
        self.symbol = symbol
        self.bars = bars
        self.signals = signals
        self.initial_capital = float(initial_capital)
        self.positions = self.generate_positions()

    def generate_positions(self):
        """Generate the positions DataFrame, based on the signals
        provided by the 'signals' DataFrame."""
        positions = pd.DataFrame(index=self.signals.index).fillna(0.0)

        # Long or short 500 shares of SPY based on
        # directional signal every day
        positions[self.symbol] = 500*self.signals['signal']
        return positions

    def backtest_portfolio(self):
        """Backtest the portfolio and return a DataFrame containing
        the equity curve and the percentage returns."""
        # Set the portfolio object to have the same time period
        # as the positions DataFrame
        portfolio = pd.DataFrame(index=self.positions.index)
        pos_diff = self.positions.diff()

        # Work out the intraday profit of the difference
        # in open and closing prices and then determine
        # the daily profit by longing if an up day is predicted
        # and shorting if a down day is predicted
        portfolio['price_diff'] = self.bars['Close']-self.bars['Open']
        # Zero the first five entries to match the signals
        # (.loc avoids chained assignment)
        portfolio.loc[portfolio.index[0:5], 'price_diff'] = 0.0
        portfolio['profit'] = self.positions[self.symbol] * portfolio['price_diff']

        # Generate the equity curve and percentage returns
        portfolio['total'] = self.initial_capital + portfolio['profit'].cumsum()
        portfolio['returns'] = portfolio['total'].pct_change()
        return portfolio
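Because the portfolio stores a profit figure for every day, per-day profit and loss statistics are straightforward to compute. The following is a minimal sketch using only the standard library. The function name daily_pnl_summary and the sample profits are hypothetical and are not part of backtest.py.

```python
from statistics import mean, stdev


def daily_pnl_summary(profits):
    """Summarise a sequence of daily profits (in dollars)."""
    wins = [p for p in profits if p > 0]
    return {
        "days": len(profits),
        "win_rate": len(wins) / len(profits),   # fraction of profitable days
        "mean_profit": mean(profits),           # average profit per day
        "std_profit": stdev(profits),           # sample standard deviation
        "total_profit": sum(profits),
    }


# Synthetic daily profits, for illustration only
profits = [500.0, 250.0, 500.0, -500.0]
summary = daily_pnl_summary(profits)

for name, value in summary.items():
    print(name, value)
```

With a real backtest, the 'profit' column of the DataFrame returned by backtest_portfolio could be converted to a list and passed in the same way.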
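The final step is to tie the Strategy and Portfolio objects together with a __main__ function. The function obtains the data for the SPY instrument and then creates the signal-generating strategy on the S&P500 index itself, which is provided by the ^GSPC ticker. A MarketIntradayPortfolio is then generated with an initial capital of 100,000 USD (as in previous tutorials). Finally, the returns are calculated and the equity curve is plotted.
Note how little code is required at this stage, because all of the heavy computation is carried out in the Strategy and Portfolio subclasses. This makes it very straightforward to create new trading strategies and test them rapidly for use in the "strategy pipeline".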
if __name__ == "__main__":
    start_test = datetime.datetime(2005,1,1)
    end_period = datetime.datetime(2005,12,31)

    # Obtain the bars for SPY ETF which tracks the S&P500 index
    bars = DataReader("SPY", "yahoo", start_test, end_period)

    # Create the S&P500 forecasting strategy
    snpf = SNPForecastingStrategy("^GSPC", bars)
    signals = snpf.generate_signals()

    # Create the portfolio based on the forecaster
    portfolio = MarketIntradayPortfolio("SPY", bars, signals,
                                        initial_capital=100000.0)
    returns = portfolio.backtest_portfolio()

    # Plot results
    fig = plt.figure()
    fig.patch.set_facecolor('white')

    # Plot the price of the SPY ETF
    ax1 = fig.add_subplot(211, ylabel='SPY ETF price in $')
    bars['Close'].plot(ax=ax1, color='r', lw=2.)

    # Plot the equity curve
    ax2 = fig.add_subplot(212, ylabel='Portfolio value in $')
    returns['total'].plot(ax=ax2, lw=2.)

    # plt.show() keeps the window open when run as a script
    plt.show()
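The output of the program is given below. In this period the stock market returned 4% (assuming a fully invested buy-and-hold strategy), while the algorithm itself also returned 4%. Note that transaction costs (such as commission fees) have not been added to this backtesting system. Since the strategy carries out one round-trip trade per day, these fees are likely to reduce the returns significantly.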
S&P500 forecasting strategy performance from 2005-01-01 to 2006-12-31
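A comparison between buy-and-hold and the strategy comes down to the total return over the period for each. A minimal sketch of that arithmetic is given here. The helper total_return and the start and end values are hypothetical and are not taken from the actual backtest.

```python
def total_return(start_value, end_value):
    """Fractional return over the whole period."""
    return end_value / start_value - 1.0


# Hypothetical start and end values, for illustration only
buy_and_hold = total_return(start_value=120.0, end_value=124.8)
strategy = total_return(start_value=100000.0, end_value=104000.0)

print("Buy and hold: %.1f%%" % (100 * buy_and_hold))
print("Strategy:     %.1f%%" % (100 * strategy))
```

For buy-and-hold the inputs would be the first and last closing prices of SPY in the test period. For the strategy they would be the initial capital and the final value of the 'total' column.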
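In subsequent articles we will add realistic transaction costs, make use of additional forecasting engines, determine performance metrics and provide portfolio optimisation tools.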