多因子选股模型

转载自:https://www.joinquant.com/post/15833?tag=algorithm

多因子选股模型

基于《【研究】量化选股-因子检验和多因子模型的构建》https://zhuanlan.zhihu.com/quantstory/20634542

在源码的基础上添加了一些因子,同时将时间滞后。

1.时间选取11-17年作为样本期,并进行因子筛选及检验。

2.基准选取上证综指(000001.XSHG)

模型构建及因子选取

拟选取以下四个方面的因子:

  1. 价值类因子:市盈率(PE),市净率(PB),市销率(PS),基本每股收益(EPS),账面市值比(B/M)
  2. 成长类因子:净资产收益率(ROE),总资产净利率(ROA),销售毛利率(gross_profit_margin),净利润同比增长率(inc_net_profit_year_on_year),净利润环比增长率(inc_net_profit_annual),营业利润同比增长率(inc_operation_profit_year_on_year),营业利润环比增长率(inc_operation_profit_annual),主营毛利率(GP/R)、净利率(P/R)
  3. 规模类因子:净利润(net_profit),营业收入(operating_revenue),总股本(capitalization),流通股本(circulating_cap),总市值(market_cap),流通市值(circulating_market_cap),资产负债(L/A)、固定资产比例(FAP)
  4. 交投类因子:换手率(turnover_ratio)

采用排序法对因子的有效性进行验证。

In [1]:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt

月初取出所有因子数值,例如2018-01-01

In [2]:

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual', 
                     'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap',
                     'L/A', 'FAP',
           'turnover_ratio']

# 月初取出因子值
def get_factors(fdate, factors):
    stock_set = get_index_stocks('000001.XSHG', fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        valuation.pe_ratio,
        valuation.pb_ratio,
        valuation.ps_ratio,
        income.basic_eps,
        indicator.roe,
        indicator.roa,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_net_profit_annual,
        indicator.inc_operation_profit_year_on_year,
        indicator.inc_operation_profit_annual,
        income.total_profit/income.operating_revenue,
        income.net_profit/income.operating_revenue,
        income.net_profit,
        income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.market_cap,
        valuation.circulating_market_cap,
        balance.total_liability/balance.total_assets,
        balance.fixed_assets/balance.total_assets,
        valuation.turnover_ratio
        ).filter(
        valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-23:]

fdf = get_factors('2018-01-01', factors)
fdf.head().T

Out[2]:

code 600000.XSHG 600004.XSHG 600006.XSHG 600007.XSHG 600008.XSHG
PE 1.143846e+00 4.827871e-01 6.119287e-01 3.655787e-01 6.300832e-01
PB 6.804400e+00 2.009680e+01 1.093361e+02 2.808410e+01 3.863470e+01
PS 9.538000e-01 2.144100e+00 1.787800e+00 2.736200e+00 3.135900e+00
EPS 2.244700e+00 4.643700e+00 6.311000e-01 6.594300e+00 2.607200e+00
B/M 4.800000e-01 1.900000e-01 -9.700000e-03 1.800000e-01 3.390000e-02
ROE 3.413200e+00 2.754000e+00 -2.960000e-01 2.928800e+00 1.513400e+00
ROA 2.316000e-01 1.990600e+00 -7.116000e-01 1.580100e+00 4.027000e-01
gross_profit_margin NaN 3.798770e+01 1.084940e+01 5.062150e+01 3.024400e+01
inc_net_profit_year_on_year -1.588900e+00 8.844300e+00 -6.544932e+02 6.390600e+00 2.954560e+01
inc_net_profit_annual -7.200000e-03 4.770500e+00 -2.012624e+03 3.357990e+01 -3.995000e-01
inc_operation_profit_year_on_year -1.833300e+00 1.868020e+01 -6.605708e+02 -4.675000e-01 1.377109e+02
inc_operation_profit_annual 2.019000e+00 2.919000e+00 -1.447837e+03 2.506970e+01 -2.233990e+01
GP/R 4.344424e-01 3.085870e-01 -4.038174e-02 3.296453e-01 1.174919e-01
P/R 3.350075e-01 2.308729e-01 -3.273438e-02 2.473615e-01 8.858635e-02
net_profit 1.387400e+10 3.935985e+08 -1.587791e+08 1.823589e+08 1.900669e+08
operating_revenue 4.141400e+10 1.704828e+09 4.850528e+09 7.372163e+08 2.145555e+09
capitalization 2.935208e+06 2.069320e+05 2.000000e+05 1.007282e+05 4.820614e+05
circulating_cap 2.810376e+06 2.069320e+05 2.000000e+05 1.007282e+05 4.820614e+05
market_cap 3.695427e+03 3.041901e+02 1.170000e+02 1.726482e+02 2.477796e+02
circulating_market_cap 3.538264e+03 3.041901e+02 1.170000e+02 1.726482e+02 2.477796e+02
L/A 9.302917e-01 2.719476e-01 6.862103e-01 4.571281e-01 6.699646e-01
FAP 4.168150e-03 3.381332e-01 1.754063e-01 1.815366e-01 1.011792e-01
turnover_ratio 5.820000e-02 4.095000e-01 5.574000e-01 7.120000e-02 3.734000e-01

对每个因子大小排序(以流通市值为例)

In [3]:

score = fdf['circulating_market_cap'].order()
score.head()

Out[3]:

code
603580.XSHG    5.0777
603991.XSHG    5.2659
603330.XSHG    5.3535
603041.XSHG    5.6300
603269.XSHG    5.7038
Name: circulating_market_cap, dtype: float64

股票个数

In [4]:

len(score)

Out[4]:

1352

按照流通市值将股票池进行五等分

In [5]:

startdate = '2018-01-01'
enddate = '2018-02-01'
nextdate = '2018-03-01'
df = {}
circulating_market_cap = fdf['circulating_market_cap']
port1 = list(score.index)[: len(score)/5]
port2 = list(score.index)[ len(score)/5: 2*len(score)/5]
port3 = list(score.index)[ 2*len(score)/5: -2*len(score)/5]
port4 = list(score.index)[ -2*len(score)/5: -len(score)/5]
port5 = list(score.index)[ -len(score)/5: ]

按流通市值加权计算组合月收益(例如2018-01,2018-02月收益)

In [6]:

def calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap):
    
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    weighted_m_return = ((close2['close'].ix[0,:]/close1['close'].ix[0,:]-1)*
                         circulating_market_cap).sum()/(circulating_market_cap.ix[port].sum())
    return weighted_m_return
calculate_port_monthly_return(port1, '2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])

Out[6]:

-0.09004705495088357

计算基准月收益

In [7]:

def calculate_benchmark_monthly_return(startdate, enddate, nextdate):
    
    close1 = get_price(['000001.XSHG'],startdate,enddate,'daily',['close'])['close']
    close2 = get_price(['000001.XSHG'],enddate, nextdate, 'daily',['close'])['close']
    benchmark_return = (close2.ix[0,:]/close1.ix[0,:]-1).sum()
    return benchmark_return
calculate_benchmark_monthly_return('2018-01-01','2018-02-01','2018-03-01')

Out[7]:

0.029462448444448563

观察5个组合在2018年初一个月内的收益情况

从结果可以看出,在构建因子组合之前,前四组的收益跑输大盘。

In [8]:

benchmark_return = calculate_benchmark_monthly_return('2018-01-01', '2018-02-01', '2018-03-01')
df['port1'] = calculate_port_monthly_return(port1,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port2'] = calculate_port_monthly_return(port2,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port3'] = calculate_port_monthly_return(port3,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port4'] = calculate_port_monthly_return(port4,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
df['port5'] = calculate_port_monthly_return(port5,'2018-01-01', '2018-02-01', '2018-03-01', fdf['circulating_market_cap'])
print Series(df)
print 'benchmark_return %s'%benchmark_return
port1   -0.090047
port2   -0.088405
port3   -0.075064
port4   -0.060624
port5    0.068629
dtype: float64
benchmark_return 0.0294624484444

构建因子组合,计算不同组合月收益率

时间:2011-2017年,计算1-5组以及benchmark组合的月收益率,形成84×6的面板数据。

In [9]:

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual', 
                     'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap', 'circulating_market_cap',
                     'L/A', 'FAP',
           'turnover_ratio']
#因为研究模块取fundamental数据默认date为研究日期的前一天。所以要自备时间序列。按月取
year = ['2011','2012','2013','2014','2015','2016','2017']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}

for i in range(7*12):
    startdate = year[i/12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)/12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)/12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    # print 'time %s'%startdate
    fdf = get_factors(startdate,factors)
    CMV = fdf['circulating_market_cap']
    #5个组合,23个因子
    df = DataFrame(np.zeros(6*23).reshape(6,23),index = ['port1','port2','port3','port4','port5','benchmark'],columns = factors)
    for fac in factors:
        score = fdf[fac].order()
        port1 = list(score.index)[: len(score)/5]
        port2 = list(score.index)[ len(score)/5+1: 2*len(score)/5]
        port3 = list(score.index)[ 2*len(score)/5+1: -2*len(score)/5]
        port4 = list(score.index)[ -2*len(score)/5+1: -len(score)/5]
        port5 = list(score.index)[ -len(score)/5+1: ]
        df.ix['port1',fac] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port2',fac] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port3',fac] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port4',fac] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['port5',fac] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
        df.ix['benchmark',fac] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    # print 'factor %s'%fac
    result[i+1]=df
monthly_return = pd.Panel(result)

取某个因子的5个组合月收益情况(例如市盈率PE)

In [11]:

monthly_return[:,:,'PE']

Out[11]:

  1 2 3 4 5 6 7 8 9 10 ... 75 76 77 78 79 80 81 82 83 84
port1 -0.063961 0.057468 -0.003538 0.011939 -0.000767 0.028005 0.048595 -0.003958 -0.109566 0.062509 ... 0.021345 -0.006460 -0.001198 0.049791 0.009338 0.049369 0.058072 0.069637 -0.033160 0.056772
port2 -0.065009 0.076102 -0.027128 -0.018031 -0.066994 0.031146 0.028017 -0.046184 -0.120076 0.034576 ... 0.037914 -0.048666 -0.053362 0.072887 0.040365 0.036833 0.069451 0.003253 -0.022327 0.018349
port3 -0.056932 0.079801 -0.017569 -0.027592 -0.073196 0.034040 -0.025730 -0.054367 -0.129013 0.045660 ... 0.017931 -0.045419 -0.053020 0.054920 0.028066 0.018909 0.027487 0.002008 -0.047184 0.006735
port4 -0.021293 0.046165 -0.005278 -0.011301 -0.069544 0.019637 -0.019397 -0.080514 -0.107611 0.081045 ... 0.004030 -0.021088 -0.005480 0.057372 0.065631 0.025304 -0.005043 0.059699 0.016978 0.023916
port5 0.013760 0.024953 0.050458 0.006419 -0.054836 0.007156 -0.035373 -0.041296 -0.052494 0.068615 ... 0.011837 -0.021979 0.048663 0.021466 0.074965 0.014275 -0.003084 -0.003351 0.003836 0.009916
benchmark -0.018820 0.042859 0.016612 -0.011870 -0.064326 0.005755 -0.020142 -0.054642 -0.082649 0.053409 ... 0.007198 -0.038710 -0.013070 0.030068 0.030266 0.022620 0.002156 0.006382 -0.023056 0.009257

6 rows × 84 columns

总收益情况

In [12]:

(monthly_return[:,:,'PE'].T+1).cumprod().tail()

Out[12]:

  port1 port2 port3 port4 port5 benchmark
80 2.173926 1.652334 1.708928 1.980452 2.433185 1.180349
81 2.300171 1.767090 1.755901 1.970465 2.425681 1.182893
82 2.460347 1.772839 1.759427 2.088099 2.417553 1.190442
83 2.378763 1.733257 1.676409 2.123552 2.426825 1.162996
84 2.513809 1.765060 1.687700 2.174338 2.450891 1.173762

因子检验量化指标

模型建立后,计算n个组合的年化复合收益、超额收益、不同市场情况下高收益组合跑赢benchmark和低收益组合跑输benchmark的概率。

检验有效性的量化标准:

(1)序列1-n的组合,年化复合收益应满足一定排序关系,即组合因子大小与收益具有较大相关关系。假定序列i的组合年化收益为Xi,则Xi与i的相关性绝对值Abs(Corr(Xi,i))>MinCorr。此处MinCorr为给定的最小相关阈值。

(2)序列1和n表示的两个极端组合超额收益分别为AR1、ARn。MinARtop、MinARbottom表示最小超额收益阈值。 if AR1 > ARn #因子越小,收益越大 则应满足AR1 > MinARtop >0 and ARn < MinARbottom < 0 if AR1 < ARn #因子越小,收益越大 则应满足ARn > MinARtop >0 and AR1 < MinARbottom < 0 以上条件保证因子最大和最小的两个组合,一个明显跑赢市场,一个明显跑输市场。

(3)在任何市场行情下,1和n两个极端组合,都以较高概率跑赢或跑输市场。 以上三个条件,可以选出过去一段时间有较好选股能力的因子。

因为开始选择的因子较多,因此三条量化标准的选择更加严格,采用如下标准进行选取:

(1)记录因子相关性,>0.7或<-0.7合格。

(2)记录赢家组合和输家组合超额收益。

(3)记录赢家组合跑赢概率>0.6和输家组合跑输概率>0.4合格。

In [13]:

total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:,:,fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1,:]-1
    annual_return[fac] = (total_return[fac]+1)**(1./6)-1
    excess_return[fac] = annual_return[fac]- annual_return[fac][-1]
    #判断因子有效性
    #1.年化收益与组合序列的相关性 大于 阈值
    effect_test[fac][1] = annual_return[fac][0:5].corr(Series([1,2,3,4,5],index = annual_return[fac][0:5].index))
    #2.高收益组合跑赢概率
    #因子小,收益小,port1是输家组合,port5是赢家组合
    if total_return[fac][0] < total_return[fac][-2]:
        loss_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        
        #超额收益
        effect_test[fac][2] = [excess_return[fac][-2]*100,excess_return[fac][0]*100]
            
    #因子小,收益大,port1是赢家组合,port5是输家组合
    else:
        loss_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        
        #超额收益
        effect_test[fac][2] = [excess_return[fac][0]*100,excess_return[fac][-2]*100]

#由于选择的因子较多,test标准选取适当严格一些
#effect_test[1]记录因子相关性,>0.7或<-0.7合格
#effect_test[2]记录【赢家组合超额收益,输家组合超额收益】
#effect_test[3]记录赢家组合跑赢概率和输家组合跑输概率。【>0.6,>0.4】合格 (因实际情况,跑输概率暂时不考虑)
DataFrame(effect_test).T

Out[13]:

  1 2 3
B/M 0.6281959 [15.1984852636, 8.76175660448] [0.690476190476, 0.404761904762]
EPS 0.2488584 [14.2720133294, 12.9632231367] [0.678571428571, 0.357142857143]
FAP -0.5671644 [13.4503120268, 9.44267504971] [0.619047619048, 0.380952380952]
GP/R 0.8064658 [13.7519085368, 9.10242336036] [0.619047619048, 0.357142857143]
L/A -0.5898578 [16.5046555213, 12.1611504111] [0.702380952381, 0.416666666667]
P/R 0.9215462 [13.980265264, 9.09336493425] [0.642857142857, 0.380952380952]
PB -0.8818369 [13.9012096024, 6.71073706755] [0.619047619048, 0.428571428571]
PE 0.1328435 [13.9001078939, 13.4085302139] [0.607142857143, 0.369047619048]
PS -0.5030761 [14.1865783133, 9.18250270639] [0.607142857143, 0.392857142857]
ROA 0.5423133 [19.3405425743, 9.77751849214] [0.75, 0.380952380952]
ROE 0.6386198 [17.9776162079, 9.73910681099] [0.654761904762, 0.404761904762]
capitalization -0.7644211 [22.4171821446, 9.86517390072] [0.583333333333, 0.404761904762]
circulating_cap -0.7761155 [19.8132954476, 9.86514645415] [0.571428571429, 0.369047619048]
circulating_market_cap -0.8791725 [38.1580067747, 10.3384004828] [0.714285714286, 0.369047619048]
gross_profit_margin 0.7770139 [15.5893122733, 9.22929383936] [0.642857142857, 0.452380952381]
inc_net_profit_annual 0.6899743 [14.9827068239, 9.99043264863] [0.678571428571, 0.392857142857]
inc_net_profit_year_on_year 0.8082138 [13.825611634, 3.32909642528] [0.630952380952, 0.416666666667]
inc_operation_profit_annual 0.5963116 [13.1949471333, 9.79858245467] [0.654761904762, 0.404761904762]
inc_operation_profit_year_on_year 0.8663793 [14.0478401847, 3.17046201915] [0.654761904762, 0.404761904762]
market_cap -0.8262643 [44.3574164544, 10.5284689923] [0.738095238095, 0.369047619048]
net_profit 0.04857344 [12.1195026493, 8.12374126557] [0.642857142857, 0.380952380952]
operating_revenue -0.7751005 [23.9766654178, 11.219895262] [0.630952380952, 0.345238095238]
turnover_ratio -0.6218568 [10.175151521, 4.22831336907] [0.619047619048, 0.511904761905]

有效因子

同时满足上述三个条件的有:

(1)价值类因子:市盈率(B/M)

(2)成长类因子:主营毛利率(P/R),销售毛利率(gross_profit_margin),净利润同比增长率(inc_net_profit_year_on_year),营业利润同比增长率( inc_operation_profit_year_on_year)

(3)规模类因子:营业收入(operating_revenue),总股本(capitalization),流通股本(circulating_cap),总市值(market_cap),流通市值(circulating_market_cap),资产负债(L/A)

有效因子总收益

In [14]:

effective_factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
DataFrame(total_return).ix[:,effective_factors].T

Out[14]:

  port1 port2 port3 port4 port5 benchmark
B/M 0.918228 1.480658 1.142045 1.148155 1.686498 0.173762
L/A 1.870086 1.526532 0.843702 1.124105 1.297099 0.173762
P/R 0.952724 1.060859 1.183619 1.649951 1.524196 0.173762
capitalization 2.837346 1.656063 1.372731 1.964715 1.035016 0.173762
circulating_cap 2.382449 1.692737 1.170379 1.747633 1.035013 0.173762
circulating_market_cap 6.812751 2.619596 1.248171 1.063917 1.086887 0.173762
gross_profit_margin 0.967012 1.086652 0.899555 1.183325 1.740373 0.173762
inc_net_profit_year_on_year 0.421356 0.994127 1.055084 2.446268 1.504189 0.173762
inc_operation_profit_year_on_year 0.408645 0.801897 1.381442 2.183790 1.532979 0.173762
market_cap 9.116529 1.863749 1.864567 0.896007 1.108029 0.173762
operating_revenue 3.133399 1.325240 1.267816 1.006326 1.186449 0.173762

有效因子年化收益

In [15]:

DataFrame(annual_return).ix[:,effective_factors].T

Out[15]:

  port1 port2 port3 port4 port5 benchmark
B/M 0.114680 0.163486 0.135372 0.135911 0.179047 0.027062
L/A 0.192109 0.167045 0.107342 0.133781 0.148674 0.027062
P/R 0.117996 0.128084 0.139015 0.176358 0.166865 0.027062
capitalization 0.251234 0.176810 0.154892 0.198571 0.125714 0.027062
circulating_cap 0.225195 0.179503 0.137861 0.183477 0.125714 0.027062
circulating_market_cap 0.408642 0.239110 0.144559 0.128363 0.130446 0.027062
gross_profit_margin 0.119355 0.130425 0.112864 0.138990 0.182955 0.027062
inc_net_profit_year_on_year 0.060353 0.121912 0.127556 0.229018 0.165318 0.027062
inc_operation_profit_year_on_year 0.058767 0.103117 0.155598 0.212897 0.167540 0.027062
market_cap 0.470636 0.191669 0.191726 0.112517 0.132347 0.027062
operating_revenue 0.266829 0.151007 0.146220 0.123053 0.139261 0.027062

各个因子6组收益的时间序列图:

In [16]:

def draw_return_picture(df):
    plt.figure(figsize =(10,4))
    plt.plot((df.T+1).cumprod().ix[:,0], label = 'port1')
    plt.plot((df.T+1).cumprod().ix[:,1], label = 'port2')
    plt.plot((df.T+1).cumprod().ix[:,2], label = 'port3')
    plt.plot((df.T+1).cumprod().ix[:,3], label = 'port4')
    plt.plot((df.T+1).cumprod().ix[:,4], label = 'port5')
    plt.plot((df.T+1).cumprod().ix[:,5], label = 'benchmark')
    plt.xlabel('return of factor %s'%fac)
    plt.legend(loc=0)
for fac in effective_factors:
    draw_return_picture(monthly_return[:,:,fac])

冗余因子的剔除

有些因子,因为内在的逻辑比较相近等原因,选出来的组合在个股构成和收益等方面相关性较高。所以要对这些因子做冗余剔除,保留同类因子中收益最好、区分度最高的因子。 由于本人能力有限,未完成此步骤,具体方法:

(1)对不同因子的n个组合打分。收益越大分值越大。分值达到好将分值赋给每月该组合内的所有个股。

if AR1 > ARn #因子越小,收益越大

则组合i的分值为(n-i+1)

if AR1 < ARn #因子越小,收益越小

则组合i的分值为i

(2)按月计算个股不同因子得分的相关性矩阵。得到第t月个股的因子得分相关性矩阵Score_Corrt,u,v。u,v为因子序号。

(3)计算样本期内相关性矩阵的平均值。即样本期共m个月,加总矩阵后取1/m。

(4)设定得分相关性阈值MinScoreCorr。只保留与其他因子相关性较小的因子。

模型建立和选股

根据选好的有效因子,每月初对市场个股计算因子得分,按一定权重求得所有因子的平均分。如遇因子当月无取值时,按剩下的因子分值求加权平均。通过对个股的加权平均得分进行排序,选择排名靠前的股票交易。

以下代码段等权重对因子分值求和,选出分值最高的股票进行交易

In [17]:

def score_stock(fdate):
    #B/M, L/A, P/R, capitalization, circulating_cap, circulating_market_cap, market_cap, operating_revenue
    #八个因子越小收益越大,分值越大,应降序排;gross_profit_margin, inc_net_profit_year_on_year, 
    #inc_operation_profit_year_on_year三个因子越大收益越大应顺序排
    effective_factors = {'inc_net_profit_year_on_year':True,'gross_profit_margin':True,'inc_operation_profit_year_on_year':True,
                         'B/M':False,'L/A':False,'P/R':False, 'capitalization':False, 'circulating_cap':False,
                        'circulating_market_cap':False, 'market_cap':False, 'operating_revenue':False}
    fdf = get_factors(fdate)
    score = {}
    for fac,value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending = value,method = 'first')
    print DataFrame(score).T.sum().order(ascending = False).head(5)
    score_stock = list(DataFrame(score).T.sum().order(ascending = False).index)
    return score_stock,fdf['circulating_market_cap']
def get_factors(fdate):
    factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        balance.total_liability/balance.total_assets,
        income.net_profit/income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.circulating_market_cap,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_operation_profit_year_on_year,
        valuation.market_cap,
        income.operating_revenue
        ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q,date = fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-11:]
[score_result,circulating_market_cap] = score_stock('2017-01-01')
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64

6个组合和benchmark在7年中的月收益率

计算port1-port5以及TOP20和benchmark的月收益率,时间跨度为7×12=84个月,并将所有数据储存在panel中。

In [18]:

year = ['2011','2012','2013','2014','2015','2016','2017']

month = ['01','02','03','04','05','06','07','08','09','10','11','12']
factors = ['B/M','L/A','P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin', 
          'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
result = {}

for i in range(7*12):

    startdate = year[i/12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)/12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2018-01-01'
    try:
        nextdate = year[(i+2)/12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2018-01-01':
            nextdate = '2018-02-01'
        else:
            nextdate = '2018-01-01'
    print 'time %s'%startdate
    #综合11个因子打分后,划分几个组合
    df = DataFrame(np.zeros(7),index = ['Top20','port1','port2','port3','port4','port5','benchmark'])
    [score,circulating_market_cap] = score_stock(startdate)
    port0 = score[:20]
    port1 = score[: len(score)/5]
    port2 = score[ len(score)/5+1: 2*len(score)/5]
    port3 = score[ 2*len(score)/5+1: -2*len(score)/5]
    port4 = score[ -2*len(score)/5+1: -len(score)/5]
    port5 = score[ -len(score)/5+1: ]
    print len(score)
 
    df.ix['Top20'] = calculate_port_monthly_return(port0,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port1'] = calculate_port_monthly_return(port1,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port2'] = calculate_port_monthly_return(port2,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port3'] = calculate_port_monthly_return(port3,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port4'] = calculate_port_monthly_return(port4,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['port5'] = calculate_port_monthly_return(port5,startdate,enddate,nextdate,circulating_market_cap)
    df.ix['benchmark'] = calculate_benchmark_monthly_return(startdate,enddate,nextdate)
    result[i+1]=df
    
time 2011-01-01
code
600671.XSHG    8250
600506.XSHG    8065
600365.XSHG    8040
600634.XSHG    7864
600647.XSHG    7843
dtype: float64
867
time 2011-02-01
code
600671.XSHG    8275
600365.XSHG    8059
600506.XSHG    8055
600634.XSHG    7874
600647.XSHG    7855
dtype: float64
867
time 2011-03-01
code
600671.XSHG    8266
600506.XSHG    8034
600365.XSHG    7951
600634.XSHG    7852
600647.XSHG    7842
dtype: float64
866
time 2011-04-01
code
600671.XSHG    8285
600365.XSHG    7943
600634.XSHG    7902
600617.XSHG    7852
600077.XSHG    7834
dtype: float64
874
time 2011-05-01
code
600671.XSHG    8522
600340.XSHG    8239
600365.XSHG    8209
600562.XSHG    8103
600613.XSHG    8097
dtype: float64
885
time 2011-06-01
code
600671.XSHG    8506
600365.XSHG    8221
600149.XSHG    8120
600562.XSHG    8104
600613.XSHG    8104
dtype: float64
885
time 2011-07-01
code
600671.XSHG    8518
600365.XSHG    8240
600149.XSHG    8140
600613.XSHG    8111
600562.XSHG    8098
dtype: float64
885
time 2011-08-01
code
600671.XSHG    8534
600149.XSHG    8126
600613.XSHG    8116
600562.XSHG    8076
600520.XSHG    7937
dtype: float64
886
time 2011-09-01
code
600634.XSHG    8410
600562.XSHG    8198
600671.XSHG    8059
600476.XSHG    7986
600077.XSHG    7970
dtype: float64
901
time 2011-10-01
code
600634.XSHG    8416
600562.XSHG    8113
600671.XSHG    8071
600476.XSHG    8037
600077.XSHG    7963
dtype: float64
902
time 2011-11-01
code
600671.XSHG    8693
600705.XSHG    8048
600421.XSHG    8030
600476.XSHG    8030
600576.XSHG    8006
dtype: float64
913
time 2011-12-01
code
600671.XSHG    8707
600576.XSHG    8080
600705.XSHG    8064
600476.XSHG    8043
600571.XSHG    7970
dtype: float64
913
time 2012-01-01
code
600671.XSHG    8688
600576.XSHG    8088
600705.XSHG    8074
600476.XSHG    8044
600421.XSHG    7984
dtype: float64
913
time 2012-02-01
code
600671.XSHG    8695
600136.XSHG    8190
600576.XSHG    8103
600705.XSHG    8086
600476.XSHG    8068
dtype: float64
913
time 2012-03-01
code
600671.XSHG    8702
600136.XSHG    8178
600576.XSHG    8088
600476.XSHG    8047
600571.XSHG    7994
dtype: float64
912
time 2012-04-01
code
600671.XSHG    8748
600365.XSHG    8250
600576.XSHG    8223
600136.XSHG    8201
600733.XSHG    8149
dtype: float64
914
time 2012-05-01
code
600671.XSHG    8792
600593.XSHG    8544
600562.XSHG    8469
600513.XSHG    8430
600576.XSHG    8395
dtype: float64
920
time 2012-06-01
code
600634.XSHG    8708
600593.XSHG    8620
600513.XSHG    8496
600562.XSHG    8481
600455.XSHG    8228
dtype: float64
922
time 2012-07-01
code
600634.XSHG    8705
600593.XSHG    8637
600562.XSHG    8493
600513.XSHG    8400
600571.XSHG    8239
dtype: float64
922
time 2012-08-01
code
600634.XSHG    8707
600593.XSHG    8636
600562.XSHG    8496
600513.XSHG    8409
600571.XSHG    8249
dtype: float64
922
time 2012-09-01
code
600136.XSHG    9255
600485.XSHG    8874
600733.XSHG    8834
600749.XSHG    8725
600520.XSHG    8476
dtype: float64
933
time 2012-10-01
code
600136.XSHG    9251
600485.XSHG    8875
600733.XSHG    8824
600749.XSHG    8732
600758.XSHG    8475
dtype: float64
933
time 2012-11-01
code
600634.XSHG    9496
600733.XSHG    8811
600365.XSHG    8663
600758.XSHG    8474
600647.XSHG    8473
dtype: float64
940
time 2012-12-01
code
600634.XSHG    9494
600733.XSHG    8859
600365.XSHG    8682
600647.XSHG    8520
600758.XSHG    8480
dtype: float64
940
time 2013-01-01
code
600634.XSHG    9494
600733.XSHG    8849
600365.XSHG    8678
600647.XSHG    8525
600758.XSHG    8484
dtype: float64
940
time 2013-02-01
code
600634.XSHG    9480
600733.XSHG    8821
600647.XSHG    8538
600758.XSHG    8493
600980.XSHG    8458
dtype: float64
940
time 2013-03-01
code
600634.XSHG    9482
600733.XSHG    8832
600647.XSHG    8548
600758.XSHG    8504
600599.XSHG    8498
dtype: float64
942
time 2013-04-01
code
600634.XSHG    9396
600613.XSHG    8620
600985.XSHG    8602
600599.XSHG    8492
600647.XSHG    8442
dtype: float64
942
time 2013-05-01
code
600634.XSHG    9449
600136.XSHG    8910
600980.XSHG    8731
600985.XSHG    8607
600599.XSHG    8545
dtype: float64
942
time 2013-06-01
code
600485.XSHG    9022
600136.XSHG    8892
600980.XSHG    8726
600576.XSHG    8345
600706.XSHG    8332
dtype: float64
941
time 2013-07-01
code
600485.XSHG    9032
600136.XSHG    8902
600980.XSHG    8712
600706.XSHG    8331
600576.XSHG    8318
dtype: float64
941
time 2013-08-01
code
600485.XSHG    9037
600980.XSHG    8705
600576.XSHG    8343
600706.XSHG    8313
600379.XSHG    8302
dtype: float64
941
time 2013-09-01
code
600365.XSHG    8997
600485.XSHG    8938
600980.XSHG    8832
600615.XSHG    8649
600593.XSHG    8545
dtype: float64
941
time 2013-10-01
code
600365.XSHG    8983
600485.XSHG    8922
600980.XSHG    8826
600615.XSHG    8655
600234.XSHG    8566
dtype: float64
941
time 2013-11-01
code
600733.XSHG    8684
600485.XSHG    8457
600758.XSHG    8422
600099.XSHG    8401
600520.XSHG    8390
dtype: float64
941
time 2013-12-01
code
600733.XSHG    8723
600758.XSHG    8423
600520.XSHG    8402
600099.XSHG    8397
600146.XSHG    8356
dtype: float64
941
time 2014-01-01
code
600733.XSHG    8666
600485.XSHG    8421
600758.XSHG    8417
600520.XSHG    8400
600099.XSHG    8391
dtype: float64
941
time 2014-02-01
code
600733.XSHG    8702
600758.XSHG    8421
600146.XSHG    8411
600520.XSHG    8403
600099.XSHG    8393
dtype: float64
941
time 2014-03-01
code
600733.XSHG    8683
600485.XSHG    8460
600758.XSHG    8424
600520.XSHG    8422
600146.XSHG    8392
dtype: float64
941
time 2014-04-01
code
600146.XSHG    8422
600781.XSHG    8411
600506.XSHG    8409
600576.XSHG    8357
600485.XSHG    8354
dtype: float64
944
time 2014-05-01
code
600539.XSHG    9141
600980.XSHG    9020
600753.XSHG    8852
600593.XSHG    8846
600355.XSHG    8760
dtype: float64
948
time 2014-06-01
code
600539.XSHG    9140
600980.XSHG    9039
600753.XSHG    8873
600593.XSHG    8854
600355.XSHG    8765
dtype: float64
948
time 2014-07-01
code
600539.XSHG    9115
600980.XSHG    9006
600753.XSHG    8899
600593.XSHG    8853
600355.XSHG    8729
dtype: float64
947
time 2014-08-01
code
600539.XSHG    9151
600980.XSHG    8984
600593.XSHG    8846
600576.XSHG    8844
600753.XSHG    8838
dtype: float64
947
time 2014-09-01
code
600365.XSHG    8977
600099.XSHG    8765
600355.XSHG    8750
600847.XSHG    8742
600539.XSHG    8677
dtype: float64
951
time 2014-10-01
code
600365.XSHG    8988
600355.XSHG    8806
600099.XSHG    8776
600847.XSHG    8773
600476.XSHG    8696
dtype: float64
951
time 2014-11-01
code
600599.XSHG    9072
600696.XSHG    8995
600419.XSHG    8905
600136.XSHG    8883
600539.XSHG    8838
dtype: float64
968
time 2014-12-01
code
600696.XSHG    9009
600599.XSHG    8950
600419.XSHG    8910
600136.XSHG    8875
600539.XSHG    8836
dtype: float64
969
time 2015-01-01
code
600696.XSHG    9094
600599.XSHG    9039
600136.XSHG    8901
600419.XSHG    8895
600539.XSHG    8755
dtype: float64
969
time 2015-02-01
code
600696.XSHG    9076
600599.XSHG    8999
600419.XSHG    8902
600136.XSHG    8895
600539.XSHG    8756
dtype: float64
969
time 2015-03-01
code
600696.XSHG    9078
600599.XSHG    9007
600419.XSHG    8906
600539.XSHG    8785
600892.XSHG    8737
dtype: float64
969
time 2015-04-01
code
600696.XSHG    9142
600099.XSHG    8952
603601.XSHG    8946
600539.XSHG    8857
600599.XSHG    8817
dtype: float64
982
time 2015-05-01
code
603869.XSHG    9587
603088.XSHG    9461
600455.XSHG    9348
603898.XSHG    9339
603988.XSHG    9335
dtype: float64
1020
time 2015-06-01
code
603869.XSHG    9577
603088.XSHG    9544
603988.XSHG    9415
600455.XSHG    9412
600365.XSHG    9389
dtype: float64
1030
time 2015-07-01
code
603869.XSHG    9757
603088.XSHG    9632
603988.XSHG    9517
600455.XSHG    9494
603636.XSHG    9465
dtype: float64
1039
time 2015-08-01
code
603869.XSHG    9701
603988.XSHG    9515
600365.XSHG    9356
603010.XSHG    9319
600136.XSHG    9305
dtype: float64
1041
time 2015-09-01
code
600506.XSHG    9835
603099.XSHG    9546
600520.XSHG    9501
600593.XSHG    9441
600136.XSHG    9397
dtype: float64
1060
time 2015-10-01
code
600506.XSHG    9834
603099.XSHG    9563
600520.XSHG    9541
600593.XSHG    9476
600365.XSHG    9389
dtype: float64
1060
time 2015-11-01
code
603918.XSHG    9637
600980.XSHG    9520
600599.XSHG    9420
603601.XSHG    9391
600371.XSHG    9374
dtype: float64
1060
time 2015-12-01
code
600980.XSHG    9522
600753.XSHG    9475
603918.XSHG    9472
603010.XSHG    9364
600599.XSHG    9322
dtype: float64
1060
time 2016-01-01
code
603918.XSHG    9641
600980.XSHG    9549
600753.XSHG    9509
600599.XSHG    9438
603601.XSHG    9389
dtype: float64
1066
time 2016-02-01
code
603918.XSHG    9725
603778.XSHG    9652
600599.XSHG    9615
600980.XSHG    9538
603085.XSHG    9419
dtype: float64
1071
time 2016-03-01
code
603918.XSHG    9743
603778.XSHG    9706
600599.XSHG    9683
600980.XSHG    9576
600419.XSHG    9429
dtype: float64
1073
time 2016-04-01
code
600599.XSHG    9913
600419.XSHG    9801
603778.XSHG    9739
600080.XSHG    9710
603918.XSHG    9669
dtype: float64
1078
time 2016-05-01
code
603601.XSHG    9916
603918.XSHG    9907
600137.XSHG    9836
600733.XSHG    9693
603023.XSHG    9673
dtype: float64
1080
time 2016-06-01
code
600137.XSHG    9964
600733.XSHG    9869
603601.XSHG    9766
600506.XSHG    9756
603023.XSHG    9724
dtype: float64
1088
time 2016-07-01
code
600137.XSHG    10035
600733.XSHG     9957
600506.XSHG     9864
603601.XSHG     9716
603066.XSHG     9699
dtype: float64
1096
time 2016-08-01
code
600137.XSHG    10049
603322.XSHG     9969
603601.XSHG     9892
600506.XSHG     9862
600733.XSHG     9801
dtype: float64
1100
time 2016-09-01
code
600455.XSHG    10155
600980.XSHG     9933
603088.XSHG     9885
603027.XSHG     9881
603838.XSHG     9849
dtype: float64
1114
time 2016-10-01
code
600455.XSHG    10177
600980.XSHG    10053
603027.XSHG     9976
603088.XSHG     9970
603779.XSHG     9969
dtype: float64
1123
time 2016-11-01
code
603859.XSHG    10604
600817.XSHG    10441
603779.XSHG    10403
603189.XSHG    10400
600385.XSHG    10387
dtype: float64
1130
time 2016-12-01
code
603859.XSHG    10599
600817.XSHG    10443
603189.XSHG    10410
603779.XSHG    10400
600385.XSHG    10391
dtype: float64
1130
time 2017-01-01
code
603859.XSHG    10554
603189.XSHG    10521
600817.XSHG    10451
600385.XSHG    10372
603518.XSHG    10326
dtype: float64
1130
time 2017-02-01
code
603189.XSHG    10618
603859.XSHG    10489
600817.XSHG    10474
600385.XSHG    10409
603779.XSHG    10399
dtype: float64
1131
time 2017-03-01
code
603189.XSHG    10638
600817.XSHG    10488
603859.XSHG    10467
603779.XSHG    10438
600385.XSHG    10420
dtype: float64
1131
time 2017-04-01
code
603189.XSHG    10792
603779.XSHG    10609
600385.XSHG    10587
603022.XSHG    10441
603088.XSHG    10438
dtype: float64
1152
time 2017-05-01
code
603088.XSHG    11346
603903.XSHG    11275
603960.XSHG    11187
603040.XSHG    11168
603319.XSHG    11143
dtype: float64
1240
time 2017-06-01
code
603088.XSHG    11410
603040.XSHG    11337
603903.XSHG    11331
603960.XSHG    11255
603966.XSHG    11254
dtype: float64
1245
time 2017-07-01
code
603088.XSHG    11429
603903.XSHG    11410
603040.XSHG    11369
603966.XSHG    11275
603960.XSHG    11264
dtype: float64
1246
time 2017-08-01
code
603903.XSHG    11545
603088.XSHG    11454
603040.XSHG    11379
603960.XSHG    11310
603966.XSHG    11286
dtype: float64
1248
time 2017-09-01
code
603040.XSHG    11983
600455.XSHG    11890
603326.XSHG    11672
603429.XSHG    11576
603229.XSHG    11490
dtype: float64
1309
time 2017-10-01
code
603040.XSHG    12019
600455.XSHG    11897
603326.XSHG    11673
600506.XSHG    11525
603229.XSHG    11497
dtype: float64
1309
time 2017-11-01
code
603960.XSHG    12511
603232.XSHG    12503
603859.XSHG    12377
603383.XSHG    12297
603500.XSHG    12238
dtype: float64
1352
time 2017-12-01
code
603232.XSHG    12533
603960.XSHG    12437
603859.XSHG    12353
603500.XSHG    12288
603040.XSHG    12275
dtype: float64
1352

In [19]:

df = pd.Panel(result)

绘制六个组合的月超额收益率

In [20]:

matplotlib.rcParams['axes.unicode_minus']=False
index = ['Top20','port1','port2','port3','port4','port5']
def draw_backtest_picture(ind):
    plt.figure(figsize =(10,4))
    plt.plot(df.ix[:,ind,0]-df.ix[:,'benchmark',0], label = 'excess return: %s'%ind)
    plt.xlabel('backtest excess return of factor %s'%ind)
    plt.legend(loc=0)
    grid()
    
for ind in index:
    draw_backtest_picture(ind)
    

猜你喜欢

转载自blog.csdn.net/pigeontang/article/details/85470412