python_pandas DAY_20 (1) time resampling

Topic: time resampling
Tips:
First, two small points about the closed and label parameters:
closed: decides which end of each interval is closed. With closed='left' every interval is left-closed and right-open, e.g. [09:30, 09:35); with closed='right' every interval is left-open and right-closed, e.g. (09:25, 09:30]. When the first timestamp falls exactly on an interval edge, closed='right' usually produces one more interval than closed='left'. Once the intervals are divided, the aggregation is performed within each interval.
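label: after the intervals are divided, label decides which edge of each interval becomes its index. With label='left' the left (start) edge of the interval is used as the index date; with label='right' the right (end) edge is used. Both parameters default to 'left' for most frequencies (minutes, hours, days), but to 'right' for end-anchored frequencies such as 'M', 'A', 'Q' and 'W'.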
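A minimal sketch of both parameters on ten hypothetical one-minute values (0 to 9), chosen so the bin sums can be checked by hand; the data and variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical data: ten one-minute observations with values 0..9, starting at 09:30
s = pd.Series(range(10), index=pd.date_range("2020-01-28 09:30", periods=10, freq="min"))

# closed="left": bins [09:30, 09:35) and [09:35, 09:40), each labeled by its left edge
left = s.resample("5min", closed="left", label="left").sum()

# closed="right": bins (09:25, 09:30], (09:30, 09:35], (09:35, 09:40],
# and label="right" names each bin by its right edge
right = s.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
```

With closed='right' the 09:30 value sits alone in the bin (09:25, 09:30], which is where the extra bin comes from.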
Key points
1. Using Series.resample

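import pandas as pd
import numpy as np
s = pd.Series(np.random.randint(0, 50, 60),
              index=pd.date_range("2020-01-28 9:30", periods=60, freq="min"))
# Create 60 values at one-minute intervals and treat them as a stock's per-minute trading volume
# ("min" is the minute alias; the "T" alias used originally still works but is deprecated from pandas 2.2 on)
print(s.resample("5min").sum())
# Use 5 minutes as the bin width and sum each bin; by default each bin is labeled with its earliest time point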
2020-01-28 09:30:00    136
2020-01-28 09:35:00    158
2020-01-28 09:40:00     82
2020-01-28 09:45:00    114
2020-01-28 09:50:00     93
2020-01-28 09:55:00     89
2020-01-28 10:00:00    118
2020-01-28 10:05:00    194
2020-01-28 10:10:00    172
2020-01-28 10:15:00    134
2020-01-28 10:20:00    123
2020-01-28 10:25:00     36


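print(s.resample("5min", label="right").sum())
# The intervals are still left-closed and right-open (closed defaults to "left"); only the label moves to the right edge, i.e. the latest time point of each bin
# With the same data the sums would match the output above; the numbers below differ because the random data were regenerated between runs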
2020-01-28 09:35:00    114
2020-01-28 09:40:00     97
2020-01-28 09:45:00    110
2020-01-28 09:50:00     85
2020-01-28 09:55:00     91
2020-01-28 10:00:00     83
2020-01-28 10:05:00     47
2020-01-28 10:10:00    117
2020-01-28 10:15:00    149
2020-01-28 10:20:00     97
2020-01-28 10:25:00     99
2020-01-28 10:30:00    150
Freq: 5T, dtype: int32
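Because label only renames the bins, resampling the same data with and without label='right' should give identical sums, with every timestamp shifted by one bin width. A small check, assuming a fixed NumPy seed (not in the original) so both calls see the same random data:

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption for reproducibility) so both resamples use identical data
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 50, 60),
              index=pd.date_range("2020-01-28 09:30", periods=60, freq="min"))

default = s.resample("5min").sum()                   # closed="left", label="left"
relabeled = s.resample("5min", label="right").sum()  # closed is still "left"

# Same bins and same sums; only the timestamps move forward by 5 minutes
print((default.to_numpy() == relabeled.to_numpy()).all())
print(relabeled.index[0] - default.index[0])
```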


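# Now treat the same data as the stock's real-time price and build OHLC bars (open, high, low, close for each 5-minute bin)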
print(s.resample("5min").ohlc())
                     open  high  low  close
2020-01-28 09:30:00    39    48   15     15
2020-01-28 09:35:00    28    40   24     24
2020-01-28 09:40:00     1    32    1     32
2020-01-28 09:45:00    28    48   13     24
2020-01-28 09:50:00    12    38   10     18
2020-01-28 09:55:00    48    48   35     39
2020-01-28 10:00:00     8    35    8     35
2020-01-28 10:05:00    28    43   15     21
2020-01-28 10:10:00    24    38   24     38
2020-01-28 10:15:00    48    48    2     40
2020-01-28 10:20:00    17    48    0     48
2020-01-28 10:25:00    43    43   15     39
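ohlc() records, for each bin, the first, highest, lowest and last value. A sketch that rebuilds the same table with agg() to make that explicit, again on hypothetical seeded random data:

```python
import numpy as np
import pandas as pd

# Hypothetical per-minute "prices", seeded so the result is reproducible
rng = np.random.default_rng(1)
price = pd.Series(rng.integers(0, 50, 60),
                  index=pd.date_range("2020-01-28 09:30", periods=60, freq="min"))

bars = price.resample("5min").ohlc()

# The same bars built by hand: first / max / min / last within each 5-minute bin
manual = price.resample("5min").agg(["first", "max", "min", "last"])
manual.columns = ["open", "high", "low", "close"]

print(bars.head(3))
print(np.array_equal(bars.to_numpy(), manual.to_numpy()))
```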

2. Aggregating a Series by month

import pandas as pd
import numpy as np

s = pd.Series(np.random.randint(0, 50, 60), index=pd.date_range("2020-01-28", periods=60, freq="D"))

print(s.groupby(lambda x: x.month).sum())  # group by a function applied to each index label: here, its month number
1     81
2    867
3    645
dtype: int32
print(s.groupby(s.index.to_period('M')).sum())
# Convert the date index to monthly periods, use them as the group keys, and sum the values
2020-01     81
2020-02    867
2020-03    645
Freq: M, dtype: int32
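# Note the difference: the first version groups by month number only (index 1, 2, 3, so the same month in different years would be merged), while the second groups by year-month period (index 2020-01, 2020-02, 2020-03)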
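The difference matters once the data span more than one year. A sketch with three hypothetical values, two of them in January of different years:

```python
import pandas as pd

# Hypothetical data: January appears in two different years
s = pd.Series([1, 2, 3],
              index=pd.to_datetime(["2020-01-15", "2021-01-15", "2021-02-15"]))

# Grouping by month number merges January 2020 and January 2021
by_month_number = s.groupby(lambda x: x.month).sum()

# Grouping by monthly Period keeps the year, so the two Januaries stay apart
by_period = s.groupby(s.index.to_period("M")).sum()

print(by_month_number)
print(by_period)
```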

3. Using DataFrame.resample

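df = pd.DataFrame(np.random.randint(10, 50, 40).reshape(10, 4),
                  index=pd.date_range("2020-01-28", periods=10, freq="M"),
                  columns=list("ABCD"))
# 40 values (10 rows x 4 columns) on month-end dates; the month-end alias "M" is spelled "ME" from pandas 2.2 on
print(df.resample("10D").sum())  # resample with a 10-day period and sum each bin; bins that contain no row sum to 0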
             A   B   C   D
2020-01-31  14  30  38  44
2020-02-10   0   0   0   0
2020-02-20  38  25  30  20
2020-03-01   0   0   0   0
2020-03-11   0   0   0   0
2020-03-21   0   0   0   0
2020-03-31  38  30  16  32
2020-04-10   0   0   0   0
2020-04-20   0   0   0   0
2020-04-30  33  27  33  15
2020-05-10   0   0   0   0
2020-05-20   0   0   0   0
2020-05-30  45  19  12  29
2020-06-09   0   0   0   0
2020-06-19   0   0   0   0
2020-06-29  26  11  49  43
2020-07-09   0   0   0   0
2020-07-19   0   0   0   0
2020-07-29  34  45  25  47
2020-08-08   0   0   0   0
2020-08-18   0   0   0   0
2020-08-28  37  14  18  44
2020-09-07   0   0   0   0
2020-09-17   0   0   0   0
2020-09-27  46  15  42  30
2020-10-07   0   0   0   0
2020-10-17   0   0   0   0
2020-10-27  17  11  18  24
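The zeros above come from bins that contain no rows at all: month-end data split into 10-day bins leaves most bins empty, and sum() returns 0 for an empty bin by default. If empty bins should be marked as missing instead, sum() accepts min_count. A sketch on three hypothetical month-end rows:

```python
import pandas as pd

# Hypothetical month-end data: three rows in column A
df = pd.DataFrame({"A": [14, 38, 38]},
                  index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]))

zeros = df.resample("10D").sum()            # empty 10-day bins become 0
nans = df.resample("10D").sum(min_count=1)  # bins with fewer than 1 row become NaN
print(zeros)
print(nans.dropna())                        # keep only the bins that actually had data
```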


df = pd.DataFrame(np.random.randint(10, 50, 20).reshape(5, 4),
                  index=pd.date_range("2020-01-28", periods=5, freq="M"),
                  columns=list("ABCD"))
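print(df.resample("A-DEC").ffill())  # yearly bins, year ending in December ("YE-DEC" from pandas 2.2 on); ffill() takes the latest row on or before each year-end label rather than aggregating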
             A   B   C   D
2020-12-31  38  31  17  17
print(df.resample("A-MAR").ffill())  # yearly bins, year ending in March
             A   B   C   D
2020-03-31  14  41  45  43
2021-03-31  43  42  42  49
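To see that ffill() picks a row instead of aggregating, here is a sketch contrasting it with sum() on hypothetical data (column A = 1 to 5 for January to May 2020). It passes the offset object pd.offsets.YearEnd(month=3) rather than the "A-MAR" string, so it should run on pandas versions both before and after the 2.2 alias rename:

```python
import pandas as pd

# Hypothetical month-end data for Jan..May 2020; column A holds 1..5
idx = pd.date_range("2020-01-31", periods=5, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]}, index=idx)

# Year ending in March, the same rule as the string alias "A-MAR"
year_mar = pd.offsets.YearEnd(month=3)

# ffill(): for each year-end label, take the latest row dated on or before it
filled = df.resample(year_mar).ffill()
# sum(): aggregate all rows inside each left-open, right-closed year
totals = df.resample(year_mar).sum()

print(filled)
print(totals)
```

Here both labels are 2020-03-31 and 2021-03-31; ffill() returns the March and May rows, while sum() adds up January to March and April to May.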

4. Resampling data from a CSV file

import pandas as pd
import numpy as np


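# Attempt 1 raises a TypeError: the CSV file does contain dates, but after reading they are
# just an ordinary column and the index is a default integer RangeIndex, so resample() has no time index
df = pd.read_csv('002001.csv')
# wdf = df["Adj Close"].resample("W-FRI").ohlc()   # TypeError

# Attempt 2: use the Date column as the row index and let pandas parse it as datetimes
df = pd.read_csv('002001.csv', index_col="Date", parse_dates=True)
wdf = df["Adj Close"].resample("W-FRI").ohlc()   # weekly bars, weeks ending on Friday
print(wdf)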
             open   high    low  close
Date                                  
2015-10-02  13.41  13.41  13.41  13.41
2015-10-09  13.41  14.75  13.41  14.62
2015-10-16  15.30  15.30  14.73  15.25
2015-10-23  15.03  15.22  14.26  15.20
2015-10-30  15.18  15.30  15.02  15.22
2015-11-06  14.74  15.86  14.62  15.86
2015-11-13  16.02  16.59  15.95  15.95
2015-11-20  16.21  16.22  15.75  16.08
2015-11-27  16.05  16.94  15.54  15.54
2015-12-04  15.70  16.62  15.70  16.62
2015-12-11  16.63  16.63  15.56  15.62
2015-12-18  16.06  16.60  16.06  16.31
2015-12-25  16.85  16.95  16.85  16.95
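The failure and the fix can be reproduced without the original 002001.csv file. A sketch with a tiny hypothetical CSV held in memory via io.StringIO; the column layout mimics the Date, Adj Close and Volume columns used above, and the numbers are invented:

```python
import io
import pandas as pd

# Hypothetical CSV text; 2015-10-05 is a Monday and 2015-10-09 a Friday
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2015-10-05,13.41,14.75,13.41,14.50,14.50,1000
2015-10-06,14.50,14.60,14.00,14.20,14.20,2000
2015-10-09,14.20,14.80,14.10,14.62,14.62,3000
2015-10-12,15.30,15.30,14.90,15.00,15.00,4000
"""

# Without parse_dates the index is a plain RangeIndex, so resample() refuses to run
plain = pd.read_csv(io.StringIO(csv_text))
raised = False
try:
    plain["Adj Close"].resample("W-FRI").ohlc()
except TypeError as err:
    raised = True
    print("TypeError:", err)

# index_col + parse_dates gives a DatetimeIndex and resampling works
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
wdf = df["Adj Close"].resample("W-FRI").ohlc()
print(wdf)
```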


# With a proper time index, further operations are straightforward
wdf["Volume"] = df['Volume'].resample('W-FRI').sum()  # add the weekly trading volume
print(wdf)
             open   high    low  close    Volume
Date                                            
2015-10-02  13.41  13.41  13.41  13.41         0
2015-10-09  13.41  14.75  13.41  14.62  42135700
2015-10-16  15.30  15.30  14.73  15.25  73234300
2015-10-23  15.03  15.22  14.26  15.20  69848500
2015-10-30  15.18  15.30  15.02  15.22  64253700
2015-11-06  14.74  15.86  14.62  15.86  67429500
2015-11-13  16.02  16.59  15.95  15.95  87379300
2015-11-20  16.21  16.22  15.75  16.08  41097000
2015-11-27  16.05  16.94  15.54  15.54  64976300
2015-12-04  15.70  16.62  15.70  16.62  53552100
2015-12-11  16.63  16.63  15.56  15.62  42382600
2015-12-18  16.06  16.60  16.06  16.31  45879500
2015-12-25  16.85  16.95  16.85  16.95  27652100
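If the file also has separate Open, High, Low and Close columns (typical of such exports, but an assumption here, since the original only uses Adj Close and Volume), a per-column agg() builds weekly bars and volume in one call. Unlike ohlc() on Adj Close, this takes the weekly high and low from the daily High and Low columns. A sketch on four hypothetical daily rows:

```python
import pandas as pd

# Hypothetical daily bars from Monday 2015-10-05 to Monday 2015-10-12
idx = pd.to_datetime(["2015-10-05", "2015-10-06", "2015-10-09", "2015-10-12"])
df = pd.DataFrame({"Open": [13.41, 14.50, 14.20, 15.30],
                   "High": [14.75, 14.60, 14.80, 15.30],
                   "Low": [13.41, 14.00, 14.10, 14.90],
                   "Close": [14.50, 14.20, 14.62, 15.00],
                   "Volume": [1000, 2000, 3000, 4000]}, index=idx)

# One resample with one aggregation rule per column; weeks end on Friday
weekly = df.resample("W-FRI").agg({"Open": "first", "High": "max",
                                   "Low": "min", "Close": "last",
                                   "Volume": "sum"})
print(weekly)
```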

Origin blog.csdn.net/soulproficiency/article/details/104098229