How can we resample OHLCV 1 minute Pandas Dataframe into a 5 minute Dataframe - 2020 method?

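This question has been answered many times before. However, whatever I try, those methods are either deprecated or have changed completely. That's why I'm looking for something that works in 2020 on the DataFrame listed below.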

Let me explain what I need.

I have a DataFrame, df1m, containing 1-minute data from a service. To show exactly what I need, I also pulled df5m from the same service.

I want a way to resample df1m into a DataFrame that looks exactly like df5m:

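(a) 9:31 to 9:35 should be resampled into the 9:35 bar
(b) the close of the 9:35 5-minute bar should be taken from the close of the 9:35 1-minute bar
(c) 15:56 to 16:00 should be resampled into the 16:00 bar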
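Rules (a) to (c) describe 5-minute intervals that are closed on the right and labelled by their right edge. As a quick sanity check of which bar a given minute belongs to, pandas' DatetimeIndex.ceil rounds each timestamp up to the next 5-minute boundary and leaves timestamps already on a boundary unchanged. This is a minimal sketch using a few timestamps from the tables in this question; it only illustrates the labelling, not the aggregation itself.

```python
import pandas as pd

# A few 1-minute timestamps taken from the df1m tables.
stamps = pd.DatetimeIndex([
    "2020-04-27 09:31", "2020-04-27 09:35",  # rule (a): both belong to 09:35
    "2020-04-27 09:36",                      # first minute of the next bar
    "2020-04-27 15:56", "2020-04-27 16:00",  # rule (c): both belong to 16:00
])

# ceil() rounds up to the next 5-minute boundary; a timestamp already on a
# boundary stays put, which is exactly the right-closed, right-labelled rule.
labels = stamps.ceil("5min")
for ts, label in zip(stamps, labels):
    print(ts.strftime("%H:%M"), "->", label.strftime("%H:%M"))
```

Rule (b) follows automatically: 09:35 is the last minute in its own bar, so taking the "last" close of that bar picks the 09:35 1-minute close.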

In [138]: df1m.head(15)
     ...: 
Out[138]: 
                        Open     High      Low    Close     Volume
date                                                              
2020-04-27 09:31:00  10.5300  10.5300  10.4100  10.4458  1408654.0
2020-04-27 09:32:00  10.4450  10.4450  10.3810  10.4100   469467.0
2020-04-27 09:33:00  10.4000  10.4100  10.3470  10.3766   305665.0
2020-04-27 09:34:00  10.3742  10.4000  10.3600  10.3850   127815.0
2020-04-27 09:35:00  10.3850  10.4000  10.3700  10.3714   125987.0

2020-04-27 09:36:00  10.3700  10.4100  10.3500  10.3500   248228.0
2020-04-27 09:37:00  10.3500  10.3850  10.3500  10.3570   130435.0
2020-04-27 09:38:00  10.3600  10.3600  10.3000  10.3000   250145.0
2020-04-27 09:39:00  10.3000  10.3299  10.2800  10.2999   277293.0
2020-04-27 09:40:00  10.2950  10.2950  10.2200  10.2200   333785.0

2020-04-27 09:41:00  10.2280  10.2300  10.1500  10.1550   292010.0
2020-04-27 09:42:00  10.1597  10.2100  10.1500  10.1900   314917.0
2020-04-27 09:43:00  10.1890  10.2180  10.1800  10.2114   293827.0
2020-04-27 09:44:00  10.2200  10.2500  10.1900  10.1902   317016.0
2020-04-27 09:45:00  10.1950  10.2100  10.1342  10.1396   296248.0


In [139]: df5m.head(5)
     ...: 
Out[139]: 
                       Open     High      Low    Close     Volume
date                                                             
2020-04-27 09:35:00  10.530  10.5300  10.3470  10.3714  2437589.0
2020-04-27 09:40:00  10.370  10.4100  10.2200  10.2200  1239889.0
2020-04-27 09:45:00  10.228  10.2500  10.1342  10.1396  1514020.0

2020-04-27 09:50:00  10.140  10.1578  10.0500  10.1300  1182617.0
2020-04-27 09:55:00  10.130  10.2400  10.1200  10.1400  1119197.0



In [136]: df1m.tail(15)
     ...: 
Out[136]: 
                        Open     High     Low    Close    Volume
date                                                            
2020-04-27 15:46:00  10.0250  10.0300  10.000  10.0099  547806.0
2020-04-27 15:47:00  10.0099  10.0200  10.000  10.0142  708078.0
2020-04-27 15:48:00  10.0150  10.0300  10.000  10.0277  267942.0
2020-04-27 15:49:00  10.0300  10.0500  10.020  10.0500  212731.0
2020-04-27 15:50:00  10.0470  10.0500  10.020  10.0250  358654.0

2020-04-27 15:51:00  10.0250  10.0300  10.000  10.0186  420574.0
2020-04-27 15:52:00  10.0200  10.0300  10.005  10.0050  281548.0
2020-04-27 15:53:00  10.0086  10.0186  10.000  10.0100  779115.0
2020-04-27 15:54:00  10.0086  10.0500  10.000  10.0486  404785.0
2020-04-27 15:55:00  10.0490  10.0600  10.040  10.0500  243380.0

2020-04-27 15:56:00  10.0500  10.0600  10.040  10.0500  219162.0
2020-04-27 15:57:00  10.0550  10.0700  10.050  10.0700  263262.0
2020-04-27 15:58:00  10.0600  10.0700  10.050  10.0600  345422.0
2020-04-27 15:59:00  10.0550  10.0600  10.040  10.0450  371237.0
2020-04-27 16:00:00  10.0500  10.0600  10.030  10.0300  566676.0


In [137]: df5m.tail(5)
     ...: 
Out[137]: 
                       Open   High     Low    Close     Volume
date                                                          
2020-04-27 15:40:00  10.115  10.12  10.070  10.0742  2216599.0
2020-04-27 15:45:00  10.075  10.08  10.015  10.0300  2231974.0
2020-04-27 15:50:00  10.025  10.05  10.000  10.0250  2095213.0

2020-04-27 15:55:00  10.025  10.06  10.000  10.0500  2129405.0
2020-04-27 16:00:00  10.050  10.07  10.020  10.0300  1765760.0

This is how I create df1m from the saved CSV file:

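import pandas as pd

ticker = 'AAL'
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'C:\Path\stock_intraday\{}.csv'.format(ticker))
df.set_index('date', inplace=True)
df.index = pd.to_datetime(df.index)
df.index.names = ['Datetime']
df.sort_index(axis=0, ascending=True, inplace=True)  # the file is stored newest-first
df1m = df.loc['2020-04-27':'2020-04-27']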
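A slightly shorter way to load the file, assuming the same column names as the CSV shown in this question: read_csv can set and parse the index in one step via index_col and parse_dates, and .loc with a single date string selects a whole day. The StringIO buffer holding three sample rows is only there to make the sketch runnable; with the real file you would pass the path instead.

```python
import io
import pandas as pd

# Three sample rows in the same newest-first order as the saved CSV.
csv_text = """date,Open,High,Low,Close,Volume
2020-04-27 09:33:00,10.4,10.41,10.347000000000001,10.3766,305665.0
2020-04-27 09:32:00,10.445,10.445,10.380999999999998,10.41,469467.0
2020-04-27 09:31:00,10.53,10.53,10.41,10.4458,1408654.0
"""

# index_col + parse_dates replaces the separate set_index / to_datetime steps.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
df = df.sort_index()  # put the rows in chronological order

# A date string passed to .loc selects every row on that day.
df1m = df.loc["2020-04-27"]
print(df1m)
```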

The CSV data looks like this; I've only pasted the first 20 and last 20 rows for 4/27/2020 (the file is stored newest-first):

date,Open,High,Low,Close,Volume
2020-04-27 16:00:00,10.05,10.06,10.03,10.03,566676.0
2020-04-27 15:59:00,10.055,10.06,10.04,10.045,371237.0
2020-04-27 15:58:00,10.06,10.07,10.05,10.06,345422.0
2020-04-27 15:57:00,10.055,10.07,10.05,10.07,263262.0
2020-04-27 15:56:00,10.05,10.06,10.04,10.05,219162.0
2020-04-27 15:55:00,10.049,10.06,10.04,10.05,243380.0
2020-04-27 15:54:00,10.0086,10.05,10.0,10.0486,404785.0
2020-04-27 15:53:00,10.0086,10.0186,10.0,10.01,779115.0
2020-04-27 15:52:00,10.02,10.03,10.005,10.005,281548.0
2020-04-27 15:51:00,10.025,10.03,10.0,10.0186,420574.0
2020-04-27 15:50:00,10.047,10.05,10.02,10.025,358654.0
2020-04-27 15:49:00,10.03,10.05,10.02,10.05,212731.0
2020-04-27 15:48:00,10.015,10.03,10.0,10.0277,267942.0
2020-04-27 15:47:00,10.0099,10.02,10.0,10.0142,708078.0
2020-04-27 15:46:00,10.025,10.03,10.0,10.0099,547806.0
2020-04-27 15:45:00,10.045,10.05,10.015,10.03,400360.0
2020-04-27 15:44:00,10.055,10.07,10.03,10.05,395272.0
2020-04-27 15:43:00,10.055,10.07,10.04,10.06,451599.0
2020-04-27 15:42:00,10.0545,10.06,10.04,10.0528,442260.0
2020-04-27 15:41:00,10.075,10.08,10.05,10.055,542481.0

...

2020-04-27 09:50:00,10.12,10.14,10.12,10.13,162016.0
2020-04-27 09:49:00,10.13,10.1578,10.12,10.13,188149.0
2020-04-27 09:48:00,10.1324,10.1499,10.095,10.12,179250.0
2020-04-27 09:47:00,10.05,10.135,10.05,10.13,347080.0
2020-04-27 09:46:00,10.14,10.14,10.05,10.06,306120.0
2020-04-27 09:45:00,10.195,10.21,10.1342,10.1396,296248.0
2020-04-27 09:44:00,10.22,10.25,10.19,10.1902,317016.0
2020-04-27 09:43:00,10.189,10.218,10.18,10.2114,293827.0
2020-04-27 09:42:00,10.1597,10.21,10.15,10.19,314917.0
2020-04-27 09:41:00,10.228,10.23,10.15,10.155,292010.0
2020-04-27 09:40:00,10.295,10.295,10.22,10.22,333785.0
2020-04-27 09:39:00,10.3,10.3299,10.28,10.2999,277293.0
2020-04-27 09:38:00,10.36,10.36,10.3,10.3,250145.0
2020-04-27 09:37:00,10.35,10.385,10.35,10.357000000000001,130435.0
2020-04-27 09:36:00,10.37,10.41,10.35,10.35,248228.0
2020-04-27 09:35:00,10.385,10.4,10.37,10.3714,125987.0
2020-04-27 09:34:00,10.3742,10.4,10.36,10.385,127815.0
2020-04-27 09:33:00,10.4,10.41,10.347000000000001,10.3766,305665.0
2020-04-27 09:32:00,10.445,10.445,10.380999999999998,10.41,469467.0
2020-04-27 09:31:00,10.53,10.53,10.41,10.4458,1408654.0
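To check a right-closed, right-labelled resample against the service's own 5-minute bars, the sketch below loads the 20 newest rows posted above (15:41 to 16:00), sorts them into chronological order and resamples them. The aggregation dictionary is the usual OHLCV mapping (first/max/min/last/sum); pasting the rows into a StringIO is just a stand-in for reading the file.

```python
import io
import pandas as pd

# The 20 newest 1-minute rows posted above, newest first as in the file.
csv_text = """date,Open,High,Low,Close,Volume
2020-04-27 16:00:00,10.05,10.06,10.03,10.03,566676.0
2020-04-27 15:59:00,10.055,10.06,10.04,10.045,371237.0
2020-04-27 15:58:00,10.06,10.07,10.05,10.06,345422.0
2020-04-27 15:57:00,10.055,10.07,10.05,10.07,263262.0
2020-04-27 15:56:00,10.05,10.06,10.04,10.05,219162.0
2020-04-27 15:55:00,10.049,10.06,10.04,10.05,243380.0
2020-04-27 15:54:00,10.0086,10.05,10.0,10.0486,404785.0
2020-04-27 15:53:00,10.0086,10.0186,10.0,10.01,779115.0
2020-04-27 15:52:00,10.02,10.03,10.005,10.005,281548.0
2020-04-27 15:51:00,10.025,10.03,10.0,10.0186,420574.0
2020-04-27 15:50:00,10.047,10.05,10.02,10.025,358654.0
2020-04-27 15:49:00,10.03,10.05,10.02,10.05,212731.0
2020-04-27 15:48:00,10.015,10.03,10.0,10.0277,267942.0
2020-04-27 15:47:00,10.0099,10.02,10.0,10.0142,708078.0
2020-04-27 15:46:00,10.025,10.03,10.0,10.0099,547806.0
2020-04-27 15:45:00,10.045,10.05,10.015,10.03,400360.0
2020-04-27 15:44:00,10.055,10.07,10.03,10.05,395272.0
2020-04-27 15:43:00,10.055,10.07,10.04,10.06,451599.0
2020-04-27 15:42:00,10.0545,10.06,10.04,10.0528,442260.0
2020-04-27 15:41:00,10.075,10.08,10.05,10.055,542481.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
df = df.sort_index()  # chronological order, so "first"/"last" give open/close

ohlcv = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
# closed="right": 15:41-15:45 form one bar; label="right": it is stamped 15:45.
bars = df.resample("5min", label="right", closed="right").agg(ohlcv)
print(bars)
```

On these rows, Open, High and Close agree with df5m.tail() for 15:45 to 16:00. Volume is a few shares lower in every bar (for example 2231972 versus 2231974 at 15:45, 1765759 versus 1765760 at 16:00), and the 16:00 Low comes out as 10.03 rather than the 10.02 in df5m, so the service apparently builds its 5-minute bars from data not fully reflected in these 1-minute rows.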

You can pass a dictionary to .agg() to specify a different aggregation function for each column, as needed.

import pandas as pd

# I just copied your top table here to make the df
df = pd.read_clipboard(sep=r"[ ]{2,}")
df = df.set_index(pd.DatetimeIndex(df['date']))

print(df)

    date    Open    High    Low Close   Volume
date                        
2020-04-27 09:31:00 2020-04-27 09:31:00 10.5300 10.5300 10.4100 10.4458 1408654.0
2020-04-27 09:32:00 2020-04-27 09:32:00 10.4450 10.4450 10.3810 10.4100 469467.0
2020-04-27 09:33:00 2020-04-27 09:33:00 10.4000 10.4100 10.3470 10.3766 305665.0
2020-04-27 09:34:00 2020-04-27 09:34:00 10.3742 10.4000 10.3600 10.3850 127815.0
2020-04-27 09:35:00 2020-04-27 09:35:00 10.3850 10.4000 10.3700 10.3714 125987.0
2020-04-27 09:36:00 2020-04-27 09:36:00 10.3700 10.4100 10.3500 10.3500 248228.0
2020-04-27 09:37:00 2020-04-27 09:37:00 10.3500 10.3850 10.3500 10.3570 130435.0
2020-04-27 09:38:00 2020-04-27 09:38:00 10.3600 10.3600 10.3000 10.3000 250145.0
2020-04-27 09:39:00 2020-04-27 09:39:00 10.3000 10.3299 10.2800 10.2999 277293.0
2020-04-27 09:40:00 2020-04-27 09:40:00 10.2950 10.2950 10.2200 10.2200 333785.0
2020-04-27 09:41:00 2020-04-27 09:41:00 10.2280 10.2300 10.1500 10.1550 292010.0
2020-04-27 09:42:00 2020-04-27 09:42:00 10.1597 10.2100 10.1500 10.1900 314917.0
2020-04-27 09:43:00 2020-04-27 09:43:00 10.1890 10.2180 10.1800 10.2114 293827.0
2020-04-27 09:44:00 2020-04-27 09:44:00 10.2200 10.2500 10.1900 10.1902 317016.0
2020-04-27 09:45:00 2020-04-27 09:45:00 10.1950 10.2100 10.1342 10.1396 296248.0
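read_clipboard needs a working system clipboard, which is not always available (for example on a headless server). A sketch of the same idea that parses the pasted table from a string instead, assuming columns are separated by two or more spaces as in the df1m output in the question; only the first five rows are included here.

```python
import io
import pandas as pd

# First five rows of the df1m table, columns separated by 2+ spaces.
table = """date  Open  High  Low  Close  Volume
2020-04-27 09:31:00  10.5300  10.5300  10.4100  10.4458  1408654.0
2020-04-27 09:32:00  10.4450  10.4450  10.3810  10.4100  469467.0
2020-04-27 09:33:00  10.4000  10.4100  10.3470  10.3766  305665.0
2020-04-27 09:34:00  10.3742  10.4000  10.3600  10.3850  127815.0
2020-04-27 09:35:00  10.3850  10.4000  10.3700  10.3714  125987.0
"""

# A regex separator requires the python engine. The single space inside each
# timestamp does not match \s{2,}, so date and time stay in one column, which
# index_col/parse_dates then turn into a DatetimeIndex.
df = pd.read_csv(io.StringIO(table), sep=r"\s{2,}", engine="python",
                 index_col="date", parse_dates=True)
print(df)
```

Unlike the clipboard version, this leaves no duplicate date column, since the timestamps become the index directly.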


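# '5min' replaces the '5T' alias, which is deprecated as of pandas 2.2.
# closed='right' puts 9:31-9:35 into one bin; label='right' stamps it 9:35.
df_rs = df.resample('5min', label='right', closed='right').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum',
})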

print(df_rs)

    Open    High    Low Close   Volume
date                    
2020-04-27 09:35:00 10.530  10.53   10.3470 10.3714 2437588.0
2020-04-27 09:40:00 10.370  10.41   10.2200 10.2200 1239886.0
2020-04-27 09:45:00 10.228  10.25   10.1342 10.1396 1514018.0
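With more than one trading day in df1m, resample also creates a bar for every empty 5-minute slot overnight, with NaN prices. A minimal sketch of a helper that drops those slots, assuming the same column names; resample_ohlcv is a hypothetical name, and the second session is the question's last two 1-minute rows shifted by one day, made up purely for illustration.

```python
import pandas as pd

OHLCV_AGG = {"Open": "first", "High": "max", "Low": "min",
             "Close": "last", "Volume": "sum"}


def resample_ohlcv(df: pd.DataFrame, rule: str = "5min") -> pd.DataFrame:
    """Hypothetical helper: right-closed, right-labelled OHLCV bars without empty slots."""
    bars = df.resample(rule, label="right", closed="right").agg(OHLCV_AGG)
    # Slots with no 1-minute rows (overnight, weekends) have NaN prices;
    # drop them so only bars with actual trading remain.
    return bars.dropna(subset=["Open"])


# Last two 1-minute bars of 2020-04-27 from the question.
day1 = pd.DataFrame(
    {"Open": [10.055, 10.05], "High": [10.06, 10.06], "Low": [10.04, 10.03],
     "Close": [10.045, 10.03], "Volume": [371237.0, 566676.0]},
    index=pd.to_datetime(["2020-04-27 15:59", "2020-04-27 16:00"]),
)
# Made-up second session: the same rows shifted by one day (illustration only).
day2 = day1.copy()
day2.index = day1.index + pd.Timedelta(days=1)
df = pd.concat([day1, day2])

raw = df.resample("5min", label="right", closed="right").agg(OHLCV_AGG)
print(len(raw))            # includes every empty overnight slot
print(resample_ohlcv(df))  # only the 16:00 bar of each day
```

Filtering on a NaN Open rather than on zero Volume keeps any genuine bar that happens to report zero volume.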