How can we resample a 1-minute OHLCV pandas DataFrame into a 5-minute DataFrame (2020 method)?
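This question has been answered many times before. However, whatever I try, the methods are either deprecated or have changed completely. That is why I would like an answer that works in 2020, based on the DataFrames listed below.
Let me explain what I need.
I have a DataFrame df1m, which holds 1-minute data from a service. To demonstrate exactly what I need, I also obtained df5m from the same service.
What I want is a way to resample df1m into a DataFrame that looks exactly like df5m.
(a) 9:31 - 9:35 should be resampled as 9:35
(b) The 9:35 close of the 5m bar should be taken from the 9:35 close of the 1m bar
(c) 15:56 - 16:00 should be resampled as 16:00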
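Requirements (a) to (c) amount to 5-minute bins that are closed and labelled on their right edge. As a minimal sketch of that mapping (the sample timestamps are only illustrative, and `ceil` is used here just to show which bin each minute lands in, not as the resampling method itself):

```python
import pandas as pd

# Illustrative 1-minute timestamps taken from the session shown in the question.
minutes = pd.to_datetime([
    "2020-04-27 09:31:00",
    "2020-04-27 09:35:00",
    "2020-04-27 09:36:00",
    "2020-04-27 15:56:00",
    "2020-04-27 16:00:00",
])

# Rounding each timestamp up to the next 5-minute boundary gives the label of a
# right-closed, right-labelled bin, e.g. (09:30, 09:35] is labelled 09:35.
bin_labels = minutes.ceil("5min")

for ts, label in zip(minutes, bin_labels):
    print(ts.time(), "->", label.time())
```

A bar stamped exactly on a boundary (09:35, 16:00) stays in the bin that ends there, which is what (b) asks for.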
In [138]: df1m.head(15)
...:
Out[138]:
Open High Low Close Volume
date
2020-04-27 09:31:00 10.5300 10.5300 10.4100 10.4458 1408654.0
2020-04-27 09:32:00 10.4450 10.4450 10.3810 10.4100 469467.0
2020-04-27 09:33:00 10.4000 10.4100 10.3470 10.3766 305665.0
2020-04-27 09:34:00 10.3742 10.4000 10.3600 10.3850 127815.0
2020-04-27 09:35:00 10.3850 10.4000 10.3700 10.3714 125987.0
2020-04-27 09:36:00 10.3700 10.4100 10.3500 10.3500 248228.0
2020-04-27 09:37:00 10.3500 10.3850 10.3500 10.3570 130435.0
2020-04-27 09:38:00 10.3600 10.3600 10.3000 10.3000 250145.0
2020-04-27 09:39:00 10.3000 10.3299 10.2800 10.2999 277293.0
2020-04-27 09:40:00 10.2950 10.2950 10.2200 10.2200 333785.0
2020-04-27 09:41:00 10.2280 10.2300 10.1500 10.1550 292010.0
2020-04-27 09:42:00 10.1597 10.2100 10.1500 10.1900 314917.0
2020-04-27 09:43:00 10.1890 10.2180 10.1800 10.2114 293827.0
2020-04-27 09:44:00 10.2200 10.2500 10.1900 10.1902 317016.0
2020-04-27 09:45:00 10.1950 10.2100 10.1342 10.1396 296248.0
In [139]: df5m.head(5)
...:
Out[139]:
Open High Low Close Volume
date
2020-04-27 09:35:00 10.530 10.5300 10.3470 10.3714 2437589.0
2020-04-27 09:40:00 10.370 10.4100 10.2200 10.2200 1239889.0
2020-04-27 09:45:00 10.228 10.2500 10.1342 10.1396 1514020.0
2020-04-27 09:50:00 10.140 10.1578 10.0500 10.1300 1182617.0
2020-04-27 09:55:00 10.130 10.2400 10.1200 10.1400 1119197.0
In [136]: df1m.tail(15)
...:
Out[136]:
Open High Low Close Volume
date
2020-04-27 15:46:00 10.0250 10.0300 10.000 10.0099 547806.0
2020-04-27 15:47:00 10.0099 10.0200 10.000 10.0142 708078.0
2020-04-27 15:48:00 10.0150 10.0300 10.000 10.0277 267942.0
2020-04-27 15:49:00 10.0300 10.0500 10.020 10.0500 212731.0
2020-04-27 15:50:00 10.0470 10.0500 10.020 10.0250 358654.0
2020-04-27 15:51:00 10.0250 10.0300 10.000 10.0186 420574.0
2020-04-27 15:52:00 10.0200 10.0300 10.005 10.0050 281548.0
2020-04-27 15:53:00 10.0086 10.0186 10.000 10.0100 779115.0
2020-04-27 15:54:00 10.0086 10.0500 10.000 10.0486 404785.0
2020-04-27 15:55:00 10.0490 10.0600 10.040 10.0500 243380.0
2020-04-27 15:56:00 10.0500 10.0600 10.040 10.0500 219162.0
2020-04-27 15:57:00 10.0550 10.0700 10.050 10.0700 263262.0
2020-04-27 15:58:00 10.0600 10.0700 10.050 10.0600 345422.0
2020-04-27 15:59:00 10.0550 10.0600 10.040 10.0450 371237.0
2020-04-27 16:00:00 10.0500 10.0600 10.030 10.0300 566676.0
In [137]: df5m.tail(5)
...:
Out[137]:
Open High Low Close Volume
date
2020-04-27 15:40:00 10.115 10.12 10.070 10.0742 2216599.0
2020-04-27 15:45:00 10.075 10.08 10.015 10.0300 2231974.0
2020-04-27 15:50:00 10.025 10.05 10.000 10.0250 2095213.0
2020-04-27 15:55:00 10.025 10.06 10.000 10.0500 2129405.0
2020-04-27 16:00:00 10.050 10.07 10.020 10.0300 1765760.0
This is how I create df1m from the saved csv file.
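import pandas as pd

ticker = 'AAL'
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'C:\Path\stock_intraday\{}.csv'.format(ticker))
df.set_index('date', inplace=True)
df.index = pd.to_datetime(df.index)
df.index.names = ['Datetime']
# The csv is stored newest-first, so sort ascending before slicing out one day
df.sort_index(axis=0, ascending=True, inplace=True)
df1m = df['2020-04-27':'2020-04-27']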
The csv data looks like this; I have only pasted the first 20 and last 20 rows for 4/27/2020.
date,Open,High,Low,Close,Volume
2020-04-27 16:00:00,10.05,10.06,10.03,10.03,566676.0
2020-04-27 15:59:00,10.055,10.06,10.04,10.045,371237.0
2020-04-27 15:58:00,10.06,10.07,10.05,10.06,345422.0
2020-04-27 15:57:00,10.055,10.07,10.05,10.07,263262.0
2020-04-27 15:56:00,10.05,10.06,10.04,10.05,219162.0
2020-04-27 15:55:00,10.049,10.06,10.04,10.05,243380.0
2020-04-27 15:54:00,10.0086,10.05,10.0,10.0486,404785.0
2020-04-27 15:53:00,10.0086,10.0186,10.0,10.01,779115.0
2020-04-27 15:52:00,10.02,10.03,10.005,10.005,281548.0
2020-04-27 15:51:00,10.025,10.03,10.0,10.0186,420574.0
2020-04-27 15:50:00,10.047,10.05,10.02,10.025,358654.0
2020-04-27 15:49:00,10.03,10.05,10.02,10.05,212731.0
2020-04-27 15:48:00,10.015,10.03,10.0,10.0277,267942.0
2020-04-27 15:47:00,10.0099,10.02,10.0,10.0142,708078.0
2020-04-27 15:46:00,10.025,10.03,10.0,10.0099,547806.0
2020-04-27 15:45:00,10.045,10.05,10.015,10.03,400360.0
2020-04-27 15:44:00,10.055,10.07,10.03,10.05,395272.0
2020-04-27 15:43:00,10.055,10.07,10.04,10.06,451599.0
2020-04-27 15:42:00,10.0545,10.06,10.04,10.0528,442260.0
2020-04-27 15:41:00,10.075,10.08,10.05,10.055,542481.0
...
2020-04-27 09:50:00,10.12,10.14,10.12,10.13,162016.0
2020-04-27 09:49:00,10.13,10.1578,10.12,10.13,188149.0
2020-04-27 09:48:00,10.1324,10.1499,10.095,10.12,179250.0
2020-04-27 09:47:00,10.05,10.135,10.05,10.13,347080.0
2020-04-27 09:46:00,10.14,10.14,10.05,10.06,306120.0
2020-04-27 09:45:00,10.195,10.21,10.1342,10.1396,296248.0
2020-04-27 09:44:00,10.22,10.25,10.19,10.1902,317016.0
2020-04-27 09:43:00,10.189,10.218,10.18,10.2114,293827.0
2020-04-27 09:42:00,10.1597,10.21,10.15,10.19,314917.0
2020-04-27 09:41:00,10.228,10.23,10.15,10.155,292010.0
2020-04-27 09:40:00,10.295,10.295,10.22,10.22,333785.0
2020-04-27 09:39:00,10.3,10.3299,10.28,10.2999,277293.0
2020-04-27 09:38:00,10.36,10.36,10.3,10.3,250145.0
2020-04-27 09:37:00,10.35,10.385,10.35,10.357000000000001,130435.0
2020-04-27 09:36:00,10.37,10.41,10.35,10.35,248228.0
2020-04-27 09:35:00,10.385,10.4,10.37,10.3714,125987.0
2020-04-27 09:34:00,10.3742,10.4,10.36,10.385,127815.0
2020-04-27 09:33:00,10.4,10.41,10.347000000000001,10.3766,305665.0
2020-04-27 09:32:00,10.445,10.445,10.380999999999998,10.41,469467.0
2020-04-27 09:31:00,10.53,10.53,10.41,10.4458,1408654.0
You can use a dictionary to specify a different aggregation for each column as needed.
# I just copied your top table here to make the df
df = pd.read_clipboard(sep=r"[ ]{2,}")
df = df.set_index(pd.DatetimeIndex(df['date']))
print(df)
date Open High Low Close Volume
date
2020-04-27 09:31:00 2020-04-27 09:31:00 10.5300 10.5300 10.4100 10.4458 1408654.0
2020-04-27 09:32:00 2020-04-27 09:32:00 10.4450 10.4450 10.3810 10.4100 469467.0
2020-04-27 09:33:00 2020-04-27 09:33:00 10.4000 10.4100 10.3470 10.3766 305665.0
2020-04-27 09:34:00 2020-04-27 09:34:00 10.3742 10.4000 10.3600 10.3850 127815.0
2020-04-27 09:35:00 2020-04-27 09:35:00 10.3850 10.4000 10.3700 10.3714 125987.0
2020-04-27 09:36:00 2020-04-27 09:36:00 10.3700 10.4100 10.3500 10.3500 248228.0
2020-04-27 09:37:00 2020-04-27 09:37:00 10.3500 10.3850 10.3500 10.3570 130435.0
2020-04-27 09:38:00 2020-04-27 09:38:00 10.3600 10.3600 10.3000 10.3000 250145.0
2020-04-27 09:39:00 2020-04-27 09:39:00 10.3000 10.3299 10.2800 10.2999 277293.0
2020-04-27 09:40:00 2020-04-27 09:40:00 10.2950 10.2950 10.2200 10.2200 333785.0
2020-04-27 09:41:00 2020-04-27 09:41:00 10.2280 10.2300 10.1500 10.1550 292010.0
2020-04-27 09:42:00 2020-04-27 09:42:00 10.1597 10.2100 10.1500 10.1900 314917.0
2020-04-27 09:43:00 2020-04-27 09:43:00 10.1890 10.2180 10.1800 10.2114 293827.0
2020-04-27 09:44:00 2020-04-27 09:44:00 10.2200 10.2500 10.1900 10.1902 317016.0
2020-04-27 09:45:00 2020-04-27 09:45:00 10.1950 10.2100 10.1342 10.1396 296248.0
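# '5min' is the current 5-minute alias; the older '5T' spelling is deprecated in recent pandas
df_rs = df.resample('5min', label='right', closed='right').agg({'Open': 'first',
                                                                 'High': 'max',
                                                                 'Low': 'min',
                                                                 'Close': 'last',
                                                                 'Volume': 'sum'})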
print(df_rs)
Open High Low Close Volume
date
2020-04-27 09:35:00 10.530 10.53 10.3470 10.3714 2437588.0
2020-04-27 09:40:00 10.370 10.41 10.2200 10.2200 1239886.0
2020-04-27 09:45:00 10.228 10.25 10.1342 10.1396 1514018.0
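For anyone without the clipboard data at hand, here is a self-contained sketch of the same approach. It assumes the first ten 1-minute rows from the question's csv, pasted inline and newest-first as in the file. The names `csv_text`, `ohlcv_aggs` and `df5m_resampled` are made up for this example, and the final `dropna` is an optional extra that only matters when the data has gaps, such as overnight gaps in multi-day data.

```python
import io

import pandas as pd

# First ten 1-minute rows from the question's csv (newest-first, as in the file).
csv_text = """date,Open,High,Low,Close,Volume
2020-04-27 09:40:00,10.295,10.295,10.22,10.22,333785.0
2020-04-27 09:39:00,10.3,10.3299,10.28,10.2999,277293.0
2020-04-27 09:38:00,10.36,10.36,10.3,10.3,250145.0
2020-04-27 09:37:00,10.35,10.385,10.35,10.357,130435.0
2020-04-27 09:36:00,10.37,10.41,10.35,10.35,248228.0
2020-04-27 09:35:00,10.385,10.4,10.37,10.3714,125987.0
2020-04-27 09:34:00,10.3742,10.4,10.36,10.385,127815.0
2020-04-27 09:33:00,10.4,10.41,10.347,10.3766,305665.0
2020-04-27 09:32:00,10.445,10.445,10.381,10.41,469467.0
2020-04-27 09:31:00,10.53,10.53,10.41,10.4458,1408654.0
"""

# Parse the date column as a DatetimeIndex and put the rows in ascending order.
df1m = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True).sort_index()

# One aggregation per column: the usual OHLCV roll-up.
ohlcv_aggs = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

df5m_resampled = (
    df1m.resample("5min", label="right", closed="right")
    .agg(ohlcv_aggs)
    # Empty bins come back with NaN prices; drop them so only real bars remain.
    .dropna(subset=["Close"])
)

print(df5m_resampled)
```

The summed volumes come out as 2437588 and 1239886, a few shares below the service's df5m figures (2437589 and 1239889). That gap is in the source data, not in the resampling.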