How do I stop pandas dataframe.resample('T') from automatically adding extra indexes to dataframe?

Question

I am trying to downsample a dataframe of minute-by-minute data into 5-minute bins. Here is my current code:

df = pd.read_csv('stockPrices/closingPrices-apr3.csv',index_col='date',parse_dates=True)
df['close'] = df['close'].shift()
df5min = df.resample('5T').last()
print(df5min.tail())

The CSV file is available here: https://drive.google.com/file/d/1uvkUaJwrQNsmte5IQIsJ_g5GS8RjVd8B/view?usp=sharing

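The output should stop at 2019-04-03 14:40:00, because the last value is at 14:48:00 and a 5-minute bin covering 14:45-14:49 is not possible. However, I get the following datetime index values, which do not exist in my CSV file: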

2019-04-03 14:45:00  286.35
2019-04-03 14:50:00  286.52
2019-04-03 14:55:00  286.32
2019-04-03 15:00:00  286.45
2019-04-03 15:05:00  280.64

The only fix I have found so far is the following code, but with it the data for all of my previous days is also cut off at 14:40:

df5min = df.resample('5T').last().between_time(start_time='9:30',end_time='14:40')

Any help with this would be appreciated.

Answer 1

This solution will produce a 15:05 row on April 3, 2019, which you may not want:

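import pandas as pd

# Load the minute data with the 'date' column parsed as a DatetimeIndex
df = pd.read_csv('./closingPrices-apr3.csv', index_col='date', parse_dates=True)
df.sort_index(inplace=True)  # ensure chronological order before shifting
df = df.shift(5)  # shift all values down by five rows
df_5min = df.resample('5T').first()  # first value in each 5-minute bin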
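
If the goal is to keep only bins that contain a full five minutes of data, one possible approach (not part of the answer above) is to count the rows in each bin and drop the incomplete ones. The sketch below assumes one row per minute and a column named `close`; the helper name `resample_complete_bins`, its `expected` parameter, and the synthetic data standing in for the CSV are all hypothetical. It uses the `'5min'` alias, which should behave like `'5T'` and is the spelling newer pandas versions prefer.

```python
import pandas as pd


def resample_complete_bins(df, column="close", rule="5min", expected=5):
    """Return the last value per bin, keeping only bins with `expected` rows.

    Hypothetical helper: assumes `df` has a DatetimeIndex with one row per minute.
    """
    grouped = df[column].resample(rule)
    sizes = grouped.size()   # rows per bin, NaN values included
    last = grouped.last()    # last non-NaN value in each bin
    # Empty bins (size 0) and partial bins (size < expected) are both dropped
    return last[sizes == expected].to_frame()


# Synthetic minute data ending at 14:48, standing in for the real CSV
idx = pd.date_range("2019-04-03 14:30", "2019-04-03 14:48", freq="min")
demo = pd.DataFrame({"close": range(len(idx))}, index=idx)
demo.index.name = "date"

result = resample_complete_bins(demo)
print(result)
```

With this synthetic data the 14:45 bin holds only four rows (14:45 to 14:48), so the result stops at 14:40. Unlike the `between_time` workaround, the filter is applied per bin, so earlier days are not cut off at a fixed time of day. Bins that are empty because of overnight gaps are removed as well, which may or may not be what you want.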

python

dataframe

pandas

datetimeindex