Trouble resampling pandas timeseries from 1min to 5min data

Question

I have intraday stock data at 1-minute intervals, which I download like this:

import yfinance as yf
import pandas as pd
n = yf.download('^nsei', period= '5d', interval= '1m')

I am trying to resample it to '5m' data like this:

n = n.resample('5T').agg(dict(zip(n.columns, ['first', 'max', 'min', 'last', 'last', 'sum'])))

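But it tries to resample datetime information that is not in my data. Market data is only available until 03:30 PM, but when I look at the resampled dataframe, I see it has resampled across the entire 24 hours.
How do I stop the resampling at 03:30 PM and continue with the next date?
Because of this, the dataframe now contains mostly NaN values. Any suggestions are welcome.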

Answer 1

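I'm not sure what you are trying to achieve with the agg() function. Assuming 'first' refers to the first quantile and 'last' refers to the last quantile, and that you want to calculate some statistics per column, I suggest you do the following: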

Get your data:

import yfinance as yf
import pandas as pd
n = yf.download('^nsei', period= '5d', interval= '1m')

Resample your data:

Note: your result is the same as when you resample with n.resample('5T').first() but this means every value in the dataframe equals the first value from the 5 minute interval consisting of 5 values. A more logical resampling method is to use the mean() or sum() function as shown below.

If this is stock price data, it makes more sense to use mean():

resampled_df = n.resample('5min').mean()  # '5min' replaces the deprecated '5T' alias in recent pandas

To remove the resampled times that fall outside stock market trading hours, you have 2 options.

Option 1: drop the NA values:

filtered_df = resampled_df.dropna()

Note: this will not work if you use sum() since the result won't contain missing values but zeros.
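
The caveat about sum() can be sketched as follows. This is a minimal illustration on synthetic data rather than the real download (the two sessions and the column values are made up), and it assumes you would accept the `min_count=1` argument of `sum()` as a workaround so that empty bins become NaN instead of 0:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute data for two sessions (09:15 to 15:29); values are made up.
idx = pd.date_range("2023-01-02 09:15", "2023-01-02 15:29", freq="1min").append(
    pd.date_range("2023-01-03 09:15", "2023-01-03 15:29", freq="1min")
)
df = pd.DataFrame(
    {"Close": np.linspace(100.0, 110.0, len(idx)), "Volume": 10},
    index=idx,
)

# Default sum(): empty overnight bins become 0, so dropna() removes nothing.
summed = df.resample("5min").sum()
rows_default = len(summed.dropna())

# sum(min_count=1): empty bins become NaN, so dropna() can remove them.
summed_nan = df.resample("5min").sum(min_count=1)
rows_min_count = len(summed_nan.dropna())

print(rows_default, rows_min_count)
```

With the default call the overnight rows survive dropna(); with `min_count=1` only the bins that actually contain data remain.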

Option 2: filter based on start and end time

Get the earliest and latest time of day for which data is available, as datetime.time objects:

start = n.index.min().time() # 09:15 as datetime.time object
end = n.index.max().time() # 15:29 as datetime.time object

Filter the dataframe based on the start and end time:

filtered_df = resampled_df.between_time(start, end)
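
One thing to keep in mind with this option: between_time() filters on time of day only, so if the 5-day window spans a weekend or holiday, the empty bins on those days are kept. Below is a rough sketch on synthetic data (a made-up Friday and Monday session, not the real download). It also takes the start and end from the time of day over all rows, which is an assumption of mine to guard against an incomplete last session, and it combines the filter with dropna():

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute closes for a Friday and the following Monday; values are made up.
idx = pd.date_range("2023-01-06 09:15", "2023-01-06 15:29", freq="1min").append(
    pd.date_range("2023-01-09 09:15", "2023-01-09 15:29", freq="1min")
)
df = pd.DataFrame({"Close": np.linspace(100.0, 110.0, len(idx))}, index=idx)
resampled = df.resample("5min").mean()

# Earliest and latest time of day over all rows, rather than the time of the
# first and last timestamp (these differ if the last session is incomplete).
start = min(df.index.time)
end = max(df.index.time)

# Keeps trading hours on every calendar day, including the empty weekend.
in_hours = resampled.between_time(start, end)

# Dropping all-NaN rows removes the weekend bins as well.
filtered = in_hours.dropna(how="all")

print(len(in_hours), len(filtered))
```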

Get the statistics:

statistics = filtered_df.describe()
statistics

Note that describe() does not include the sum, so to add it you can do:

statistics = pd.concat([statistics, filtered_df.agg(['sum'])])
statistics

Answer 2

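agg() applies a separate operation to each column. I used it to build what is known in stock technical analysis as a 'candlestick' pattern.
I was able to solve the problem by dropping the NaN values.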
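
For reference, the candlestick approach described here might look roughly like the sketch below. It runs on synthetic data, and it assumes a flat set of columns named Open, High, Low, Close and Volume; the helper name `to_candles` is hypothetical, and the column mapping would need adjusting if your dataframe is laid out differently:

```python
import numpy as np
import pandas as pd


def to_candles(df, rule="5min"):
    """Resample 1-minute OHLCV bars to `rule` bars and drop bins with no trades."""
    # Explicit column-to-aggregation mapping, so the result does not depend on column order.
    ohlcv = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    candles = df.resample(rule).agg(ohlcv)
    # Volume sums to 0 in empty bins, so test only the price columns for NaN.
    return candles.dropna(subset=["Open", "High", "Low", "Close"], how="all")


# Synthetic 1-minute bars for two sessions (09:15 to 15:29); values are made up.
idx = pd.date_range("2023-01-02 09:15", "2023-01-02 15:29", freq="1min").append(
    pd.date_range("2023-01-03 09:15", "2023-01-03 15:29", freq="1min")
)
price = np.arange(len(idx), dtype=float)
df = pd.DataFrame(
    {"Open": price, "High": price + 1, "Low": price - 1, "Close": price + 0.5, "Volume": 10},
    index=idx,
)

candles = to_candles(df)
print(candles.head())
```

Restricting dropna() to the price columns keeps the check independent of Volume, which is 0 rather than NaN outside trading hours.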
