Pandas: find hourly rolling average
My dataset df looks like this. It is minute-level data.
time, Open, High
2017-01-01 00:00:00, 1.2432, 1.1234
2017-01-01 00:01:00, 1.2432, 1.1234
2017-01-01 00:02:00, 1.2332, 1.1234
2017-01-01 00:03:00, 1.2132, 1.1234
...., ...., ....
2017-12-31 23:59:00, 1.2132, 1.1234
I want to find the hourly rolling mean for the Open column, but the approach should be flexible so that I can also find the hourly rolling mean for other columns.
What have I tried?
I was able to find the daily rolling average as shown below:
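# Pandas code to find the rolling mean over a one-day window (1*24*60 minute rows)
# The new column is named avg_1d because 1davg is not a valid Python keyword name
(
    df
    .assign(avg_1d=df.rolling(window=1*24*60)['Open'].mean())
    .groupby(df['time'].dt.date)
    .last()
)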
Note that changing window=1*24*60 to window=60 in this line of code does not work; I have already tried it.
The new output should look like this:
time, Open, High, Open_hour_avg
2017-01-01 00:00:00, 1.2432, 1.1234, 1.2532
2017-01-01 01:00:00, 1.2432, 1.1234, 1.2632
2017-01-01 02:00:00, 1.2332, 1.1234, 1.2332
2017-01-01 03:00:00, 1.2132, 1.1234, 1.2432
...., ...., ...., ....
2017-12-31 23:00:00, 1.2132, 1.1234, 1.2232
Here,
2017-01-01 00:00:00, 1.2432, 1.1234, 1.2532
is the average of the minute data for the midnight hour, and
2017-01-01 01:00:00, 1.2432, 1.1234, 1.2632
is the average of the minute data for the 1 AM hour.
We can group by the hour of each timestamp and map the group means back onto the rows:
df['open_ave_hour']=df.groupby(df.time.dt.floor('h')).Open.mean().reindex(df.time.dt.floor('h')).to_numpy()
or use transform:
df['open_ave_hour']=df.groupby(df.time.dt.floor('h')).Open.transform('mean')
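To see what a per-hour group mean produces, here is a minimal sketch on synthetic data. It assumes the intended grouping is "each timestamp floored to the start of its hour"; the values, the 180-row size, and the names hour_key and hourly are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data: 3 hours, 180 rows (illustrative values only)
times = pd.date_range("2017-01-01 00:00:00", periods=180, freq="min")
df = pd.DataFrame({"time": times, "Open": np.arange(180, dtype=float)})

# Group key: each timestamp floored to the start of its hour
hour_key = df["time"].dt.floor("h")

# transform('mean') returns one value per original row, aligned with df,
# so every row of an hour carries that hour's mean
df["open_ave_hour"] = df.groupby(hour_key)["Open"].transform("mean")

# Keep only the rows that sit exactly on the hour (HH:00:00)
hourly = df.loc[df["time"] == hour_key]
print(hourly)
```

Because transform keeps the original row count, the result can be assigned straight back as a column; filtering to the on-the-hour rows then gives one row per hour, similar in shape to the output requested in the question.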
This is how I got it working:
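import pandas as pd
# After your CSV data is in a df
df['time'] = pd.to_datetime(df['time'])
# Use the timestamps as the index so resample can bin rows by hour
df = df.set_index('time')
# Mean of every column within each hour ('h' is the hourly frequency alias)
df_mean = df.resample('h').mean()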
time                Open     High
2017-01-01 00:00:00 1.051488 1.051500
2017-01-01 01:00:00 1.051247 1.051275
2017-01-01 02:00:00 1.051890 1.051957
2017-01-01 03:00:00 1.051225 1.051290
...., ...., ....
2017-12-31 23:00:00 1.051225 1.051290
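Since the question asks for something that also works for other columns, the resample idea could be wrapped in a small helper. The sketch below is only one possible way to do it: hourly_mean, its parameters, and the _hour_avg suffix are hypothetical names, and the data is synthetic. It also cross-checks the result against a time-based rolling window, assuming complete minute data with no gaps.

```python
import numpy as np
import pandas as pd


def hourly_mean(frame, columns, time_col="time"):
    """Return one row per hour with the mean of each requested column.

    hourly_mean, columns and time_col are illustrative names, not pandas API.
    """
    # Index by timestamp so resample can bin the rows into hours
    indexed = frame.set_index(pd.to_datetime(frame[time_col]))
    out = indexed[columns].resample("h").mean()
    return out.add_suffix("_hour_avg")


# Synthetic minute-level data covering two hours (illustrative values only)
times = pd.date_range("2017-01-01 00:00:00", periods=120, freq="min")
df = pd.DataFrame({
    "time": times,
    "Open": np.arange(120, dtype=float),
    "High": np.arange(120, dtype=float) + 1.0,
})

result = hourly_mean(df, ["Open", "High"])
print(result)

# Cross-check: a 60-minute time-based rolling mean, read at the last minute
# of each hour, covers exactly the 60 rows of that hour
rolling = df.rolling("60min", on="time")["Open"].mean()
print(rolling.iloc[59], rolling.iloc[119])
```

Passing a different list of column names is all that is needed to get the hourly mean for other columns. The rolling cross-check only lines up with the resampled values when every minute of the hour is present.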