How to calculate average/std values of minute data over x days using pandas?

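The goal of this code is to get, for every minute of every day, the average over the most recent 3 days of data.

If today is 2016-01-03 and I want the average closing price at 09:30:00 over the last 3 days, including today, the pseudo-formula looks like this:

3-day average close at 09:30:00 today = (close at 09:30:00 on 2016-01-01 + close at 09:30:00 on 2016-01-02 + close at 09:30:00 on 2016-01-03) / 3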
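
As a quick sanity check of the formula, here is a minimal sketch that applies it by hand to three closes taken from the sample data in this question (time 90200 on 20150101, 20150102 and 20150103) rather than to the 2016 dates above. The variable names are only illustrative.

```python
import statistics

# Closes at 90200 on 20150101, 20150102, 20150103 (from the sample data)
closes = [102, 104, 98]

# 3-day average, exactly as in the pseudo-formula
mean_close = sum(closes) / len(closes)

# Sample standard deviation (divides by n - 1), which is also the pandas default
std_close = statistics.stdev(closes)

print(mean_close, std_close)
```

The two values come out at roughly 101.33 and 3.06, which is what the row for 20150103 / 90200 in the result table shows.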

I managed to calculate the average/std of minute data over x days using pandas. Below is the code I implemented.

import pandas as pd
import numpy as np

# date, time, close
data = [ 
    [20150101, 90100, 100],
    [20150101, 90200, 102],
    [20150101, 90300, 104],
    [20150101, 90400, 106],
    [20150101, 90500, 108],

    [20150102, 90100, 100],
    [20150102, 90200, 104],
    [20150102, 90300, 105],
    [20150102, 90400, 103],
    [20150102, 90500, 102],

    [20150103, 90100, 100],
    [20150103, 90200,  98],
    [20150103, 90300,  99],
    [20150103, 90400, 102],
    [20150103, 90500, 101],

    [20150104, 90100, 100],
    [20150104, 90200, 101],
    [20150104, 90300, 100],
    [20150104, 90400, 100],
    [20150104, 90500, 101],

    [20150105, 90100, 100],
    [20150105, 90200, 102],
    [20150105, 90300, 104],
    [20150105, 90400, 106],
    [20150105, 90500, 108],
]

df = pd.DataFrame(data, columns = ['date', 'time', 'close'])
df.set_index(['date', 'time'], inplace=True)

################################################################

# Unique dates and times, in sorted order
dateidx = sorted(set(date for (date, time) in df.index))
timeidx = sorted(set(time for (date, time) in df.index))
print(dateidx)
print(timeidx)

df['mean'] = np.nan
df['std'] = np.nan

print(df)

# Column positions, so rows can be written without chained assignment
mean_col = df.columns.get_loc('mean')
std_col = df.columns.get_loc('std')

# The first two days have no full 3-day window, so writing starts at day 3
idx = len(timeidx)*2
for i in range(5-2):
    window = df.loc[dateidx[i]:dateidx[i+2]]  # 3 consecutive days
    times = window.groupby(level='time')
    means = times.mean()
    stds = times.std()
    print('[means]')
    print(means)

    for j in range(len(timeidx)):
        df.iloc[idx, mean_col] = means['close'].iloc[j]
        df.iloc[idx, std_col] = stds['close'].iloc[j]
        idx = idx + 1

print(df)

Below is the final result.

                close        mean       std
date     time
20150101 90100    100         NaN       NaN
         90200    102         NaN       NaN
         90300    104         NaN       NaN
         90400    106         NaN       NaN
         90500    108         NaN       NaN
20150102 90100    100         NaN       NaN
         90200    104         NaN       NaN
         90300    105         NaN       NaN
         90400    103         NaN       NaN
         90500    102         NaN       NaN
20150103 90100    100  100.000000  0.000000
         90200     98  101.333333  3.055050
         90300     99  102.666667  3.214550
         90400    102  103.666667  2.081666
         90500    101  103.666667  3.785939
20150104 90100    100  100.000000  0.000000
         90200    101  101.000000  3.000000
         90300    100  101.333333  3.214550
         90400    100  101.666667  1.527525
         90500    101  101.333333  0.577350
20150105 90100    100  100.000000  0.000000
         90200    102  100.333333  2.081666
         90300    104  101.000000  2.645751
         90400    106  102.666667  3.055050
         90500    108  103.333333  4.041452

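The problem is that the code above is too slow and somewhat complicated. Can anyone suggest a better approach to this problem?

P.S. I would like to get rid of the constants used to compute the positions of the rows to update. Please recommend a more concise and elegant approach.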

I believe you'll find this self-explanatory.

df = pd.DataFrame(data, columns = ['date', 'time', 'close']).set_index(['date', 'time'])

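# pd.rolling_mean / pd.rolling_std no longer exist in current pandas; use .rolling().
# transform keeps the result aligned with the original (date, time) index.
df['mean'] = df.groupby(level='time')['close'].transform(lambda x: x.rolling(window=3).mean())
df['std'] = df.groupby(level='time')['close'].transform(lambda x: x.rolling(window=3).std())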

>>> df
                close        mean       std
date     time                              
20150101 90100    100         NaN       NaN
         90200    102         NaN       NaN
         90300    104         NaN       NaN
         90400    106         NaN       NaN
         90500    108         NaN       NaN
20150102 90100    100         NaN       NaN
         90200    104         NaN       NaN
         90300    105         NaN       NaN
         90400    103         NaN       NaN
         90500    102         NaN       NaN
20150103 90100    100  100.000000  0.000000
         90200     98  101.333333  3.055050
         90300     99  102.666667  3.214550
         90400    102  103.666667  2.081666
         90500    101  103.666667  3.785939
20150104 90100    100  100.000000  0.000000
         90200    101  101.000000  3.000000
         90300    100  101.333333  3.214550
         90400    100  101.666667  1.527525
         90500    101  101.333333  0.577350
20150105 90100    100  100.000000  0.000000
         90200    102  100.333333  2.081666
         90300    104  101.000000  2.645751
         90400    106  102.666667  3.055050
         90500    108  103.333333  4.041452
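
For what it's worth, the same result can probably also be reached without a per-group lambda by pivoting the times into columns and rolling over the dates. The sketch below is only one possible variant, not a benchmarked recommendation. It rebuilds the question's sample data in a compact form so that it runs on its own, and `n_days` is a hypothetical name for the window length. Like the code above, it assumes the rows are sorted by date and that every date has the same set of times, so "3 rows back" means "3 trading days back".

```python
import pandas as pd

# Same sample values as in the question: 5 days x 5 minutes
closes = {
    20150101: [100, 102, 104, 106, 108],
    20150102: [100, 104, 105, 103, 102],
    20150103: [100, 98, 99, 102, 101],
    20150104: [100, 101, 100, 100, 101],
    20150105: [100, 102, 104, 106, 108],
}
times = [90100, 90200, 90300, 90400, 90500]
rows = [(d, t, c) for d, cs in closes.items() for t, c in zip(times, cs)]
df = pd.DataFrame(rows, columns=['date', 'time', 'close']).set_index(['date', 'time'])

n_days = 3  # hypothetical parameter: number of days in the window

# One row per date, one column per time of day
wide = df['close'].unstack('time')

# Rolling window over consecutive dates, computed for all times at once
roll = wide.rolling(window=n_days)

# stack() gives a (date, time)-indexed Series; assignment aligns on the index,
# so rows without a full window simply end up as NaN
df['mean'] = roll.mean().stack()
df['std'] = roll.std().stack()

print(df)
```

Changing `n_days` is then the only thing needed to switch to a different window, and no row positions have to be computed by hand.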