How to calculate average/std values of minute data over x days using pandas?
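The goal of this code is to compute, for every minute of each day, the average over the most recent 3 days of data.
If today is 2016-01-03 and I want the average closing price at 09:30:00 over the last 3 days, including today, the pseudo-formula would look like this:
3-day average close at 09:30:00 today =
(close at 09:30:00 on 2016-01-01 + close at 09:30:00 on 2016-01-02 + close at 09:30:00 on 2016-01-03) / 3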
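As a quick sanity check of that formula, here is a minimal sketch for a single time of day. The three closes are taken from the sample data further down (time 90200 on 20150101, 20150102 and 20150103), and the variable names are only illustrative. It assumes the standard deviation wanted is the sample standard deviation (n - 1 in the denominator), which is what pandas computes by default.

```python
import statistics

# Closes at one fixed time of day (90200) on three consecutive dates.
closes = [102, 104, 98]

# 3-day average: plain arithmetic mean of the three closes.
three_day_mean = sum(closes) / len(closes)

# Sample standard deviation (divides by n - 1), matching pandas' default ddof=1.
three_day_std = statistics.stdev(closes)

print(round(three_day_mean, 6), round(three_day_std, 6))
```

These two numbers should agree with the mean/std reported for 20150103 at 90200 in the result table.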
I managed to calculate the average/std values of minute data over x days using pandas.
Below is the code I implemented.
import pandas as pd
import numpy as np
# date, time, close
data = [
[20150101, 90100, 100],
[20150101, 90200, 102],
[20150101, 90300, 104],
[20150101, 90400, 106],
[20150101, 90500, 108],
[20150102, 90100, 100],
[20150102, 90200, 104],
[20150102, 90300, 105],
[20150102, 90400, 103],
[20150102, 90500, 102],
[20150103, 90100, 100],
[20150103, 90200, 98],
[20150103, 90300, 99],
[20150103, 90400, 102],
[20150103, 90500, 101],
[20150104, 90100, 100],
[20150104, 90200, 101],
[20150104, 90300, 100],
[20150104, 90400, 100],
[20150104, 90500, 101],
[20150105, 90100, 100],
[20150105, 90200, 102],
[20150105, 90300, 104],
[20150105, 90400, 106],
[20150105, 90500, 108],
]
df = pd.DataFrame(data, columns=['date', 'time', 'close'])
df.set_index(['date', 'time'], inplace=True)
################################################################
dateidx = sorted(set(date for (date, time) in df.index))
timeidx = sorted(set(time for (date, time) in df.index))
print(dateidx)
print(timeidx)
df['mean'] = np.nan
df['std'] = np.nan
print(df)
# Row position of the first result: the first two days have no full 3-day window.
idx = len(timeidx) * 2
for i in range(5 - 2):
    # Rows for the three consecutive dates dateidx[i] .. dateidx[i + 2].
    window_df = df.loc[dateidx[i]:dateidx[i + 2]]
    times = window_df.groupby(level='time')
    means = times.mean()
    stds = times.std()
    print('[means]')
    print(means)
    for j in range(len(timeidx)):
        # Write by position on df itself; chained df['mean'].iloc[idx] = ... is unreliable.
        df.iloc[idx, df.columns.get_loc('mean')] = means['close'].iloc[j]
        df.iloc[idx, df.columns.get_loc('std')] = stds['close'].iloc[j]
        idx = idx + 1
print(df)
Below is the final result.
close mean std
date time
20150101 90100 100 NaN NaN
90200 102 NaN NaN
90300 104 NaN NaN
90400 106 NaN NaN
90500 108 NaN NaN
20150102 90100 100 NaN NaN
90200 104 NaN NaN
90300 105 NaN NaN
90400 103 NaN NaN
90500 102 NaN NaN
20150103 90100 100 100.000000 0.000000
90200 98 101.333333 3.055050
90300 99 102.666667 3.214550
90400 102 103.666667 2.081666
90500 101 103.666667 3.785939
20150104 90100 100 100.000000 0.000000
90200 101 101.000000 3.000000
90300 100 101.333333 3.214550
90400 100 101.666667 1.527525
90500 101 101.333333 0.577350
20150105 90100 100 100.000000 0.000000
90200 102 100.333333 2.081666
90300 104 101.000000 2.645751
90400 106 102.666667 3.055050
90500 108 103.333333 4.041452
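The problem is that the code above is too slow and somewhat complicated.
So, can anyone suggest better code or a better solution for this problem?
P.S. I would like to get rid of the constants used to compute the positions of the rows to update. Please recommend a more concise and elegant approach.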
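I believe you'll find this self-explanatory.
df = pd.DataFrame(data, columns=['date', 'time', 'close']).set_index(['date', 'time'])
# Group by time of day, then take a 3-row (3-day) rolling window within each group.
# (.rolling() replaces the removed pd.rolling_mean / pd.rolling_std functions.)
closes = df.groupby(level='time')['close']
df['mean'] = closes.transform(lambda x: x.rolling(window=3).mean())
df['std'] = closes.transform(lambda x: x.rolling(window=3).std())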
>>> df
close mean std
date time
20150101 90100 100 NaN NaN
90200 102 NaN NaN
90300 104 NaN NaN
90400 106 NaN NaN
90500 108 NaN NaN
20150102 90100 100 NaN NaN
90200 104 NaN NaN
90300 105 NaN NaN
90400 103 NaN NaN
90500 102 NaN NaN
20150103 90100 100 100.000000 0.000000
90200 98 101.333333 3.055050
90300 99 102.666667 3.214550
90400 102 103.666667 2.081666
90500 101 103.666667 3.785939
20150104 90100 100 100.000000 0.000000
90200 101 101.000000 3.000000
90300 100 101.333333 3.214550
90400 100 101.666667 1.527525
90500 101 101.333333 0.577350
20150105 90100 100 100.000000 0.000000
90200 102 100.333333 2.081666
90300 104 101.000000 2.645751
90400 106 102.666667 3.055050
90500 108 103.333333 4.041452
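Since the question also asks about speed and about avoiding hard-coded positions, here is one possible alternative, offered as a sketch rather than a definitive implementation. It pivots the closes into a wide table (one row per date, one column per time) and applies a single rolling window down the rows. The function name `rolling_stats_by_time` and its `window` parameter are hypothetical. It assumes each (date, time) pair appears only once, and that "x days" means the last x dates present in the data, not calendar days. The sample frame uses only two of the five times to keep the example short.

```python
import pandas as pd

def rolling_stats_by_time(df, window=3):
    """Return a copy of df with rolling mean/std of 'close' per time of day.

    Assumes df has a unique (date, time) MultiIndex and a 'close' column.
    """
    # Wide layout: one row per date, one column per time of day.
    wide = df['close'].unstack('time').sort_index()
    roll = wide.rolling(window)  # the window counts rows, i.e. dates present

    out = df.copy()
    # DataFrame.unstack() on a single-level index yields a Series indexed by
    # (time, date); swap the levels so it aligns with the (date, time) index.
    out['mean'] = roll.mean().unstack().swaplevel()
    out['std'] = roll.std().unstack().swaplevel()
    return out

# Subset of the sample data: times 90200 and 90500 on all five dates.
dates = [20150101, 20150102, 20150103, 20150104, 20150105]
closes = {90200: [102, 104, 98, 101, 102], 90500: [108, 102, 101, 101, 108]}
rows = [(d, t, c) for t, series in closes.items() for d, c in zip(dates, series)]
df = (pd.DataFrame(rows, columns=['date', 'time', 'close'])
        .set_index(['date', 'time'])
        .sort_index())

result = rolling_stats_by_time(df, window=3)
print(result)
```

The window length is the only parameter, so no row positions need to be computed by hand. The first `window - 1` dates stay NaN for every time of day, as in the output above.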