Pandas: How can you calculate rolling metrics (mean, standard deviation, z-score, etc.) when using itertuples?
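I am trying to calculate rolling metrics on financial time series data. I want to use a loop-based approach so that I can simulate testing against real-time data.
What is the most efficient way to calculate these rolling metrics inside an itertuples loop?
Sample data: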
DateTime Bid
2006-01-03 00:01:07.588 0.85208
2006-01-03 00:01:08.654 0.85213
2006-01-03 00:01:08.859 0.85212
2006-01-03 00:01:11.472 0.85215
2006-01-03 00:01:12.002 0.85218
... ...
2020-03-15 23:59:57.150 0.85178
2020-03-15 23:59:57.300 0.85179
2020-03-15 23:59:58.233 0.85179
2020-03-15 23:59:58.366 0.85178
2020-03-15 23:59:58.595 0.85179
The code I have so far:
import pandas as pd

df = pd.read_hdf(r"F:\Market Data20.3.15 FXAUDCAD-TICK-NoSession.h5")
df = df.set_index(pd.DatetimeIndex(df['DateTime']))  # index by timestamp
df = df.drop(columns=['DateTime'])
Rolling_Metric = []
for row in df.itertuples():
    ...  # ?
I don't think pandas is a good fit for this problem.
Suppose you have a data-generating source such as an API, simulated here by a generator:
import numpy as np

def gen():
    # endlessly yield arrays of 1 to 9 uniform random values
    while True:
        yield np.random.rand(np.random.randint(1, 10))
It yields data arrays of varying size:
from itertools import islice

for i in islice(gen(), 4):
    print(i)
Output:
[0.1591485]
[0.40462191 0.32921298 0.64704824 0.9433797 0.44754502 0.47600713
0.66130654]
[0.45582976 0.37764161 0.47205139 0.32354448 0.06795233 0.47943393
0.13395702]
[0.0967848]
You can then compute rolling measures, for example over a window of 10 samples:
import time
from itertools import islice

data = np.array([])
for new_data in islice(gen(), 5):  # get data
    for elem in new_data:  # iterate through new data
        data = np.concatenate((data, [elem]))  # add new data row by row
        print(data[-10:].mean())  # get mean of last 10 observations
        time.sleep(.5)
Output:
0.8251054981003462
0.5154331864262989
0.5677470477572374
0.6084844147856047
0.6532425615231122
0.6663683916931894
0.6768810511903373
0.6098697771903554
0.5976415974047367
0.5442112622703545
0.556721858529291
0.5851107975154073
0.6129548571751687
0.5519507890295304
0.47809901125252807
0.457599927037135
0.47739535574047764
0.5135494376774083
0.5620825459637069
0.5914086396034781
0.5554789093102113
0.6042456773490161
0.5860524867501515
0.6218627945520632
0.6509948271807725
0.6693775700674035
0.6657165569407465
0.6825455302579173
0.609296884720923
0.6708821735456445
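The question also asks about standard deviation and z-score. As a minimal sketch of how the same idea could be extended, the class below keeps only the last `window` values in a `collections.deque` and recomputes the metrics on every new observation. The name `RollingStats` and its `update` method are hypothetical, not part of pandas or NumPy, and the sample standard deviation (dividing by n - 1) is an assumption about which definition is wanted.

```python
from collections import deque
import math


class RollingStats:
    """Rolling mean, sample std and z-score over the last `window` values.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window):
        self.window = window
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, x):
        """Add one observation; return (mean, std, z) for the current window."""
        self.values.append(x)
        n = len(self.values)
        mean = sum(self.values) / n
        if n < 2:
            # Sample standard deviation is undefined for a single value.
            return mean, float("nan"), float("nan")
        var = sum((v - mean) ** 2 for v in self.values) / (n - 1)
        std = math.sqrt(var)
        # z-score of the newest value relative to its own window.
        z = (x - mean) / std if std > 0 else float("nan")
        return mean, std, z


stats = RollingStats(window=3)
for bid in [0.85208, 0.85213, 0.85212, 0.85215, 0.85218]:
    mean, std, z = stats.update(bid)
    print(mean, std, z)
```

With the DataFrame from the question, the same object could presumably be fed from the loop by calling `stats.update(row.Bid)` inside `for row in df.itertuples():`, assuming the column is named `Bid` as in the sample data. Each update costs O(window) because the window is rescanned; that keeps the arithmetic stable for tightly clustered tick prices, at the price of being slower than a running-sum approach for large windows.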