Python: aggregate a time series using a complex function that depends on the value from another column

My time series looks like this:

TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69

I can aggregate by summing all the Volume values:

res = df.resample('t').agg({'Volume': 'sum'})
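For reference, `resample` only works on a DataFrame with a DatetimeIndex, and the sample data has only a clock-time string in `Time`. A minimal sketch, assuming the CSV above is loaded as-is: it parses `Time` into a DatetimeIndex (pandas fills in a placeholder date of 1900-01-01) and uses the `'min'` one-minute alias (`'T'` is the older spelling, deprecated in recent pandas).

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the sketch is self-contained.
csv = """TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69
"""

df = pd.read_csv(io.StringIO(csv))

# resample() needs a DatetimeIndex; Time holds only HH:MM:SS, so the
# parsed timestamps get the default date 1900-01-01.
df.index = pd.to_datetime(df['Time'], format='%H:%M:%S')

# Plain (unsigned) one-minute sum of Volume.
res = df.resample('min').agg({'Volume': 'sum'})
print(res)
```

All four rows fall in the same minute, so this prints a single row with Volume 653 (200 + 253 + 47 + 153).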

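But I want to aggregate the Volume and Type columns together: when Type is B, add the volume; when Type is S, subtract it. If the aggregated total volume is negative, the resulting Type is S; otherwise it is B.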
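The rule can be restated as a small plain-Python function, independent of pandas, to pin down the sign convention (B counts as +1, S as -1, and a total of exactly 0 is labeled B). `signed_aggregate` is a hypothetical helper name used only for illustration; applied to the Volume and Type values of the four sample rows it gives 147 and B.

```python
def signed_aggregate(volumes, types):
    """Hypothetical helper: sum volumes with B as +1 and S as -1,
    then label the total S if it is negative, otherwise B."""
    sign = {'B': 1, 'S': -1}
    total = sum(v * sign[t] for v, t in zip(volumes, types))
    return total, ('S' if total < 0 else 'B')


# Volume and Type columns of the four sample rows.
total, kind = signed_aggregate([200, 253, 47, 153], ['B', 'S', 'B', 'B'])
print(total, kind)  # 147 B
```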

In the example above, after summing the volumes, the total volume becomes

200 - 253 + 300 + 200 = 447

and the Type is B because 447 > 0.

Expected result:

Time,Volume,Type
09:25:00,447,B

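The simplest way is to multiply Volume by 1 or -1 depending on the Type (using map), sum per time group, and then assign the Type column based on the sign of the total volume.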

import numpy as np
import pandas as pd

res = (
    (df['Volume'] * df['Type'].map({'S': -1, 'B': 1}))  # signed volume: B adds, S subtracts
      .groupby(df['Time']).sum()  # resample would also work here, but it needs a
                                  # DatetimeIndex, which your input does not have yet
      .reset_index(name='Volume')
      .assign(Type=lambda x: np.where(x['Volume'] < 0, 'S', 'B'))  # negative total -> S, otherwise B
)

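print(res)
       Time  Volume Type
0  09:25:00     147    B

Note that this gives 147 (200 - 253 + 47 + 153), not 447. Your 447 seems to mix in a second column: 300 and 200 match SaleOrderVolume in rows 3 and 4, not Volume.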
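If you do want to bin by minute with `resample` instead of grouping on the exact `Time` string, the same signed-sum idea carries over once `Time` is turned into a DatetimeIndex. A sketch under that assumption, using only the Time, Volume and Type columns of the sample data (the other columns do not affect the result):

```python
import io

import numpy as np
import pandas as pd

# Only the columns the aggregation needs, taken from the question's sample.
csv = """Time,Volume,Type
09:25:00,200,B
09:25:00,253,S
09:25:00,47,B
09:25:00,153,B
"""
df = pd.read_csv(io.StringIO(csv))

# resample() needs a DatetimeIndex; the date part defaults to 1900-01-01.
df.index = pd.to_datetime(df['Time'], format='%H:%M:%S')

# Signed volume: B counts as +Volume, S as -Volume.
signed = df['Volume'] * df['Type'].map({'S': -1, 'B': 1})

res = (
    signed.resample('min').sum()  # one signed total per minute
          .to_frame('Volume')
          .assign(Type=lambda x: np.where(x['Volume'] < 0, 'S', 'B'))
)
print(res)
```

One design consequence: `resample` emits a row for every minute in the covered range, so minutes without trades show up with Volume 0 and, under this rule, Type B. Filter them out afterwards if they are not wanted.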