Python: aggregate a time series using a complex function that depends on the value from another column

Question

My time series looks like this:

TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69

I can aggregate it by summing all the Volume values:

res = df.resample('min').agg({'Volume': 'sum'})  # requires a DatetimeIndex; 'min' = one minute
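resample only works when the frame has a DatetimeIndex (or when a datetime column is passed with `on=`). In the sample data, Time is a plain clock-time string, so it has to be parsed first. A minimal sketch, assuming the data is loaded from an inline CSV string and keeping only the columns used here:

```python
import io

import pandas as pd

# Subset of the sample data above (only the columns used here).
csv = """Time,Volume,Type
09:25:00,200,B
09:25:00,253,S
09:25:00,47,B
09:25:00,153,B
"""
df = pd.read_csv(io.StringIO(csv))

# Time holds only a clock time; parsing it with an explicit format attaches
# a placeholder date, which is harmless when all rows come from one day.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

# resample needs a DatetimeIndex; "min" is the one-minute frequency alias.
res = df.set_index("Time").resample("min").agg({"Volume": "sum"})
print(res)
```

All four rows fall into the single 09:25 bucket, so the plain sum is 200 + 253 + 47 + 153 = 653.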

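But I want to aggregate Volume and Type based on both of those columns: when Type is S, subtract the volume; otherwise, add it. If the aggregated total volume is negative, the resulting Type is S; otherwise it is B.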

In the example above, after summing the volumes, the total volume becomes

200 - 253 + 300 + 200 = 447

and the Type is B because 447 > 0.

Expected result:

Time,Volume,Type
09:25:00,447,B

Answer 1

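The simplest approach is to multiply Volume by 1 or -1 depending on the value in Type, using map. Then assign the Type column based on the sign of the total volume.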
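To make the map step concrete, here is a sketch of the intermediate signed volumes for the four sample rows. Only the Time, Volume and Type columns are rebuilt by hand; the variable names `sign`, `signed` and `total` are illustrative.

```python
import pandas as pd

# Time, Volume and Type columns from the sample data in the question.
df = pd.DataFrame({
    "Time": ["09:25:00"] * 4,
    "Volume": [200, 253, 47, 153],
    "Type": ["B", "S", "B", "B"],
})

# map turns each Type into a sign: B counts as +1, S as -1.
sign = df["Type"].map({"S": -1, "B": 1})
signed = df["Volume"] * sign
print(signed.tolist())  # [200, -253, 47, 153]

# Summing the signed volumes per Time gives the net volume.
total = signed.groupby(df["Time"]).sum()
print(total.tolist())  # [147]
```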

import numpy as np

res = (
    (df['Volume'] * df['Type'].map({'S': -1, 'B': 1}))  # B adds, S subtracts
      .groupby(df['Time']).sum()  # resample should also work here, but your Time
                                  # column is not a DatetimeIndex, so it can't be used as-is
      .reset_index(name='Volume')
      .assign(Type=lambda x: np.where(x['Volume'] >= 0, 'B', 'S'))  # negative total -> S
)

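print(res)
       Time  Volume Type
0  09:25:00     147    B

Note that this gives 147 (200 - 253 + 47 + 153), not 447. Did you use two different columns to calculate your result of 447? The last two terms in your sum (300 and 200) are not the Volume values of rows 3 and 4 (47 and 153).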
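As the comment in the code says, resample can replace the groupby once Time is a real DatetimeIndex. A minimal sketch of that variant, assuming the data is loaded from an inline CSV string (only the Time, Volume and Type columns are kept):

```python
import io

import numpy as np
import pandas as pd

csv = """Time,Volume,Type
09:25:00,200,B
09:25:00,253,S
09:25:00,47,B
09:25:00,153,B
"""
df = pd.read_csv(io.StringIO(csv))

# Build a DatetimeIndex so that resample can be used instead of groupby.
df.index = pd.to_datetime(df["Time"], format="%H:%M:%S")

res = (
    (df["Volume"] * df["Type"].map({"S": -1, "B": 1}))  # signed volume per trade
    .resample("min").sum()                               # net volume per minute
    .to_frame("Volume")
    .assign(Type=lambda x: np.where(x["Volume"] >= 0, "B", "S"))  # negative -> S
)
print(res)
```

One difference from the groupby version: resample emits a row for every minute in the range, including minutes with no trades. Those rows sum to 0 and are labelled B by this rule, so filter them out if they are not wanted.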


Tags: python, aggregation, pandas