How to loop within groups efficiently in pandas?

Question

I have a table like this:

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({'date':[1,2,3,4,5,6,7,8,9,10] ,'high':[10,9,8,8,7,6,7,8,9,10],'low':[9,7,6,5,2,1,2,1,8,9],'stock':['A']*5 + ['B']*5})

date	high	low	stock
1	10	9	A
2	9	7	A
3	8	6	A
4	8	5	A
5	7	2	A
6	6	1	B
7	7	2	B
8	8	1	B
9	9	8	B
10	10	9	B

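For each stock on each date, I want the largest difference between that date's "high" and any "low" on that date or later. For example, on date 1 the high for stock A is $10. Looking at dates 1-5, the largest high-low difference comes from the low on date 5, so the result for date 1 is 10 - 2 = 8. For date 2, I should only look at the "low" values from date 2 onward.

Result: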

date	high	low	stock	diff_high_low
1	10	9	A	8
2	9	7	A	7
3	8	6	A	6
4	8	5	A	6
5	7	2	A	5
6	6	1	B	5
7	7	2	B	6
8	8	1	B	7
9	9	8	B	1
10	10	9	B	1

I am currently using a for loop, and it works, but it is really slow on my table of more than 1 million rows. Is there a better way?

My current approach:

diff_high_low = []
for gname, g in df.groupby('stock'):
    rows = g.shape[0]
    for i in range(rows):
        # largest gap between this row's high and any low from this row onward
        diff_high_low.append(max(g['high'].iloc[i] - g['low'].iloc[i:rows]))
df['diff_high_low'] = diff_high_low

Answer 1

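We need groupby and cummin: reverse the rows, take the cumulative minimum of low within each stock, and subtract it from high.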

df['diff_high_low'] = df['high'] - df.iloc[::-1].groupby('stock')['low'].cummin()
Out[273]: 
0    8
1    7
2    6
3    6
4    5
5    5
6    6
7    7
8    1
9    1
dtype: int64
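
Why this works: the largest value of high[i] - low[j] over j >= i is high[i] minus the smallest low from row i onward, and a cumulative minimum over the reversed rows gives exactly that "smallest low from here on". The following is a minimal sketch on the sample data from the question, not a tuned implementation. It assumes the rows are already sorted by date within each stock; the names future_min_low and expected are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': range(1, 11),
    'high': [10, 9, 8, 8, 7, 6, 7, 8, 9, 10],
    'low': [9, 7, 6, 5, 2, 1, 2, 1, 8, 9],
    'stock': ['A'] * 5 + ['B'] * 5,
})

# Reverse the row order so the cumulative minimum within each stock
# runs from the last date back to the current one.
future_min_low = df.iloc[::-1].groupby('stock')['low'].cummin()

# Series arithmetic aligns on the index, so the reversed order of
# future_min_low does not matter for the subtraction.
df['diff_high_low'] = df['high'] - future_min_low

# Brute-force check that mirrors the loop from the question
# (assumes the rows of each stock are contiguous and in date order).
expected = [
    (g['high'].iloc[i] - g['low'].iloc[i:]).max()
    for _, g in df.groupby('stock')
    for i in range(len(g))
]
print(df['diff_high_low'].tolist() == expected)
```

If the table is not sorted, sorting with df.sort_values(['stock', 'date']) first should make the reversed cumulative minimum meaningful. The groupby and cummin calls are vectorized, so this avoids the per-row Python loop that makes the original approach slow on large tables.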
