pandas - calculation of percent change for a sub-group within a group
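I have a requirement to calculate the percent change for groups of orders. What I have so far works well when the sub-groups within each main group have the same number of rows. I also need to take the quantity into account.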
time txn_type symbol qty price
27/12/21 10:32 BUY XYZ 1 4054.5
27/12/21 10:26 SELL XYZ 2 4053.65
27/12/21 10:00 BUY XYZ 1 4072.25
27/12/21 09:56 BUY XYZ 1 4045.15
27/12/21 09:50 SELL XYZ 1 4034.25
27/12/21 09:40 BUY XYZ 1 4006
27/12/21 09:20 SELL XYZ 1 3978.1
27/12/21 10:55 SELL MNO 1 1714.95
27/12/21 10:25 BUY PQR 1 768.7
27/12/21 10:05 SELL PQR 1 765.05
27/12/21 09:57 SELL PQR 1 764
27/12/21 09:40 BUY PQR 1 769
27/12/21 09:28 SELL PQR 1 765.8
27/12/21 09:20 BUY PQR 1 768.95
27/12/21 09:20 BUY MNO 1 1703.55
symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
    'symbol': 'first',
    'txn_type': 'first',
    'price': np.sum
})
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
    lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
symbol_percent_df.reset_index(inplace=True)
symbol_percent_df = symbol_percent_df[symbol_percent_df['txn_type'] == "BUY"]
symbol_percent_df.sort_values(by=['price'], ascending=False, inplace=True)
symbol_pct_dict: dict = symbol_percent_df.set_index('symbol')['price'].to_dict()
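The above code works for MNO and PQR, but it gives an incorrect result for XYZ because the row at 10:26 has a different quantity.
What I need is the symbol-wise percent change in a dictionary.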
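To make the discrepancy concrete, here is a small check that uses only the XYZ rows from the table above. It is a sketch rather than part of the original code: the names `xyz`, `sell_vs_buy_pct`, `unweighted` and `weighted` are made up for illustration. It compares the SELL-versus-BUY percent change computed from plain price sums with the same figure computed from `price * qty`.

```python
import pandas as pd

# XYZ rows only, copied from the sample data (time column omitted).
xyz = pd.DataFrame({
    "txn_type": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY", "SELL"],
    "qty": [1, 2, 1, 1, 1, 1, 1],
    "price": [4054.5, 4053.65, 4072.25, 4045.15, 4034.25, 4006.0, 3978.1],
})


def sell_vs_buy_pct(values: pd.Series) -> float:
    """Percent change from the BUY total to the SELL total."""
    totals = values.groupby(xyz["txn_type"]).sum()
    return round((totals["SELL"] - totals["BUY"]) / totals["BUY"] * 100, 2)


# Ignoring qty counts the 10:26 SELL only once.
unweighted = sell_vs_buy_pct(xyz["price"])
# Weighting by qty counts it twice, matching the four BUY units.
weighted = sell_vs_buy_pct(xyz["price"] * xyz["qty"])

print(unweighted, weighted)
```

The unweighted figure is far off because four BUY units are being compared with three SELL prices, while the weighted one compares four units on each side.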
One solution is to repeat each row qty times, so the row with index 1 appears twice here:
order_df = order_df.loc[order_df.index.repeat(order_df['qty'])]
print (order_df)
time txn_type symbol qty price
0 27/12/21 10:32 BUY XYZ 1 4054.50
1 27/12/21 10:26 SELL XYZ 2 4053.65
1 27/12/21 10:26 SELL XYZ 2 4053.65
2 27/12/21 10:00 BUY XYZ 1 4072.25
3 27/12/21 09:56 BUY XYZ 1 4045.15
4 27/12/21 09:50 SELL XYZ 1 4034.25
5 27/12/21 09:40 BUY XYZ 1 4006.00
6 27/12/21 09:20 SELL XYZ 1 3978.10
7 27/12/21 10:55 SELL MNO 1 1714.95
8 27/12/21 10:25 BUY PQR 1 768.70
9 27/12/21 10:05 SELL PQR 1 765.05
10 27/12/21 09:57 SELL PQR 1 764.00
11 27/12/21 09:40 BUY PQR 1 769.00
12 27/12/21 09:28 SELL PQR 1 765.80
13 27/12/21 09:20 BUY PQR 1 768.95
14 27/12/21 09:20 BUY MNO 1 1703.55
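As a minimal illustration of what `index.repeat` does, the sketch below uses a tiny made-up frame (`demo` and `expanded` are hypothetical names, not from the question). It assumes `qty` holds non-negative whole numbers, since a row cannot be repeated a fractional number of times.

```python
import pandas as pd

# Tiny made-up frame: three orders with quantities 1, 2 and 3.
demo = pd.DataFrame({"qty": [1, 2, 3], "price": [10.0, 20.0, 30.0]})

# Index.repeat builds an index where each label appears qty times;
# .loc then selects those labels, duplicating the rows.
expanded = demo.loc[demo.index.repeat(demo["qty"])]

# The duplicated labels can be replaced by a fresh 0..n-1 index if needed.
expanded = expanded.reset_index(drop=True)

print(expanded)
```

After the expansion, a plain sum of `price` equals the sum of `price * qty` on the original frame, which is why the existing groupby code then gives the weighted result.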
symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
    'symbol': 'first',
    'txn_type': 'first',
    'price': np.sum
})
print(symbol_orders_df)
                symbol txn_type     price
symbol txn_type
MNO    BUY         MNO      BUY   1703.55
       SELL        MNO     SELL   1714.95
PQR    BUY         PQR      BUY   2306.65
       SELL        PQR     SELL   2294.85
XYZ    BUY         XYZ      BUY  16177.90
       SELL        XYZ     SELL  16119.65
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
    lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
print(symbol_percent_df)
                 price
symbol txn_type
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN
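The question asks for a dictionary keyed by symbol. One possible way to get there from the per-group totals is sketched below. To keep it self-contained, the totals are typed in by hand from the output shown above, and the names `totals`, `pct` and `buy_pct` are illustrative only.

```python
import pandas as pd

# Per-symbol BUY/SELL totals, copied from the printed output above.
totals = pd.Series(
    [1703.55, 1714.95, 2306.65, 2294.85, 16177.90, 16119.65],
    index=pd.MultiIndex.from_product(
        [["MNO", "PQR", "XYZ"], ["BUY", "SELL"]],
        names=["symbol", "txn_type"],
    ),
    name="price",
)

# Within each symbol, compare each row with the next one (BUY -> SELL).
pct = (
    totals.groupby(level="symbol").shift(-1)
    .sub(totals)
    .div(totals)
    .mul(100)
    .round(2)
)

# Keep only the BUY rows (the SELL rows are NaN), sort, and convert.
buy_pct = pct.xs("BUY", level="txn_type").sort_values(ascending=False)
symbol_pct_dict = buy_pct.to_dict()

print(symbol_pct_dict)
```

This relies on BUY sorting before SELL within each symbol, which holds here because groupby sorts its keys by default.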
Another idea is to multiply the price column by qty:
order_df['price'] = order_df['price'].mul(order_df['qty'])
print (order_df)
time txn_type symbol qty price
0 27/12/21 10:32 BUY XYZ 1 4054.50
1 27/12/21 10:26 SELL XYZ 2 8107.30
2 27/12/21 10:00 BUY XYZ 1 4072.25
3 27/12/21 09:56 BUY XYZ 1 4045.15
4 27/12/21 09:50 SELL XYZ 1 4034.25
5 27/12/21 09:40 BUY XYZ 1 4006.00
6 27/12/21 09:20 SELL XYZ 1 3978.10
7 27/12/21 10:55 SELL MNO 1 1714.95
8 27/12/21 10:25 BUY PQR 1 768.70
9 27/12/21 10:05 SELL PQR 1 765.05
10 27/12/21 09:57 SELL PQR 1 764.00
11 27/12/21 09:40 BUY PQR 1 769.00
12 27/12/21 09:28 SELL PQR 1 765.80
13 27/12/21 09:20 BUY PQR 1 768.95
14 27/12/21 09:20 BUY MNO 1 1703.55
symbol_orders_df = order_df.groupby(['symbol', 'txn_type'])['price'].sum()
print(symbol_orders_df)
symbol  txn_type
MNO     BUY          1703.55
        SELL         1714.95
PQR     BUY          2306.65
        SELL         2294.85
XYZ     BUY         16177.90
        SELL        16119.65
Name: price, dtype: float64
symbol_percent_df = (symbol_orders_df.groupby(level=[0]).shift(-1)
                     .sub(symbol_orders_df)
                     .div(symbol_orders_df)
                     .mul(100)
                     .round(2)
                     .to_frame())
print(symbol_percent_df)
                 price
symbol txn_type
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN
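A possible variation on the multiplication idea, offered as a sketch and not as part of the approaches above: put the weighted value in a separate column (here called `value`, a made-up name) so `price` is not overwritten, then pivot BUY and SELL into columns so the calculation does not depend on row order. The time column is left out because it is not used.

```python
import pandas as pd

# Sample orders from the question as (txn_type, symbol, qty, price).
rows = [
    ("BUY", "XYZ", 1, 4054.5), ("SELL", "XYZ", 2, 4053.65),
    ("BUY", "XYZ", 1, 4072.25), ("BUY", "XYZ", 1, 4045.15),
    ("SELL", "XYZ", 1, 4034.25), ("BUY", "XYZ", 1, 4006.0),
    ("SELL", "XYZ", 1, 3978.1), ("SELL", "MNO", 1, 1714.95),
    ("BUY", "PQR", 1, 768.7), ("SELL", "PQR", 1, 765.05),
    ("SELL", "PQR", 1, 764.0), ("BUY", "PQR", 1, 769.0),
    ("SELL", "PQR", 1, 765.8), ("BUY", "PQR", 1, 768.95),
    ("BUY", "MNO", 1, 1703.55),
]
order_df = pd.DataFrame(rows, columns=["txn_type", "symbol", "qty", "price"])

# One row per symbol, one column per transaction type, summing price * qty.
totals = (
    order_df.assign(value=order_df["price"] * order_df["qty"])
    .pivot_table(index="symbol", columns="txn_type", values="value", aggfunc="sum")
)

# Percent change from the BUY total to the SELL total, per symbol.
pct = ((totals["SELL"] - totals["BUY"]) / totals["BUY"] * 100).round(2)
symbol_pct_dict = pct.sort_values(ascending=False).to_dict()

print(symbol_pct_dict)
```

Because BUY and SELL are addressed by column name, no filtering of NaN rows is needed afterwards.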