Pandas

Question

I have a dataset like this:

id     type     score
a1     ball       15
a2     ball       12
a1     pencil     10
a3     ball       8
a2     pencil     6

I want to rank each id within its type. Since I will later convert the ranks to percentiles, I would prefer to use rank.

The output should look like this:

id     type     score rank
a1     ball       15   1
a2     ball       12   2
a1     pencil     10   1
a3     ball       8    3
a2     pencil     6    2

What I have done so far is build a list of the unique type values and iterate over it:

test_data['percentile_from_all'] = 0
for i in unique_type_list:
    loc_i = test_data['type']==i
    percentiles = test_data.loc[loc_i,['score']].rank(pct = True)*100
    test_data.loc[loc_i,'percentile_from_all'] = percentiles.values

This approach works for small datasets, but even at 10k iterations it becomes too slow. Is there a way to do it all at once, like with apply?

Thanks!

Answer 1

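Use groupby and rank the scores within each type:

# To keep the result, assign it to a column: df['rnk'] = df.groupby('type').score.rank(ascending=False)
df.groupby('type').score.rank(ascending=False)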
Out[67]: 
0    1.0
1    2.0
2    1.0
3    3.0
4    2.0
Name: score, dtype: float64
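
Since the question ultimately wants percentiles, the same grouped rank can probably produce them directly. The sketch below is only an illustration: it rebuilds the sample data from the question, and the column name percentile_from_all is borrowed from the question's loop. It assumes that ascending percentiles (highest score = 100) are what is wanted, which matches what the loop computes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["a1", "a2", "a1", "a3", "a2"],
    "type": ["ball", "ball", "pencil", "ball", "pencil"],
    "score": [15, 12, 10, 8, 6],
})

# Group once, then rank inside each type without an explicit Python loop.
by_type = df.groupby("type")["score"]

# Rank 1 is the highest score within its type.
df["rank"] = by_type.rank(ascending=False)

# pct=True returns the rank as a fraction of the group size (ascending order),
# so multiplying by 100 gives a percentile per type.
df["percentile_from_all"] = by_type.rank(pct=True) * 100

print(df)
```

Both columns are computed in a single vectorized pass per call, so this should scale far better than looping over the unique types and assigning with loc each time.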

Pandas - optimize percentile calculation

python

rank