Group by another column and extract top values of one column in Pandas

Question

I want to get the values at several custom percentiles (0.10, 0.20, ..., 0.50) for each type from the following dataframe:

    type    value
0   a   90
1   a   63
2   a   11
3   a   61
4   a   78
5   a   67
6   a   89
7   a   12
8   a   43
9   a   30
10  b   72
11  b   84
12  b   74
13  b   66
14  b   80
15  b   75
16  b   47
17  b   22
18  b   5
19  b   64
20  b   32
21  b   45

I have gotten as far as this step:

df['percentile_rank'] = df.groupby('type').value.rank(pct=True).round(2)

Output:

    type    value   percentile_rank
0   a   90  1.00
1   a   63  0.60
2   a   11  0.10
3   a   61  0.50
4   a   78  0.80
5   a   67  0.70
6   a   89  0.90
7   a   12  0.20
8   a   43  0.40
9   a   30  0.30
10  b   72  0.67
11  b   84  1.00
12  b   74  0.75
13  b   66  0.58
14  b   80  0.92
15  b   75  0.83
16  b   47  0.42
17  b   22  0.17
18  b   5   0.08
19  b   64  0.50
20  b   32  0.25
21  b   45  0.33

But I don't know how to get the expected result, which looks like this:

    type    top10   top20   top30   top40   top50
0     a       89      78      67      63      61
1     b       80      75      72      66      64

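In the table above, top10 means a percentile_rank equal to 0.90, top20 means 0.80, and so on. If no row has the exact percentile, take the nearest one. For example, for top10 of type b I use the value 80, whose percentile_rank is 0.92.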
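The nearest-rank rule described above can be sketched as follows. This is only an illustration under stated assumptions: the dataframe is rebuilt from the sample in the question, and the names `targets` and `records` are hypothetical helpers, not part of the original post. For each type and each target rank, it picks the row whose percentile_rank is closest to the target.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "type": ["a"] * 10 + ["b"] * 12,
    "value": [90, 63, 11, 61, 78, 67, 89, 12, 43, 30,
              72, 84, 74, 66, 80, 75, 47, 22, 5, 64, 32, 45],
})
df["percentile_rank"] = df.groupby("type")["value"].rank(pct=True).round(2)

# Hypothetical mapping from output column name to target percentile rank.
targets = {"top10": 0.90, "top20": 0.80, "top30": 0.70,
           "top40": 0.60, "top50": 0.50}

records = []
for name, group in df.groupby("type"):
    row = {"type": name}
    for label, q in targets.items():
        # Index label of the row whose rank is closest to the target
        # (on a tie, idxmin keeps the first such row).
        nearest = (group["percentile_rank"] - q).abs().idxmin()
        row[label] = group.loc[nearest, "value"]
    records.append(row)

result = pd.DataFrame(records).set_index("type")
print(result)
```

On the sample data this gives 89, 78, 67, 63, 61 for type a and 80, 75, 72, 66, 64 for type b, which matches the expected table.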

Thanks in advance for your help.

Update:

Output of Andy L.'s method; note the NaNs for top55 and top45:

   type  top95  top90  top85  top80  top75  top70  top65  top60  top55  top50  \
0   e  40.82  41.81  41.82  42.35  43.85  44.42  44.99  45.92    NaN  45.94   

   top45  top40  top35  top30  top25  top20  top15  top10   top5  
0    NaN  46.04  46.25  46.45  46.85  47.49  48.55  49.82  52.18

Output of YOBEN_S's method:

   type   top95  top90  top85   top80  top75  top70  top65  top60  top55  top50  \
0   e   40.704  41.82  41.82  42.326   43.7  44.36  44.94  45.94  45.94  45.94   

   top45  top40   top35  top30  top25   top20   top15   top10    top5  
0  45.94  46.04  46.226  46.42  46.82  47.412  48.412  49.776  52.008

Answer 1

We can use quantile:

s = df.groupby('type').value.apply(lambda x: x.quantile([0.9, 0.8, 0.7, 0.6, 0.5])).unstack()
s
Out[64]: 
       0.9   0.8   0.7   0.6   0.5
type                              
a     89.1  80.2  70.3  64.6  62.0
b     79.5  74.8  73.4  69.6  65.0
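The values above (89.1, 80.2, ...) do not occur in the data because `Series.quantile` interpolates linearly between neighbouring values by default. If only existing values are wanted, one option is the `interpolation` parameter of `quantile`. The sketch below uses `interpolation="lower"` on the question's sample data; the choice of `"lower"` is an assumption made for illustration, and it is not the same as the nearest-rank rule in the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "type": ["a"] * 10 + ["b"] * 12,
    "value": [90, 63, 11, 61, 78, 67, 89, 12, 43, 30,
              72, 84, 74, 66, 80, 75, 47, 22, 5, 64, 32, 45],
})

qs = [0.9, 0.8, 0.7, 0.6, 0.5]

# interpolation="lower" returns the existing data point at or just below
# the requested quantile position instead of interpolating between two points.
s = (
    df.groupby("type")["value"]
      .apply(lambda x: x.quantile(qs, interpolation="lower"))
      .unstack()
)
print(s)
```

On this data, type a gives 89, 78, 67, 63, 61 and type b gives 75, 74, 72, 66, 64. The first two values for type b differ from the expected table in the question (80 and 75), so this variant only approximates the requested result.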

Answer 2

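If you want to use existing values, I think you can use pd.cut and groupby max as follows:

# bin edges on percentile_rank; with right=True each bin is (left, right]
bins = [0, 0.5, 0.6, 0.7, 0.8, 0.99]
labels = ['top50', 'top40', 'top30', 'top20', 'top10']
s = pd.cut(df.percentile_rank, bins=bins, labels=labels, right=True)
# largest value per (type, bin), with the bins spread out as columns
df_out = df.groupby(['type', s]).value.max().unstack()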

Out[57]:
percentile_rank  top50  top40  top30  top20  top10
type
a                   61     63     67     78     89
b                   64     66     72     74     80
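One thing to watch with this binning approach: when a bin contains no percentile_rank for a group, there is nothing to take the max of, and the cell comes out as NaN. This may be what produces NaNs like the ones shown in the question's update, though that is an assumption. The sketch below uses a small made-up group `e` with four values (not the asker's real data) and passes `observed=False` explicitly so that empty bins are kept regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical four-row group; its ranks are 0.25, 0.5, 0.75 and 1.0.
df = pd.DataFrame({"type": ["e"] * 4, "value": [10, 20, 30, 40]})
df["percentile_rank"] = df.groupby("type")["value"].rank(pct=True).round(2)

bins = [0, 0.5, 0.6, 0.7, 0.8, 0.99]
labels = ["top50", "top40", "top30", "top20", "top10"]
s = pd.cut(df["percentile_rank"], bins=bins, labels=labels, right=True)

# observed=False keeps bins that received no rows; their max is NaN.
out = df.groupby(["type", s], observed=False)["value"].max().unstack()
print(out)
```

Here only top50 (20) and top20 (30) are filled. The bins for top40, top30 and top10 are empty, so those columns are NaN. The rank 1.0 falls outside the last bin edge of 0.99, which is why the largest value never appears.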


python

rank

percentile

dataframe

pandas