python pandas 查找列中组的百分位数

python pandas find percentile for a group in column

我想找到每列的百分位数并添加到 df 数据框和标签

如果列的值为

前 20%(值>第 80 个百分位数)然后 'strong'

低于 20%(值 > 第 80 个百分位)然后 'weak'

否则平均

下面是我的数据框

  df=pd.DataFrame({'month':['1','1','1','1','1','2','2','2','2','2','2','2'],'X1': 
  [30,42,25,32,12,10,4,6,5,10,24,21],'X2':[10,76,100,23,65,94,67,24,67,54,87,81],'X3': 
  [23,78,95,52,60,76,68,92,34,76,34,12]})
  df

低于我的尝试

  df['X1_percentile'] = df.X1.rank(pct = True)
  df['X1_segment'] = np.where(df['X1_percentile']>0.8, 'Strong',np.where(df['X1_percentile'] 
  <0.20,'Weak', 'Average')) 

但我想为每个月和每一列都这样做。如果可能的话,这可以通过一个函数自动处理任何列号,并为每一列键入 colname+"_per" 和 colname+"_segment" 吗? 谢谢

我们可以使用 groupby + rank 和可选参数 pct=True 来计算以百分位数表示的排名,然后使用 np.select bin/categorize 百分位数将值转化为离散标签。

p = df.groupby('month').rank(pct=True)

df[p.columns + '_per'] = p
df[p.columns + '_seg'] = np.select([p.gt(.8), p.lt(.2)], ['strong', 'weak'], 'average')

   month  X1   X2  X3    X1_per    X2_per    X3_per   X1_seg   X2_seg   X3_seg
0      1  30   10  23  0.600000  0.200000  0.200000  average  average  average
1      1  42   76  78  1.000000  0.800000  0.800000   strong  average  average
2      1  25  100  95  0.400000  1.000000  1.000000  average   strong   strong
3      1  32   23  52  0.800000  0.400000  0.400000  average  average  average
4      1  12   65  60  0.200000  0.600000  0.600000  average  average  average
5      2  10   94  76  0.642857  1.000000  0.785714  average   strong  average
6      2   4   67  68  0.142857  0.500000  0.571429     weak  average  average
7      2   6   24  92  0.428571  0.142857  1.000000  average     weak   strong
8      2   5   67  34  0.285714  0.500000  0.357143  average  average  average
9      2  10   54  76  0.642857  0.285714  0.785714  average  average  average
10     2  24   87  34  1.000000  0.857143  0.357143   strong   strong  average
11     2  21   81  12  0.857143  0.714286  0.142857   strong  average     weak