Python percentile rank of a column, grouped by multiple other columns

Question

I want to group a pandas DataFrame by multiple fields ('date' and 'category') and, within each group, rank the values of another field ('value') by percentile, while keeping the original 'value' field.

I tried:

df2 = df.groupby(['date', 'category'])['value'].rank(pct=True)

But this returns only the percentiles of the 'value' field.

Answer 1

I think you need to assign the resulting Series to a new column:

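import pandas as pd

df = pd.DataFrame({
    'value': [1, 3, 5, 7, 1, 0],
    'category': [5] * 6,
    'date': list('aaabbb')
})

# rank() returns a Series aligned with df's index, so it can be assigned directly
df['new'] = df.groupby(['date', 'category'])['value'].rank(pct=True)
print(df)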
   value  category date       new
0      1         5    a  0.333333
1      3         5    a  0.666667
2      5         5    a  1.000000
3      7         5    b  1.000000
4      1         5    b  0.666667
5      0         5    b  0.333333

Alternatively, with DataFrame.assign:

df = df.assign(new=df.groupby(['date', 'category'])['value'].rank(pct=True))
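Both variants work because the grouped rank comes back as a Series carrying the same index as the original frame, so pandas lines it up row by row. A minimal sketch of that, assuming the same sample data but with a non-default index; the index labels and the column name value_pct are made up for illustration:

```python
import pandas as pd

# Same sample data as above, but with a non-default index (illustrative labels)
df = pd.DataFrame(
    {
        'value': [1, 3, 5, 7, 1, 0],
        'category': [5] * 6,
        'date': list('aaabbb'),
    },
    index=[10, 11, 12, 13, 14, 15],
)

# Percentile rank of 'value' within each ('date', 'category') group
pct = df.groupby(['date', 'category'])['value'].rank(pct=True)

# The result keeps df's index, which is why assignment aligns correctly
same_index = pct.index.equals(df.index)

# 'value_pct' is a hypothetical column name; the original 'value' is kept
out = df.assign(value_pct=pct)
print(out)
```

The original 'value' column is untouched, and the new column sits next to it, so either form should give what the question asks for.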

python

percentile

pandas