pandas: append a column with quantile values

Question

I have the following DataFrame:

   item_id group  price
0        1     A     10
1        3     A     30
2        4     A     40
3        6     A     60
4        2     B     20
5        5     B     50

I would like to add a quantile column based on the price within each group, like this:

item_id       group        price    quantile
 01            A            10        0.25
 03            A            30        0.5
 04            A            40        0.75
 06            A            60        1.0
 02            B            20        0.5
 05            B            50        1.0

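I could loop over the whole DataFrame and do the calculation for each group. However, I wonder whether there is a more elegant way to solve this? Thanks!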

Answer 1

You need rank() with pct=True, applied to the grouped price column:

pct : bool, default False Whether or not to display the returned rankings in percentile form.

df['quantile']=df.groupby('group')['price'].rank(pct=True)
print(df)

   item_id group  price  quantile
0        1     A     10      0.25
1        3     A     30      0.50
2        4     A     40      0.75
3        6     A     60      1.00
4        2     B     20      0.50
5        5     B     50      1.00
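
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and also computes the same values by hand (the ordinary rank divided by the group size), which is what pct=True amounts to for this data. The variable names prices and manual are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "item_id": [1, 3, 4, 6, 2, 5],
    "group": ["A", "A", "A", "A", "B", "B"],
    "price": [10, 30, 40, 60, 20, 50],
})

prices = df.groupby("group")["price"]

# Percentile rank of each price within its group.
df["quantile"] = prices.rank(pct=True)

# The same thing by hand: ordinary rank divided by the group size.
manual = prices.rank() / prices.transform("count")

print(df)
```

Both df["quantile"] and manual should hold the values shown in the output above.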

Answer 2

Although the rank approach above is probably the way to go, here is another solution using pd.qcut and GroupBy:

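# transform (rather than apply) returns a result aligned to df's index,
# so it can be assigned directly as a new column.
df['quantile'] = (
    df.groupby('group')['price']
      .transform(lambda x: pd.qcut(x, q=len(x), labels=False)
                 .add(1)        # bin labels start at 0
                 .div(len(x))   # scale to (0, 1]
                )
)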

   item_id group  price  quantile
0        1     A     10      0.25
1        3     A     30      0.50
2        4     A     40      0.75
3        6     A     60      1.00
4        2     B     20      0.50
5        5     B     50      1.00
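
One difference worth keeping in mind is how the two approaches behave when a group contains repeated prices. The sketch below uses a hypothetical three-row group (not from the question) to illustrate: rank(pct=True) averages tied ranks by default, while pd.qcut with q=len(x) ends up with duplicate bin edges and raises ValueError unless told otherwise.

```python
import pandas as pd

# Hypothetical group with a repeated price (not part of the question's data).
prices = pd.Series([10, 10, 20])

# rank(pct=True) averages tied ranks by default: ranks 1.5, 1.5, 3 divided by 3.
pct_rank = prices.rank(pct=True)

# pd.qcut requires unique bin edges, so the tie makes it raise ValueError.
try:
    pd.qcut(prices, q=len(prices), labels=False)
    qcut_failed = False
except ValueError:
    qcut_failed = True

print(pct_rank.tolist(), qcut_failed)
```

If ties are possible in your data, the rank approach is probably the safer default.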
