How do I find weighted percentiles for each row within a group in Python?

Question

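Suppose I have the following data frame. The last column is the one I need; the remaining columns are what I already have. Percentile is computed as the weighted percentile of Price within each category, with the number of items sold as the weight.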

| Category   |    Price    |  Items Sold  |  Percentile within category|
|:-----------|------------:|:------------:|:--------------------------:|
|     A      |     560     |      5       |      92.56                 |
|     A      |     360     |      2       |      12.56                 |
|     B      |     510     |      3       |      42.56                 |
|     A      |     520     |      4       |      72.36                 |
|     B      |     960     |      6       |      91.56                 |
|     C      |     130     |      2       |      100.00                |

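The function I need to use is stats.percentileofscore, but I don't know how to apply it here.

Edit: I inserted an image of the data frame because I wasn't sure how to display a table.

Edit 2: I did not compute the output values accurately for every row. For A-560, it should be 81.81%: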

stats.percentileofscore([560,560,560,560,560,360,360,520,520,520,520], 560)

which gives 81.81%.
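The expansion used in Edit 2 (each price repeated once per item sold) can be applied to every row of every category. The following is only a minimal sketch under a few assumptions: the weights are non-negative integers, the data frame has a unique index, and the default `kind='rank'` of `stats.percentileofscore` is the intended definition. The helper name `weighted_percentiles` is hypothetical and not part of pandas or SciPy.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "B", "C"],
    "Price": [560, 360, 510, 520, 960, 130],
    "Items Sold": [5, 2, 3, 4, 6, 2],
})

def weighted_percentiles(frame, group_col, value_col, weight_col):
    """Hypothetical helper: percentile of each row's value within its group,
    counting every value as many times as its integer weight."""
    out = pd.Series(np.nan, index=frame.index, dtype=float)
    for _, group in frame.groupby(group_col):
        # Repeat each price by its item count,
        # e.g. [560]*5 + [360]*2 + [520]*4 for category A.
        expanded = np.repeat(group[value_col].to_numpy(),
                             group[weight_col].to_numpy())
        for label, value in group[value_col].items():
            # Write by index label, so the row order of `frame` does not matter.
            out.loc[label] = stats.percentileofscore(expanded, value)
    return out

df["Percentile"] = weighted_percentiles(df, "Category", "Price", "Items Sold")
print(df.round(2))
```

For the A-560 row this yields roughly 81.82, which agrees with the value in Edit 2. Expanding the prices is simple but uses memory proportional to the total number of items sold, so it may not suit very large weights.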

Answer 1

You can do this with a simple groupby and apply a function that computes the weighted values:

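import pandas as pd

data = {'Category': ['A', 'A', 'B', 'A', 'B', 'C'],
        'Price': [560, 360, 510, 520, 960, 130],
        'Items': [5, 2, 3, 4, 6, 2]}

# Sort by Category so the rows line up with the groupby output below
df = pd.DataFrame(data).sort_values('Category')

def fun(x):
    # Each row's share of the group's total Price * Items
    t = (x['Price'] * x['Items']).sum()
    return (x['Price'] * x['Items']) / t

# .values drops the index, so this relies on df being sorted by Category
df['weighted'] = df.groupby('Category').apply(fun).values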

There must be a better way, since this only works if the sort order is right. Maybe someone will chime in with a better solution.

Result:

  Category  Price  Items  weighted
0        A    560      5  0.500000
1        A    360      2  0.128571
3        A    520      4  0.371429
2        B    510      3  0.209877
4        B    960      6  0.790123
5        C    130      2  1.000000
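The dependence on sort order mentioned in this answer can probably be avoided with `groupby(...).transform`, which returns a result aligned to the original index. The sketch below is one possible variant, not the answer author's code; it reuses the answer's column names. Note that `weighted` here is each row's share of its category's Price × Items total, which is a different quantity from the percentile the question asks about.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "B", "C"],
    "Price": [560, 360, 510, 520, 960, 130],
    "Items": [5, 2, 3, 4, 6, 2],
})

# Price * Items per row, then each row's share of its category total.
revenue = df["Price"] * df["Items"]
category_total = revenue.groupby(df["Category"]).transform("sum")

# transform keeps the original index, so no sorting is needed before assigning.
df["weighted"] = revenue / category_total
print(df)
```

The rows stay in their original order, and the values match the table above (0.5 for A-560, 1.0 for C-130).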

python

group-by

percentile

pandas