Calculate percentage Pandas groupby

Question

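I have a DataFrame with 4 columns: 'ID' (customer), 'item', 'tier' (high/low) and 'units' (a number). For each item and each tier, I want the total number of units and how many customers bought at least one item in that tier. I do it like this: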

df.groupby(['item','tier']).agg(
    ID_amount=('ID', 'size'),
    total_units=('units', 'sum'))


item        tier    ID_amount       total_units
100010001   high           83        178,871.00
            low           153      1,450,986.00
100010002   high          722     10,452,778.00
            low           911      5,505,136.00
100020001   high          400        876,490.00
            low           402        962,983.00
100020002   high         4933     61,300,403.00
            low         13759  1,330,932,723.00
100020003   high        15063    176,846,161.00
            low         24905    288,232,057.00
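A minimal self-contained sketch of the aggregation above, run on a small made-up frame (the IDs, items and unit counts are hypothetical, not the asker's data). One thing it illustrates: 'size' counts rows per group, which equals the number of customers only if each customer appears at most once per (item, tier). If a customer can have several rows, 'nunique' on 'ID' counts distinct customers instead.

```python
import pandas as pd

# Hypothetical sample data: customer 'c1' has two rows for item 100010001 / high.
df = pd.DataFrame({
    'ID':    ['c1', 'c1', 'c2', 'c3', 'c2', 'c4'],
    'item':  [100010001, 100010001, 100010001, 100010001, 100010002, 100010002],
    'tier':  ['high', 'high', 'low', 'low', 'high', 'low'],
    'units': [10, 5, 20, 30, 7, 3],
})

summary = df.groupby(['item', 'tier']).agg(
    rows=('ID', 'size'),           # number of rows in the group
    customers=('ID', 'nunique'),   # number of distinct customers in the group
    total_units=('units', 'sum'),  # total units in the group
)
print(summary)
```

For the (100010001, high) group this gives 2 rows but only 1 distinct customer, which is where the two counts diverge.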

What I want is an additional column that expresses 'total_units' as a percentage. When I try

df.groupby(['item','tier']).agg(
        ID_amount=('ID', 'size'),
        total_units=('units', 'sum'),
        percen_units=('units', lambda x: 100*x/x.sum()))

it fails with the error "Must produce aggregated value". How can I modify my code to get these percentages? Thanks.
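A small sketch of why the error happens, on a hypothetical frame where every (item, tier) group has two rows. Each function passed to agg has to reduce a group to one value: lambda x: x.sum() does, while lambda x: 100*x/x.sum() returns one value per row, so pandas rejects it. The exact message may differ between pandas versions, so the sketch only records that a ValueError was raised.

```python
import pandas as pd

# Hypothetical sample data: every (item, tier) group has two rows.
df = pd.DataFrame({
    'item':  [1, 1, 1, 1, 2, 2],
    'tier':  ['high', 'high', 'low', 'low', 'high', 'high'],
    'units': [10, 30, 5, 5, 2, 8],
})
grouped = df.groupby(['item', 'tier'])

# A reducing lambda returns one value per group, so agg accepts it.
ok = grouped.agg(total_units=('units', lambda x: x.sum()))

# A non-reducing lambda returns one value per row, so agg rejects it.
try:
    grouped.agg(percen_units=('units', lambda x: 100 * x / x.sum()))
    failed = False
except ValueError as err:  # e.g. "Must produce aggregated value", as in the question
    failed = True
    print(err)
```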

Answer 1

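Every function given to agg must reduce each group to a single value, but 100*x/x.sum() returns one value per row, which is why pandas raises that error. Aggregate first, then compute each tier's percentage of its item's total with groupby(...).transform. I think you want this: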

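# Step 1: one row per (item, tier) with the counts and unit totals
dfs = df.groupby(['item','tier']).agg(
        ID_amount=('ID', 'size'),
        total_units=('units', 'sum'))

# Step 2: each tier's share of its item's total units, in percent
dfs['percent_units'] = dfs.groupby('item')['total_units']\
                          .transform(lambda x: x/x.sum()*100)

dfs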
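A self-contained version of this approach on a small hypothetical frame (the data is made up for illustration). It works because transform returns a result aligned with the aggregated rows, so each tier can be divided by its item's total; dividing by transform('sum') directly is an equivalent form without a lambda. The question could also be read as "share of the grand total over all items", which is a plain division by dfs['total_units'].sum(); the extra column percent_of_all_units is a hypothetical name for that alternative reading, not something the question asks for explicitly.

```python
import pandas as pd

# Hypothetical sample data (not the asker's numbers).
df = pd.DataFrame({
    'ID':    ['a', 'b', 'c', 'd', 'e', 'f'],
    'item':  [1, 1, 1, 2, 2, 2],
    'tier':  ['high', 'low', 'low', 'high', 'high', 'low'],
    'units': [25, 50, 25, 10, 30, 60],
})

# Step 1: one row per (item, tier), as in the question.
dfs = df.groupby(['item', 'tier']).agg(
    ID_amount=('ID', 'size'),
    total_units=('units', 'sum'),
)

# Step 2: each tier's share of its item's total units (sums to 100 per item).
item_totals = dfs.groupby('item')['total_units'].transform('sum')
dfs['percent_units'] = dfs['total_units'] / item_totals * 100

# Alternative reading: share of the grand total over all items and tiers.
dfs['percent_of_all_units'] = dfs['total_units'] / dfs['total_units'].sum() * 100

print(dfs)
```

Here item 1 splits 25/75 between high and low, and item 2 splits 40/60.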
