Python pandas groupby aggregate on multiple columns, then pivot

Question

In Python, I have a pandas DataFrame similar to the following:

Item | shop1 | shop2 | shop3 | Category
------------------------------------
Shoes| 45    | 50    | 53    | Clothes
TV   | 200   | 300   | 250   | Technology
Book | 20    | 17    | 21    | Books
phone| 300   | 350   | 400   | Technology

where shop1, shop2 and shop3 are the costs of each item in the different shops. Now, after some data cleaning, I need to return a DataFrame like this:
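For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample table as a DataFrame named `df`. The column names come from the table above; storing the costs as plain integers is an assumption.

```python
import pandas as pd

# Sample data from the question: cost of each item in each shop, plus its category
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})
print(df)
```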

Category (index)| size| sum| mean | std
----------------------------------------

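where size is the number of items in each category, and sum, mean and std are the same functions applied across the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?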

Can someone help me out? This one is driving me crazy... Thanks!

Answer 1

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})
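The expression above aggregates each shop separately. A minimal sketch of what it returns, assuming the sample `df` from the question: the columns come back as a two-level MultiIndex such as `('shop1', 'sum')`, and the optional flattening step (the `'_'.join` naming is only an illustrative choice) turns them into single-level names.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Per-shop aggregation: one (column, function) pair per output column
result = df.groupby('Category').agg({
    'Item': 'size',
    'shop1': ['sum', 'mean', 'std'],
    'shop2': ['sum', 'mean', 'std'],
    'shop3': ['sum', 'mean', 'std'],
})

# The columns form a two-level MultiIndex, e.g. ('shop1', 'sum')
print(result.columns.tolist())

# Optional: flatten to single-level names such as 'shop1_sum'
result.columns = ['_'.join(col) for col in result.columns]
print(result)
```

Because Books and Clothes each contain a single item, their per-shop `std` is NaN: pandas computes the sample standard deviation, which needs at least two values.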

Or, if you want the metrics across all shops combined:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})
df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})
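The two lines above first reshape the table to long format. A sketch, assuming the sample `df`, that shows the intermediate `df1`: `stack()` turns the three shop columns into rows, giving 12 rows (4 items × 3 shops), and `reset_index()` names the unnamed stacked level `level_2` and the values column `0`, which is why the rename targets those two names.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Keep Item/Category as the index so only the shop columns get stacked
long = df.set_index(['Item', 'Category']).stack()

# One row per (Item, Category, shop) price
df1 = long.reset_index().rename(columns={'level_2': 'Shops', 0: 'costs'})
print(df1.head())

summary = df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
print(summary)

# Distinct items per category, in case "size" should count items rather than prices
n_items = df1.groupby('Category')['Item'].nunique()
print(n_items)
```

Note that `'Item': 'size'` counts (item, shop) prices, so Technology reports 6 rather than 2; the `nunique` line counts distinct items instead.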

Answer 2

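If I understand correctly, you want to compute the aggregate metrics for all shops combined, not for each shop individually. To do that, you can first stack your DataFrame and then group by Category: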

stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']
stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})

This results in:

           Price                             
           count   sum        mean        std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
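A swapped-in alternative to `set_index().stack()` for this reshape is `DataFrame.melt`, which names the new columns directly through `var_name` and `value_name`. A minimal sketch, assuming the sample `df`; it should reproduce the table above.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Wide -> long: every column not listed in id_vars becomes a (Shop, Price) row
stacked = df.melt(id_vars=['Item', 'Category'], var_name='Shop', value_name='Price')

out = stacked.groupby('Category').agg({'Price': ['count', 'sum', 'mean', 'std']})
print(out)
```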

Answer 3

Edited for pandas 0.22+, given that passing dictionaries to a groupby aggregation (to rename the outputs) is deprecated.

We set up a very similar dictionary: its keys specify our functions, and the dictionary itself is used to rename the columns.

rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
# pass the keys as a plain list; newer pandas may not accept a dict_keys view here
df.set_index(['Category', 'Item']).stack().groupby('Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
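The replacement pandas introduced for dictionary-based renaming is named aggregation (pandas 0.25+). A sketch, assuming the sample `df` reshaped as in Answer 2, where each keyword is an output column name and each value is an `(input column, function)` tuple, so no separate `rename` step is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Long format as in Answer 2: one row per (Item, Category, Shop) price
stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']

# Named aggregation: output column name = (input column, aggregation function)
out = stacked.groupby('Category').agg(
    Size=('Price', 'size'),
    Sum=('Price', 'sum'),
    Mean=('Price', 'mean'),
    Std=('Price', 'std'),
)
print(out)
```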

Option 1
Using agg

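agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
# unpack as keyword arguments (named aggregation, pandas >= 0.25); passing the dict
# itself raises SpecificationError ("nested renamer is not supported") in pandas >= 1.0
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(**agg_funcs)

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678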

Option 2
More for less
Using describe

# recent pandas: SeriesGroupBy.describe() already returns a DataFrame, so no .unstack() is needed
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
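`describe()` reports count, mean and std plus quantiles, but no sum, so it does not match the requested table exactly. A minimal sketch, assuming the sample `df`, that keeps the needed describe columns and adds a sum computed separately.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# One price per (Category, Item, shop), indexed by Category first
prices = df.set_index(['Category', 'Item']).stack()
by_cat = prices.groupby(level=0)

# Keep count/mean/std from describe() and add the sum, aligned on Category
desc = by_cat.describe()
table = desc[['count', 'mean', 'std']].assign(sum=by_cat.sum())
table = table[['count', 'sum', 'mean', 'std']]
print(table)
```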
