Adding subtotals to multiple levels of a pandas pivot table

Suppose I have a very basic dataset:

name   food      city   rating
paul   cream     LA     2
daniel chocolate NY     3
paul   chocolate LA     4
john   cream     NY     5
daniel jam       LA     1
daniel butter    NY     3
john   jam       NY     9
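
To make the rest reproducible, the table above can be built as a DataFrame roughly like this (a minimal sketch; the variable name df matches the code further down, and the column order is simply copied from the table):

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "name": ["paul", "daniel", "paul", "john", "daniel", "daniel", "john"],
    "food": ["cream", "chocolate", "chocolate", "cream", "jam", "butter", "jam"],
    "city": ["LA", "NY", "LA", "NY", "LA", "NY", "NY"],
    "rating": [2, 3, 4, 5, 1, 3, 9],
})
print(df)
```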

I want to compute descriptive statistics for each person's food preferences, which is easy enough:

df1 = pd.pivot_table(df, values='rating', index=['city', 'name', 'food'], aggfunc=['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'], margins=True, margins_name="Total")

But I would also like to add subtotals for each name and each city.

I can get the name and city subtotals in separate objects:

df2 = df.groupby('name').agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df2.index = pd.MultiIndex.from_arrays([df2.index + '_total', len(df2.index) * ['']])
df3 = df.groupby('city').agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df3.index = pd.MultiIndex.from_arrays([df3.index + '_total', len(df3.index) * ['']])

But combining the three tables is hard. The df1 output has the 'city', 'name' and 'food' index columns on every row:

city   name   food   count  nunique...
LA     daniel jam    1      1
       paul   choc   1      1
              cream  1      1
NY     daniel butter 1      1

But the df2 and df3 outputs only have 'name' (df2) or 'city' (df3):

name          count nunique
daniel_total  3     1
john_total    2     1

I would like to merge these tables so that the name totals go in the 'name' column and the city totals go in the 'city' column, like this:

city  name         food   count
LA    daniel       jam    1
      paul         choc   1
                   cream  1
LA_total                  3
NY    daniel       butter 1
NY_total                  2
      daniel_total        3
      john_total          2
      paul_total          2

I tried pandas concat, but it lumps the descriptive columns together:

pd.concat([df1, df2, df3]).sort_index()
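
One likely reason the columns do not line up in the concat is that pivot_table with a list of aggfuncs and groupby(...).agg with a list order their column levels differently. The sketch below uses a cut-down version of the data and only two aggregations (both my own simplifications) to show the mismatch and how swaplevel can align it:

```python
import pandas as pd

# Cut-down sample, for illustration only
df = pd.DataFrame({
    "name": ["paul", "daniel", "paul"],
    "food": ["cream", "chocolate", "chocolate"],
    "city": ["LA", "NY", "LA"],
    "rating": [2, 3, 4],
})

# pivot_table puts the aggregation name on the outer column level
pivoted = pd.pivot_table(df, values="rating",
                         index=["city", "name", "food"],
                         aggfunc=["count", "sum"])

# groupby(...).agg puts the value column on the outer level instead
grouped = df.groupby("city")[["rating"]].agg(["count", "sum"])

print(pivoted.columns.tolist())
print(grouped.columns.tolist())

# Swapping the two column levels makes the labels match the pivot table
aligned = grouped.swaplevel(0, 1, axis=1)
print(aligned.columns.tolist())
```

Once the column labels match, concat stacks the rows under the same columns instead of creating a second set of columns.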

I think I need to tell Python which column to attach df2 and df3 to, but I am not sure how.

Let's try this:

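# Name subtotals within each city; [['rating']] keeps only the rating column
df2 = df.groupby(['city','name'])[['rating']].agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df2 = df2.rename(index=lambda x: x+'_total', level=1)
# pivot_table puts the aggfunc on the outer column level, so swap to match df1
df2 = df2.swaplevel(0, 1, axis=1)
# Add an empty 'food' level so the index has the same three levels as df1
df2 = df2.assign(food='').set_index('food', append=True)

# City subtotals, with empty 'name' and 'food' levels appended to the index
df3 = df.groupby('city')[['rating']].agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df3 = df3.rename(index=lambda x: x+'_total')
df3 = df3.assign(name='', food='').set_index(['name','food'], append=True)
df3 = df3.swaplevel(0,1, axis=1)

# Stack detail rows and subtotals, then sort so each subtotal follows its group
df_out = pd.concat([df1,df2,df3]).sort_index()
df_out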

Output:

                                 count nunique    sum    min    max      mean       std       sem median       mad       var      skew
                                rating  rating rating rating rating    rating    rating    rating rating    rating    rating    rating
city     name         food                                                                                                            
LA       daniel       jam            1       1      1      1      1  1.000000       NaN       NaN      1  0.000000       NaN       NaN
         daniel_total                1       1      1      1      1  1.000000       NaN       NaN      1  0.000000       NaN       NaN
         paul         chocolate      1       1      4      4      4  4.000000       NaN       NaN      4  0.000000       NaN       NaN
                      cream          1       1      2      2      2  2.000000       NaN       NaN      2  0.000000       NaN       NaN
         paul_total                  2       2      6      2      4  3.000000  1.414214  1.000000      3  1.000000  2.000000       NaN
LA_total                             3       3      7      1      4  2.333333  1.527525  0.881917      2  1.111111  2.333333  0.935220
NY       daniel       butter         1       1      3      3      3  3.000000       NaN       NaN      3  0.000000       NaN       NaN
                      chocolate      1       1      3      3      3  3.000000       NaN       NaN      3  0.000000       NaN       NaN
         daniel_total                2       1      6      3      3  3.000000  0.000000  0.000000      3  0.000000  0.000000       NaN
         john         cream          1       1      5      5      5  5.000000       NaN       NaN      5  0.000000       NaN       NaN
                      jam            1       1      9      9      9  9.000000       NaN       NaN      9  0.000000       NaN       NaN
         john_total                  2       2     14      5      9  7.000000  2.828427  2.000000      7  2.000000  8.000000       NaN
NY_total                             4       3     20      3      9  5.000000  2.828427  1.414214      4  2.000000  8.000000  1.414214
Total                                7       6     27      1      9  3.857143  2.609506  0.986301      3  1.836735  6.809524  1.398866
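
For anyone who wants to run the idea end to end, here is a rough self-contained sketch of the same approach. It assumes the sample data from the question and a shortened aggregation list (count, sum, mean), partly because, as far as I know, 'mad' is no longer accepted as an aggregation name in pandas 2.0 and later. It builds the detail rows with groupby rather than pivot_table, so there is no grand 'Total' row, and the names aggs, detail, by_name, by_city and out are my own:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["paul", "daniel", "paul", "john", "daniel", "daniel", "john"],
    "food": ["cream", "chocolate", "chocolate", "cream", "jam", "butter", "jam"],
    "city": ["LA", "NY", "LA", "NY", "LA", "NY", "NY"],
    "rating": [2, 3, 4, 5, 1, 3, 9],
})

aggs = ["count", "sum", "mean"]  # shortened list, for illustration

# Detail rows: one row per (city, name, food)
detail = df.groupby(["city", "name", "food"])["rating"].agg(aggs)

# Name subtotals within each city, padded with an empty 'food' level
by_name = df.groupby(["city", "name"])["rating"].agg(aggs)
by_name = by_name.rename(index=lambda x: x + "_total", level="name")
by_name = by_name.assign(food="").set_index("food", append=True)

# City subtotals, padded with empty 'name' and 'food' levels
by_city = df.groupby("city")["rating"].agg(aggs)
by_city = by_city.rename(index=lambda x: x + "_total")
by_city = by_city.assign(name="", food="").set_index(["name", "food"], append=True)

# All three frames now share the same index levels and columns
out = pd.concat([detail, by_name, by_city]).sort_index()
print(out)
```

Because the '_total' suffix sorts right after the label it extends, sort_index places each subtotal row directly after the group it summarises.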