Adding subtotals to multiple levels of a pandas pivot table
Suppose I have a very basic dataset:
name food city rating
paul cream LA 2
daniel chocolate NY 3
paul chocolate LA 4
john cream NY 5
daniel jam LA 1
daniel butter NY 3
john jam NY 9
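For anyone who wants to reproduce this, the table above can be rebuilt as a DataFrame roughly like so. This is only a sketch: the variable name `df` matches the code later in the post, and the dtypes are whatever pandas infers.

```python
import pandas as pd

# Rebuild the example table from the question as a DataFrame.
df = pd.DataFrame(
    {
        "name": ["paul", "daniel", "paul", "john", "daniel", "daniel", "john"],
        "food": ["cream", "chocolate", "chocolate", "cream", "jam", "butter", "jam"],
        "city": ["LA", "NY", "LA", "NY", "LA", "NY", "NY"],
        "rating": [2, 3, 4, 5, 1, 3, 9],
    }
)
print(df)
```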
I want to compute descriptive statistics for each person's food preferences, which is straightforward:
df1 = pd.pivot_table(df, values='rating', index=['city', 'name', 'food'], aggfunc=['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'], margins=True, margins_name="Total")
But I also want to add subtotals for each name and each city.
I can get the name and city subtotals as separate objects:
df2 = df.groupby('name').agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df2.index = pd.MultiIndex.from_arrays([df2.index + '_total', len(df2.index) * ['']])
df3 = df.groupby('city').agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df3.index = pd.MultiIndex.from_arrays([df3.index + '_total', len(df3.index) * ['']])
But I am struggling to combine the three tables.
The output of df1 has 'city', 'name' and 'food' index columns on every row:
city  name    food    count  nunique...
LA    daniel  jam     1      1
      paul    choc    1      1
              cream   1      1
NY    daniel  butter  1      1
But the output of df2 and df3 only has 'name' (df2) or 'city' (df3):
name count nunique
daniel_total 3 1
john_total 2 1
I want to merge these so that the name totals go in the 'name' column and the city totals go in the 'city' column, like this:
city      name          food    count
LA        daniel        jam     1
          paul          choc    1
                        cream   1
LA_total                        3
NY        daniel        butter  1
NY_total                        2
          daniel_total          3
          john_total            2
          paul_total            2
I have tried pandas concat, but it lumps the descriptive columns together:
pd.concat([df1, df2, df3]).sort_index()
I think I need to tell Python which column to attach df2 and df3 to, but I am not sure how.
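One way to see why a plain concat does not line up is to compare the index depth and the column-level order of the pieces. The snippet below is a minimal sketch under a few assumptions: the statistics list is shortened to two functions, the `rating` column is selected explicitly before aggregating, and the names `stats`, `detail` and `by_name` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["paul", "daniel", "paul", "john", "daniel", "daniel", "john"],
        "food": ["cream", "chocolate", "chocolate", "cream", "jam", "butter", "jam"],
        "city": ["LA", "NY", "LA", "NY", "LA", "NY", "NY"],
        "rating": [2, 3, 4, 5, 1, 3, 9],
    }
)

stats = ["count", "sum"]  # shortened list, for illustration only

# pivot_table: three index levels, columns ordered (function, value column).
detail = pd.pivot_table(
    df, values="rating", index=["city", "name", "food"], aggfunc=stats
)

# groupby + agg: columns ordered (value column, function) instead.
by_name = df.groupby("name")[["rating"]].agg(stats)
by_name.index = pd.MultiIndex.from_arrays(
    [by_name.index + "_total", len(by_name.index) * [""]]
)

print(detail.index.nlevels, list(detail.columns))
print(by_name.index.nlevels, list(by_name.columns))
```

The two frames differ both in the number of index levels (3 versus 2) and in the order of the column levels, so concat has nothing consistent to align on until both are made to match.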
Let's try this:
df2 = df.groupby(['city','name']).agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df2 = df2.rename(index=lambda x: x+'_total', level=1)
df2 = df2.swaplevel(0, 1, axis=1)
df2 = df2.assign(food='').set_index('food', append=True)
df3 = df.groupby('city').agg(['count', 'nunique', 'sum', 'min', 'max', 'mean', 'std', 'sem', 'median', 'mad', 'var', 'skew'])
df3 = df3.rename(index=lambda x: x+'_total')
df3 = df3.assign(name='', food='').set_index(['name','food'], append=True)
df3 = df3.swaplevel(0,1, axis=1)
df_out = pd.concat([df1,df2,df3]).sort_index()
df_out
Output:
                                   count   nunique  sum     min     max     mean      std       sem       median  mad       var       skew
                                   rating  rating   rating  rating  rating  rating    rating    rating    rating  rating    rating    rating
city      name          food
LA        daniel        jam        1       1        1       1       1       1.000000  NaN       NaN       1       0.000000  NaN       NaN
          daniel_total             1       1        1       1       1       1.000000  NaN       NaN       1       0.000000  NaN       NaN
          paul          chocolate  1       1        4       4       4       4.000000  NaN       NaN       4       0.000000  NaN       NaN
                        cream      1       1        2       2       2       2.000000  NaN       NaN       2       0.000000  NaN       NaN
          paul_total               2       2        6       2       4       3.000000  1.414214  1.000000  3       1.000000  2.000000  NaN
LA_total                           3       3        7       1       4       2.333333  1.527525  0.881917  2       1.111111  2.333333  0.935220
NY        daniel        butter     1       1        3       3       3       3.000000  NaN       NaN       3       0.000000  NaN       NaN
                        chocolate  1       1        3       3       3       3.000000  NaN       NaN       3       0.000000  NaN       NaN
          daniel_total             2       1        6       3       3       3.000000  0.000000  0.000000  3       0.000000  0.000000  NaN
          john          cream      1       1        5       5       5       5.000000  NaN       NaN       5       0.000000  NaN       NaN
                        jam        1       1        9       9       9       9.000000  NaN       NaN       9       0.000000  NaN       NaN
          john_total               2       2        14      5       9       7.000000  2.828427  2.000000  7       2.000000  8.000000  NaN
NY_total                           4       3        20      3       9       5.000000  2.828427  1.414214  4       2.000000  8.000000  1.414214
Total                              7       6        27      1       9       3.857143  2.609506  0.986301  3       1.836735  6.809524  1.398866
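For reference, here is a self-contained variant of the same idea that should run on recent pandas versions. It is a sketch, not the exact code above, and it rests on a few assumptions: the statistics list is shortened (`mad` in particular is left out because, as far as I know, pandas 2.x no longer provides it), `rating` is selected before aggregating so the columns stay single-level and no `swaplevel` is needed, the grand "Total" row from `margins=True` is omitted, and the names `stats`, `levels`, `detail`, `by_name`, `by_city` and `out` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["paul", "daniel", "paul", "john", "daniel", "daniel", "john"],
        "food": ["cream", "chocolate", "chocolate", "cream", "jam", "butter", "jam"],
        "city": ["LA", "NY", "LA", "NY", "LA", "NY", "NY"],
        "rating": [2, 3, 4, 5, 1, 3, 9],
    }
)

stats = ["count", "sum", "mean"]    # shortened list (assumption)
levels = ["city", "name", "food"]

# Detail rows: one row per (city, name, food).
detail = df.groupby(levels)["rating"].agg(stats)

# Name subtotals within each city: suffix the name, pad 'food' with ''.
by_name = df.groupby(["city", "name"])["rating"].agg(stats)
by_name = by_name.rename(index=lambda x: x + "_total", level="name")
by_name = by_name.assign(food="").set_index("food", append=True)

# City subtotals: suffix the city, pad 'name' and 'food' with ''.
by_city = df.groupby("city")["rating"].agg(stats)
by_city = by_city.rename(index=lambda x: x + "_total")
by_city = by_city.assign(name="", food="").set_index(["name", "food"], append=True)

# All three frames now share the same three index levels and columns,
# so concat stacks them and sort_index interleaves the subtotal rows.
out = pd.concat([detail, by_name, by_city]).sort_index()
print(out)
```

The key point is that every piece is padded to the same three index levels before concatenating. The '_total' suffix makes each subtotal sort directly after the group it summarizes.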