Grouping by Multi-Indices of both Row and Column

Question

I created a table with Pandas, following the material here.

The resulting table uses a multi-index on both its columns and its rows.

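I am trying to compute descriptive statistics for each year and subject. That is, I want to show, for example, the mean for Bob in 2013, the mean for Guido in 2013 and the mean for Sue in 2013, and the same for every subject across the years. Bob's mean would take both his HR and his Temp values into account. Note: the fact that every subject has the same types is pure coincidence; that is not the case in my actual table. Other subjects not included in the screenshot have different types.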
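For reference, here is a minimal sketch of a table with this layout. It assumes the structure of the linked material: rows indexed by `year` and a second level called `visit` (an assumed name), and columns indexed by `subject` (Bob, Guido, Sue) and `type` (HR, Temp). The values are made-up placeholders, not the data from the screenshot.

```python
import numpy as np
import pandas as pd

# Row levels: year and a hypothetical "visit" level.
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
# Column levels: subject and measurement type.
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])

# Deterministic placeholder values (0..23) so the example is reproducible.
data = np.arange(24, dtype=float).reshape(4, 6)
health_data = pd.DataFrame(data, index=index, columns=columns)
print(health_data)
```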

The closest I have come to a solution is df.groupby(level = 0, axis = 0).describe(). This groups the data by year, but not by subject.
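A sketch of why that attempt stops short, on made-up data with the same layout (the `visit` level and the values are assumptions). Grouping the rows by year leaves every (subject, type) column intact, so `describe()` reports HR and Temp separately instead of pooling them per subject. `axis=0` is the default, so it is omitted here.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
health_data = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                           index=index, columns=columns)

# Same grouping as in the question: rows grouped by the first index level.
by_year = health_data.groupby(level=0).describe()

# The columns now have three levels (subject, type, statistic), so each
# type keeps its own statistics and HR/Temp are never combined.
print(by_year.columns.nlevels)
print(by_year.loc[2013, ('Bob', 'HR', 'mean')])
```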

Answer 1

Also, linking to external websites is discouraged, since the links can change or disappear at any time without SO having any control over them.

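That said, the link provides most of the tools needed to answer the question. More specifically, a combination of stack and mean should give you exactly what you are asking for: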

health_data.stack().groupby(level='year').mean()

This produces:


subject Bob     Guido   Sue
year            
2013    28.4    40.400  34.15
2014    43.2    38.025  41.10
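To see what the stack-then-average step does, here is a small sketch on made-up data with the same layout (the `visit` level and the values are assumptions). `stack()` moves the innermost column level, `type`, into the row index, so each subject's HR and Temp readings end up in a single column; grouping by the `year` level then averages over visits and types together. It uses `groupby(level='year').mean()`, which also works on pandas 2.x, where the `level` argument of `mean` no longer exists.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
health_data = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                           index=index, columns=columns)

# 'type' moves from the columns into the rows: index is (year, visit, type),
# columns are just the subjects.
stacked = health_data.stack()

# One mean per (year, subject), pooling both visits and both types.
means = stacked.groupby(level='year').mean()
print(means)
```

With these placeholder values, Bob's 2013 mean is the average of his four readings 0, 1, 6 and 7, i.e. 3.5.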

or, more generally,

health_data.stack().groupby('year').describe()

which produces a wide DataFrame containing, for each subject, statistics grouped by year:

subject  Bob                                                     Guido                             Sue
         count  mean  std        min   25%    50%   75%    max   count  mean    ...  75%     max   count  mean   std        min   25%    50%    75%    max
year
2013     4.0    28.4  11.580443  13.0  22.75  31.3  36.95  38.0  4.0    40.400  ...  42.500  50.0  4.0    34.15  4.297674   30.0  30.75  33.95  37.35  38.7
2014     4.0    43.2  7.566593   36.4  37.15  42.2  48.25  52.0  4.0    38.025  ...  39.875  44.0  4.0    41.10  12.961996  28.0  35.65  38.70  44.15  59.0
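A sketch of how to read a single statistic back out of the `describe()` result, again on made-up data with an assumed `visit` level. The result's columns are (subject, statistic) pairs, so `xs` with `level=1` pulls one statistic for every subject, and a manual average of Bob's 2013 HR and Temp readings serves as a cross-check that both types are pooled.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                   names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
health_data = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                           index=index, columns=columns)

# Full per-year, per-subject statistics; columns are (subject, statistic).
stats = health_data.stack().groupby('year').describe()

# Keep only the 'mean' statistic: columns become the subjects.
means = stats.xs('mean', axis=1, level=1)

# Cross-check: average all of Bob's 2013 readings (both visits, HR and Temp).
bob_2013 = health_data.loc[2013]['Bob'].to_numpy().mean()
print(means)
print(bob_2013)
```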
