How do I create a DataFrame with multi-level columns?

Question

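Existing questions deal with a very "regular" DataFrame, where the row and column indexes are full products of their levels and all the data is present.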

Alas, my situation is different. I have data like this:

[{"street": "Euclid", "house":42, "area":123, (1,"bedrooms"):1, (1,"bathrooms"):4},
 {"street": "Euclid", "house":19, "area":234, (2,"bedrooms"):3, (2,"bathrooms"):3},
 {"street": "Riemann", "house":42, "area":345, (1,"bedrooms"):5,
  (1,"bathrooms"):2, (2,"bedrooms"):12, (2, "bathrooms"):17},
 {"street": "Riemann", "house":19, "area":456, (1,"bedrooms"):7, (1,"bathrooms"):1}]

I want a DataFrame like this, with multi-level indexes on both the rows and the columns:

              area          1                  2
street house        bedrooms bathrooms bedrooms bathrooms
Euclid  42    123     1         4
Euclid  19    234                         3         3
Riemann 42    345     5         2        12        17
Riemann 19    456     7         1

So the row index should be

pd.MultiIndex.from_tuples([("Euclid",42),("Euclid",19),("Riemann",42),("Riemann",19)],
                          names=["street","house"])

and the column index should be

pd.MultiIndex.from_tuples([("area",None),(1,"bedrooms"),(1,"bathrooms"),(2,"bedrooms"),(2,"bathrooms")],
                          names=["floor","entity"])

And I see no way to generate these indexes from the list of dictionaries that I have.
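For reference, here is a minimal sketch of deriving both target indexes mechanically from the list. It assumes the list is bound to a variable named data and that "street" and "house" are the only keys that belong in the row index (ROW_KEYS is a hypothetical name introduced here). Every other key becomes a column label in first-seen order, and plain string keys such as "area" are paired with None:

```python
import pandas as pd

data = [
    {"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
    {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
    {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
     (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
    {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1},
]

# Hypothetical: the keys that identify a row rather than hold a value
ROW_KEYS = ["street", "house"]

# One (street, house) tuple per dictionary, in input order
row_tuples = [tuple(entry[k] for k in ROW_KEYS) for entry in data]

# Every other key, deduplicated, in the order it is first seen
col_keys = []
for entry in data:
    for key in entry:
        if key not in ROW_KEYS and key not in col_keys:
            col_keys.append(key)

# Plain keys such as "area" get None as their second level
col_tuples = [k if isinstance(k, tuple) else (k, None) for k in col_keys]

row_index = pd.MultiIndex.from_tuples(row_tuples, names=["street", "house"])
col_index = pd.MultiIndex.from_tuples(col_tuples, names=["floor", "entity"])
```

pandas stores the None in col_index as a missing label, so it displays as NaN, which is also what appears under area in the concatenated output of Answer 1.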

Answer 1

I feel there should be something better than this; hopefully someone on SO does better:

Create a function that processes each entry (dictionary) in the list:

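import pandas as pd

def process(entry):
    # read in the dict; its keys become the column names of a one-row frame
    m = pd.DataFrame.from_dict(entry, orient='index').T
    # from_dict(...).T leaves every column as object dtype; restore numeric dtypes
    m = m.infer_objects()
    # set the row index
    m = m.set_index(['street', 'house'])
    # build the column levels: (floor, entity) for tuple keys, (key, None) otherwise
    col1 = [ent[0] if isinstance(ent, tuple) else ent for ent in m.columns]
    col2 = [ent[-1] if isinstance(ent, tuple) else None for ent in m.columns]
    # assigning two lists turns the columns into a MultiIndex
    m.columns = [col1, col2]
    return m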

Apply the function above to the data (I assigned the list of dictionaries to a variable named data):

res = [process(entry) for entry in data]

Concatenate to get the final output:

pd.concat(res)

                area               1                  2
                NaN    bedrooms bathrooms   bedrooms    bathrooms
street  house                   
Euclid    42    123     1        4           NaN         NaN
          19    234     NaN      NaN         3           3
Riemann   42    345     5        2           12          17
          19    456     7        1           NaN         NaN
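A possibly simpler route, offered as an alternative rather than as this answer's method: the pd.DataFrame constructor already aligns a list of dictionaries by key (the union of the keys becomes the columns, and keys missing from a dictionary become NaN), so what remains is moving street and house into the row index and padding the plain column labels to 2-tuples. This sketch assumes the same data list and reuses the None convention and the floor/entity level names from the question:

```python
import pandas as pd

data = [
    {"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
    {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
    {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
     (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
    {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1},
]

# pandas aligns the dicts by key; absent keys are filled with NaN
df = pd.DataFrame(data)

# Move the identifying keys into a (street, house) row MultiIndex
df = df.set_index(["street", "house"])

# Remaining labels mix plain strings ("area") and (floor, entity) tuples;
# pad the plain ones so every label is a 2-tuple
df.columns = pd.MultiIndex.from_tuples(
    [c if isinstance(c, tuple) else (c, None) for c in df.columns],
    names=["floor", "entity"],
)
```

Because all rows are aligned in one pass, there is no per-entry frame to build and no concat step.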
