How do I create a DataFrame with multi-level columns?
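Existing questions deal with very "regular" DataFrames, where the row and column indexes are full Cartesian products and all the data is present.
Alas, my situation is different. I have data like this: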
[{"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
 {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
 {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
  (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
 {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1}]
I want a DataFrame like this, with multi-level indexes on both the rows and the columns:
               area        1                  2
street  house       bedrooms bathrooms bedrooms bathrooms
Euclid  42      123        1         4
Euclid  19      234                           3         3
Riemann 42      345        5         2       12        17
Riemann 19      456        7         1
So the row index should be
pd.MultiIndex.from_tuples([("Euclid", 42), ("Euclid", 19), ("Riemann", 42), ("Riemann", 19)],
                          names=["street", "house"])
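The row index does not have to be typed by hand: each record already carries its `street` and `house`. A minimal sketch, assuming the list above is bound to a variable named `data`, builds the tuples straight from the records:

```python
import pandas as pd

# The records from the question; tuple keys are (floor, entity) pairs.
data = [
    {"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
    {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
    {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
     (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
    {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1},
]

# One (street, house) tuple per record, kept in record order.
row_index = pd.MultiIndex.from_tuples(
    [(rec["street"], rec["house"]) for rec in data],
    names=["street", "house"],
)
print(row_index)
```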
and the column index should be
pd.MultiIndex.from_tuples([("area", None), (1, "bedrooms"), (1, "bathrooms"), (2, "bedrooms"), (2, "bathrooms")],
                          names=["floor", "entity"])
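The column index can also be derived from the records: collect every key other than `street` and `house` in first-seen order, then turn scalar keys into 2-tuples. This is a sketch under two assumptions: the name `INDEX_KEYS` is illustrative only, and it uses `""` rather than `None` as the second-level label for `area`, because pandas stores a `None` label as NaN.

```python
import pandas as pd

data = [
    {"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
    {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
    {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
     (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
    {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1},
]

# Hypothetical name: the keys that belong in the row index, not the columns.
INDEX_KEYS = ("street", "house")

# dict.fromkeys acts as an ordered set: each key appears once, in first-seen order.
keys = dict.fromkeys(k for rec in data for k in rec if k not in INDEX_KEYS)

# Scalar keys such as "area" get an empty-string second level.
col_index = pd.MultiIndex.from_tuples(
    [k if isinstance(k, tuple) else (k, "") for k in keys],
    names=["floor", "entity"],
)
print(col_index)
```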
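and I see no way to generate these indexes from the list of dicts I have.
Answer: I suspect there is something better than this; hopefully someone on SO will come up with it.
Create a function that processes each entry (dict) in the list: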
import pandas as pd

def process(entry):
    # read in the data so that the dict keys become the column names
    m = pd.DataFrame.from_dict(entry, orient='index').T
    # set the row index
    m = m.set_index(['street', 'house'])
    # build the two column levels: tuple keys split into (floor, entity),
    # scalar keys such as 'area' get None as their second level
    col1 = [ent[0] if isinstance(ent, tuple) else ent for ent in m.columns]
    col2 = [ent[-1] if isinstance(ent, tuple) else None for ent in m.columns]
    # assign the multi-index columns to m
    m.columns = [col1, col2]
    return m
Apply the function above to the data (I wrapped the list of dicts in a variable named data):
res = [process(entry) for entry in data]
Concatenate the pieces to get the final output:
pd.concat(res)
               area        1                  2
                NaN bedrooms bathrooms bedrooms bathrooms
street  house
Euclid  42      123        1         4      NaN       NaN
        19      234      NaN       NaN        3         3
Riemann 42      345        5         2       12        17
        19      456        7         1      NaN       NaN
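A possibly shorter route, sketched here as an alternative and not as the answerer's method: let `pd.DataFrame` build the frame from the whole list at once, then promote the column labels to 2-tuples. It assumes pandas 0.25 or later, where columns built from a list of dicts keep first-seen key order, and it uses `""` instead of `None` for the empty second level of `area`.

```python
import pandas as pd

data = [
    {"street": "Euclid", "house": 42, "area": 123, (1, "bedrooms"): 1, (1, "bathrooms"): 4},
    {"street": "Euclid", "house": 19, "area": 234, (2, "bedrooms"): 3, (2, "bathrooms"): 3},
    {"street": "Riemann", "house": 42, "area": 345, (1, "bedrooms"): 5,
     (1, "bathrooms"): 2, (2, "bedrooms"): 12, (2, "bathrooms"): 17},
    {"street": "Riemann", "house": 19, "area": 456, (1, "bedrooms"): 7, (1, "bathrooms"): 1},
]

# Build the frame in one go; missing keys become NaN. Because string and
# tuple keys are mixed, the columns start out as a flat object Index.
df = pd.DataFrame(data).set_index(["street", "house"])

# Promote every column label to a 2-tuple so the columns become a MultiIndex.
df.columns = pd.MultiIndex.from_tuples(
    [c if isinstance(c, tuple) else (c, "") for c in df.columns],
    names=["floor", "entity"],
)
print(df)
```

Because the frame is built directly rather than by transposing one-row frames, the columns should keep numeric dtypes (int64 for `area`, float64 for the counts, which contain NaN), whereas the per-row `process` approach leaves every column as object.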