How to convert a DataFrame with dictionary columns into a multi-level DataFrame
I have a DataFrame that contains dictionaries in some of its columns.
It can be created as follows:
import pandas as pd
lis = [
{'id': '1',
'author': {'self': 'A',
'displayName': 'A'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Backlog'}},
{'id': '2',
'author': {'self': 'B',
'displayName': 'B'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Funnel'}}]
pd.DataFrame(lis)
author created id items
0 {'self': 'A', 'displayName': 'A'} 2018-12-18 1 {'field': 'status', 'fromString': 'Backlog'}
1 {'self': 'B', 'displayName': 'B'} 2018-12-18 2 {'field': 'status', 'fromString': 'Funnel'}
I want to convert this into a multi-level DataFrame.
I have been trying:
pd.MultiIndex.from_product(lis)
pd.MultiIndex.from_frame(pd.DataFrame(lis))
but I could not get the result I am looking for. Basically, I want something like this:
author              created     id  items
self  displayName                   field   fromString
A     A             2018-12-18  1   status  Backlog
B     B             2018-12-18  2   status  Funnel
Any suggestions on how I can achieve this?
Thanks.
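You can use json_normalize, but the column names are flattened with a . separator:
from pandas import json_normalize  # pandas >= 1.0; on older versions: from pandas.io.json import json_normalize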
lis = [
{'id': '1',
'author': {'self': 'A',
'displayName': 'A'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Backlog'}},
{'id': '2',
'author': {'self': 'B',
'displayName': 'B'},
'created': '2018-12-18',
'items': {'field': 'status',
'fromString': 'Funnel'}}]
df = json_normalize(lis)
print (df)
id created author.self author.displayName items.field items.fromString
0 1 2018-12-18 A A status Backlog
1 2 2018-12-18 B B status Funnel
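As a side note, json_normalize also accepts a sep argument that controls the separator used when flattening, which may help if the keys themselves contain dots. The following is only a minimal sketch, assuming pandas 1.0 or later (where the function is exposed as pd.json_normalize); the names records and flat, and the '__' separator, are illustrative choices and not part of the original answer.

```python
import pandas as pd

# Shortened, illustrative copy of the question's data (one record only).
records = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
]

# sep controls how nested keys are joined into flat column names.
flat = pd.json_normalize(records, sep='__')
print(list(flat.columns))
```

If a custom separator is used here, the same separator would have to be passed to str.split when building the MultiIndex later.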
For a MultiIndex in both the columns and the index, first move all columns without a . into the index with DataFrame.set_index, and then use str.split:
df = df.set_index(['id','created'])
df.columns = df.columns.str.split('.', expand=True)
print (df)
              author             items
                self displayName field fromString
id created
1  2018-12-18      A           A status    Backlog
2  2018-12-18      B           B status     Funnel
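Once the columns are a MultiIndex, the nested fields can be selected by level. The sketch below rebuilds the frame from the question's data and shows three common ways to select; it assumes pandas 1.0 or later for pd.json_normalize, and the variable names (records, authors, from_string, self_only) are only illustrative.

```python
import pandas as pd

# Same data as in the question.
records = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B', 'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Funnel'}},
]

# Flatten, move the non-nested columns into the index, then split the names.
df = pd.json_normalize(records).set_index(['id', 'created'])
df.columns = df.columns.str.split('.', expand=True)

# Select everything under the top-level key 'author' (a DataFrame).
authors = df['author']

# Select one fully qualified column with a tuple (a Series).
from_string = df[('items', 'fromString')]

# Select by second-level name across all top-level keys (a DataFrame).
self_only = df.xs('self', axis=1, level=1)

print(authors)
print(from_string)
print(self_only)
```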
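If you need a MultiIndex in the columns only, that is possible too, but you get missing values in the column names:
df = json_normalize(lis)  # start again from the flat frame
df.columns = df.columns.str.split('.', expand=True)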
print (df)
   id     created author             items
  NaN         NaN   self displayName field fromString
0   1  2018-12-18      A           A status    Backlog
1   2  2018-12-18      B           B status     Funnel
The missing values can be replaced with empty strings:
df = df.rename(columns= lambda x: '' if x != x else x)
print (df)
  id     created author             items
                   self displayName field fromString
0  1  2018-12-18      A           A status    Backlog
1  2  2018-12-18      B           B status     Funnel
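The steps above can also be folded into one small function. This is only a sketch under stated assumptions: to_multilevel is a hypothetical helper name, it assumes pandas 1.0 or later for pd.json_normalize, and it builds the column tuples by hand (padding short names with '') instead of calling str.split(expand=True) followed by rename.

```python
import pandas as pd


def to_multilevel(records, sep='.'):
    """Hypothetical helper: flatten nested dicts, then split names into levels."""
    df = pd.json_normalize(records, sep=sep)
    # Split every flat column name on the separator.
    parts = [col.split(sep) for col in df.columns]
    # Pad shorter names with '' so that all tuples have the same length.
    depth = max(len(p) for p in parts)
    tuples = [tuple(p + [''] * (depth - len(p))) for p in parts]
    df.columns = pd.MultiIndex.from_tuples(tuples)
    return df


# Same data as in the question.
records = [
    {'id': '1',
     'author': {'self': 'A', 'displayName': 'A'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Backlog'}},
    {'id': '2',
     'author': {'self': 'B', 'displayName': 'B'},
     'created': '2018-12-18',
     'items': {'field': 'status', 'fromString': 'Funnel'}},
]

out = to_multilevel(records)
print(out)
```

Padding with '' up front means no NaN ever appears in the column levels, so the separate rename step is not needed.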
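Try the approach below; hopefully it helps.
df = pd.json_normalize(lis)  # pandas >= 1.0; older versions: pd.io.json.json_normalize
# Reorder the data itself so the sorted names stay attached to the right columns.
df = df.reindex(columns=sorted(df.columns))
print(list(df.columns))
# Names without a "." get None (shown as NaN) as their second level.
tuple_list = [tuple(name.split(".")) if "." in name else (name, None) for name in df.columns]
df.columns = pd.MultiIndex.from_tuples(tuple_list)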
print(df)
The output looks like this:
author             created     id   items
displayName  self  NaN         NaN  field   fromString
A            A     2018-12-18  1    status  Backlog
B            B     2018-12-18  2    status  Funnel