json_normalize JSON file with list containing dictionary (sample included)
Here is a sample JSON file I'm working with, containing 2 records:
[{"Time":"2016-01-10",
"ID":13567,
"Content":{
"Event":"UPDATE",
"Id":{"EventID":"ABCDEFG"},
"Story":[{
"@ContentCat":"News",
"Body":"Related Meeting Memo: Engagement with target firm for potential M&A. Please be on call this weekend for news updates.",
"BodyTextType":"PLAIN_TEXT",
"DerivedId":{"Entity":[{"Id":"Amy","Score":70}, {"Id":"Jon","Score":70}]},
"DerivedTopics":{"Topics":[
{"Id":"Meeting","Score":70},
{"Id":"Performance","Score":70},
{"Id":"Engagement","Score":100},
{"Id":"Salary","Score":70},
{"Id":"Career","Score":100}]
},
"HotLevel":0,
"LanguageString":"ENGLISH",
"Metadata":{"ClassNum":50,
"Headline":"Attn: Weekend",
"WireId":2035,
"WireName":"IIS"},
"Version":"Original"}
]},
"yyyymmdd":"20160110",
"month":201601},
{"Time":"2016-01-12",
"ID":13568,
"Content":{
"Event":"DEAL",
"Id":{"EventID":"ABCDEFG2"},
"Story":[{
"@ContentCat":"Details",
"Body":"Test email contents",
"BodyTextType":"PLAIN_TEXT",
"DerivedId":{"Entity":[{"Id":"Bob","Score":100}, {"Id":"Jon","Score":70}, {"Id":"Jack","Score":60}]},
"DerivedTopics":{"Topics":[
{"Id":"Meeting","Score":70},
{"Id":"Engagement","Score":100},
{"Id":"Salary","Score":70},
{"Id":"Career","Score":100}]
},
"HotLevel":0,
"LanguageString":"ENGLISH",
"Metadata":{"ClassNum":70,
"Headline":"Attn: Weekend",
"WireId":2037,
"WireName":"IIS"},
"Version":"Original"}
]},
"yyyymmdd":"20160112",
"month":201602}]
I'm trying to get a dataframe at the entity ID level (extracting Amy and Jon from record 1, and Bob, Jon and Jack from record 2).
But I'm running into an error early on. Here is my code so far, assuming the sample JSON is saved as sample.json:
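import json
from pandas import json_normalize  # older pandas: from pandas.io.json import json_normalize
data = json.load(open('sample.json'))
test = json_normalize(data, record_path=['Content', 'Story'])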
This results in the following error:
TypeError: string indices must be integers
I suspect this is because Content.Story is actually a list containing a dictionary, rather than a dictionary itself. But I'm not sure how to get around this.
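A quick type check on a trimmed copy of the first record (inlined here so it runs without sample.json; only the keys needed are kept) supports that suspicion: Content is a dict, while Content.Story is a list of dicts. The last part reproduces the same message in plain Python by indexing a string with a string key. The assumption, not confirmed in the post, is that the pandas version involved ends up iterating over the Content dict, whose items are its key strings, when given record_path=['Content', 'Story'].

```python
import json

# Hypothetical trimmed copy of the first sample record.
sample = json.loads("""
[{"Time": "2016-01-10", "ID": 13567,
  "Content": {"Event": "UPDATE",
              "Story": [{"@ContentCat": "News",
                         "DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                                  {"Id": "Jon", "Score": 70}]}}]}}]
""")

record = sample[0]
print(type(record["Content"]).__name__)              # dict
print(type(record["Content"]["Story"]).__name__)     # list
print(type(record["Content"]["Story"][0]).__name__)  # dict

# Iterating over a dict yields its keys, which are strings ("Event" first here).
# Indexing such a string with another string raises the TypeError from the post.
message = None
first_key = next(iter(record["Content"]))
try:
    first_key["Story"]
except TypeError as exc:
    message = str(exc)
print(message)
```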
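Edit: To clarify, I'm ultimately trying to get down to the entity ID level (Content > Story > DerivedId > Entity > Id). The Content.Story code example is only meant to show where I'm currently stuck on this problem.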
json_normalize(data, record_path=[['Content', 'Story']])
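should work. With the extra list, ['Content', 'Story'] is looked up as a single nested key path within each record, so json_normalize no longer treats Content as a separate level to iterate over.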
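A minimal sketch of the nested-list path applied to trimmed inline copies of both records (hypothetical data standing in for sample.json, keeping only the keys used here). It assumes pandas >= 1.0, where json_normalize is available as pd.json_normalize. The first call stays at the Story level and carries the record ID along via meta. The second reaches the Entity level by copying each record's ID into its stories with plain Python, then normalizing with the same nested-list trick, record_path=[['DerivedId', 'Entity']]. The names stories_with_id and EntityId are illustrative and not part of the original post.

```python
import json

import pandas as pd

# Hypothetical trimmed copy of both sample records (only the keys used here).
data = json.loads("""
[
  {"Time": "2016-01-10", "ID": 13567,
   "Content": {"Event": "UPDATE",
               "Story": [{"@ContentCat": "News",
                          "DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                                   {"Id": "Jon", "Score": 70}]}}]}},
  {"Time": "2016-01-12", "ID": 13568,
   "Content": {"Event": "DEAL",
               "Story": [{"@ContentCat": "Details",
                          "DerivedId": {"Entity": [{"Id": "Bob", "Score": 100},
                                                   {"Id": "Jon", "Score": 70},
                                                   {"Id": "Jack", "Score": 60}]}}]}}
]
""")

# Story level: the inner list makes ['Content', 'Story'] one key path per
# record; meta=['ID'] repeats the record ID on each story row.
stories = pd.json_normalize(data, record_path=[["Content", "Story"]], meta=["ID"])

# Entity level: copy each record's ID into its stories (plain Python), then
# normalize the Entity lists with the same nested-list trick.
stories_with_id = [
    dict(story, ID=record["ID"])
    for record in data
    for story in record["Content"]["Story"]
]
entities = pd.json_normalize(
    stories_with_id, record_path=[["DerivedId", "Entity"]], meta=["ID"]
)
# Rename the entity's "Id" so it is not confused with the record-level "ID".
entities = entities.rename(columns={"Id": "EntityId"})
print(entities)
```

With this sample, entities should hold one row per entity: Amy and Jon with ID 13567, and Bob, Jon and Jack with ID 13568. Copying the ID into each story first keeps every json_normalize call to a single nested-list record path, the same form used in the answer above.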