How to flatten a JSON to a wide format, in pandas
I have a JSON file:
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {
                    "class_name": "temperature",
                    "confidence": 0.993
                },
                {
                    "class_name": "conditions",
                    "confidence": 0.006
                }
            ]
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {
                    "class_name": "temperature",
                    "confidence": 1.0
                },
                {
                    "class_name": "conditions",
                    "confidence": 0.0
                }
            ]
        }
    ]
}
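Current output: [image of the code and the undesired output, not reproduced]
I tried json_normalize, but it gives duplicates.
How do I convert this JSON file to a pandas DataFrame?
Each record in 'collection' should be expanded wider, not longer.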
If json_normalize() does not work well for your JSON structure, you can parse it with your own custom logic. Here is an example:
import pandas as pd

# define dictionary with desired structure (one list per output column)
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'conditions': []
}
# response is already a Python dict here, so json.loads() is not needed;
# call json.loads() first only if the data arrives as a JSON string
data = response
# iterate over collection and extract elements needed
for el in data['collection']:
    # map each class name to its confidence for this record
    conf = {e['class_name']: e['confidence'] for e in el['classes']}
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    d['temperature'].append(conf['temperature'])
    d['conditions'].append(conf['conditions'])
df = pd.DataFrame(d)
df.head()
Output: [image not reproduced]
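The loop above hardcodes the two class names. Below is a minimal sketch of the same idea that derives one column per class name from the data itself. It assumes every record has `text`, `top_class`, and `classes` keys as in the sample; the helper name `collection_to_wide` is hypothetical and not from the original answer.

```python
import pandas as pd

# Sample data copied from the question (already a Python dict).
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {
            "text": "How hot will it be today?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 0.993},
                {"class_name": "conditions", "confidence": 0.006},
            ],
        },
        {
            "text": "Is it hot outside?",
            "top_class": "temperature",
            "classes": [
                {"class_name": "temperature", "confidence": 1.0},
                {"class_name": "conditions", "confidence": 0.0},
            ],
        },
    ],
}


def collection_to_wide(records):
    """Hypothetical helper: one row per record, one column per class name."""
    rows = []
    for rec in records:
        row = {"text": rec["text"], "top_class": rec["top_class"]}
        # Each class name becomes a column holding its confidence.
        for cls in rec["classes"]:
            row[cls["class_name"]] = cls["confidence"]
        rows.append(row)
    return pd.DataFrame(rows)


df = collection_to_wide(response["collection"])
print(df)
```

If a record lacks one of the classes, pandas fills that cell with NaN, so no per-class lookup can fail the way indexing `[0]` on an empty list would.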
- Using flatten_json from SO: How to flatten a nested JSON recursively, with flatten_json?, the 'collection' records can be expanded into a wide-format dataframe.
- The column headers can be renamed as needed with pandas.DataFrame.rename.
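`flatten_json` is not part of pandas and is not defined in this post. For readers who do not have the linked answer at hand, here is a minimal sketch of a recursive flattener that produces the same kind of keys (`classes_0_class_name`, and so on). It is an illustration written for this post and may differ from the function in the linked answer; the `sep` parameter is an assumption and must be a non-empty string.

```python
def flatten_json(nested, sep="_"):
    """Sketch of a recursive flattener for nested dicts and lists."""
    out = {}

    def _flatten(value, prefix=""):
        if isinstance(value, dict):
            # Descend into each key, extending the prefix with the key name.
            for key, child in value.items():
                _flatten(child, f"{prefix}{key}{sep}")
        elif isinstance(value, list):
            # Descend into each item, extending the prefix with its index.
            for i, child in enumerate(value):
                _flatten(child, f"{prefix}{i}{sep}")
        else:
            # Leaf value: store it under the prefix minus the trailing separator.
            out[prefix[: -len(sep)]] = value

    _flatten(nested)
    return out


# First record from the question's 'collection' list.
record = {
    "text": "How hot will it be today?",
    "top_class": "temperature",
    "classes": [
        {"class_name": "temperature", "confidence": 0.993},
        {"class_name": "conditions", "confidence": 0.006},
    ],
}
flat = flatten_json(record)
print(flat)
```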
df = pd.DataFrame([flatten_json(x) for x in response['collection']])
# display(df)
                        text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
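The renaming step mentioned above could look like the sketch below. It rebuilds the flattened frame by hand so the snippet runs on its own, and the new column names are only a suggestion. It also assumes the classes appear in the same order in every record, which holds for the sample but is not guaranteed in general.

```python
import pandas as pd

# Flattened frame as shown above, rebuilt by hand for a standalone example.
df = pd.DataFrame(
    {
        "text": ["How hot will it be today?", "Is it hot outside?"],
        "top_class": ["temperature", "temperature"],
        "classes_0_class_name": ["temperature", "temperature"],
        "classes_0_confidence": [0.993, 1.0],
        "classes_1_class_name": ["conditions", "conditions"],
        "classes_1_confidence": [0.006, 0.0],
    }
)

# Assumption: position 0 is always 'temperature' and position 1 is always
# 'conditions', so the confidence columns can take the class names directly.
wide = df.rename(
    columns={
        "classes_0_confidence": "temperature",
        "classes_1_confidence": "conditions",
    }
).drop(columns=["classes_0_class_name", "classes_1_class_name"])

print(wide)
```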