How to flatten a JSON to a wide format in pandas

Question

I have a JSON file:

response = {
  "classifier_id": "xxxxx-xx-1",
  "url": "/testers/xxxxx-xx-1",
  "collection": [
    {
      "text": "How hot will it be today?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 0.993
        },
        {
          "class_name": "conditions",
          "confidence": 0.006
        }
      ]
    },
    {
      "text": "Is it hot outside?",
      "top_class": "temperature",
      "classes": [
        {
          "class_name": "temperature",
          "confidence": 1.0
        },
        {
          "class_name": "conditions",
          "confidence": 0.0
        }
      ]
    }
  ]
}

Current output:

[Image: code and unwanted output]

I tried json_normalize, but it gave duplicates.

How can I convert this JSON file into a pandas DataFrame?

The records in each collection should be expanded wide, not long.
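One likely source of the "duplicates" is the long format itself, assuming json_normalize was pointed at the nested `classes` list. With `record_path="classes"` it emits one row per class, so every `text` appears once per class. The sketch below reproduces that and then pivots the class names into columns to get one row per text. It assumes pandas 1.1 or later (for a list passed as `index` to `DataFrame.pivot`); `long_df` and `wide` are illustrative names.

```python
import pandas as pd

# Copy of the question's `response` dict
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

# record_path="classes" yields one row per class (long format),
# repeating text/top_class for every class of the same record
long_df = pd.json_normalize(
    response["collection"], record_path="classes", meta=["text", "top_class"]
)

# Pivot class names into columns: one row per text (wide format)
wide = long_df.pivot(
    index=["text", "top_class"], columns="class_name", values="confidence"
).reset_index()
wide.columns.name = None  # drop the leftover 'class_name' axis label
print(wide)
```

Note that `pivot` sorts both the new columns (`conditions` before `temperature`) and the row index.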

Answer 1

If json_normalize() doesn't handle your JSON structure the way you want, you can parse it with custom logic. Here is an example:

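import json
import pandas as pd

# define dictionary with desired structure (one list per output column)
d = {
    'text': [],
    'top_class': [],
    'temperature': [],
    'conditions': []
}

# load json: `response` above is already a Python dict, so json.loads
# is only needed when it is a JSON string (e.g. read from a file)
data = json.loads(response) if isinstance(response, str) else response

# iterate over collection and extract elements needed
for el in data['collection']:
    d['text'].append(el['text'])
    d['top_class'].append(el['top_class'])
    # confidence of each class, looked up by its class_name
    d['temperature'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'temperature'][0])
    d['conditions'].append([e['confidence'] for e in el['classes'] if e['class_name'] == 'conditions'][0])

df = pd.DataFrame(d)

df.head()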

Output:

[Image: resulting DataFrame]
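The loop above hardcodes the class names `temperature` and `conditions`. A minimal variant of the same custom-parsing idea, assuming you want one column per `class_name` whatever names appear, builds one dict per record and lets pandas align the columns; `rows` is an illustrative name.

```python
import pandas as pd

# Copy of the question's `response` dict
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

rows = []
for el in response["collection"]:
    row = {"text": el["text"], "top_class": el["top_class"]}
    # one column per class_name; the value is that class's confidence
    row.update({c["class_name"]: c["confidence"] for c in el["classes"]})
    rows.append(row)

# records that lack a given class get NaN in that column
df = pd.DataFrame(rows)
print(df)
```

Unlike the `[...][0]` lookups, this does not raise `IndexError` when a record is missing one of the classes.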

Answer 2

Using flatten_json from SO: How to flatten a nested JSON recursively, with flatten_json?, the 'collection' records can be expanded into a wide-format DataFrame.
Column headers can be renamed as needed with pandas.DataFrame.rename.

import pandas as pd

# flatten_json is the function from the linked SO answer; define it first
df = pd.DataFrame([flatten_json(x) for x in response['collection']])

# display(df)
                        text    top_class classes_0_class_name  classes_0_confidence classes_1_class_name  classes_1_confidence
0  How hot will it be today?  temperature          temperature                 0.993           conditions                 0.006
1         Is it hot outside?  temperature          temperature                 1.000           conditions                 0.000
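If you don't want to copy the linked answer's function, the sketch below uses a hypothetical stand-in, `flatten_record` (not the SO answer's code), that recursively joins dict keys and list indices with `_` and so produces the same column names as the output above. It then applies `DataFrame.rename` as suggested. The rename assumes the classes always appear in the same order (temperature first, conditions second), as they do in this sample.

```python
import pandas as pd

# Copy of the question's `response` dict
response = {
    "classifier_id": "xxxxx-xx-1",
    "url": "/testers/xxxxx-xx-1",
    "collection": [
        {"text": "How hot will it be today?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 0.993},
                     {"class_name": "conditions", "confidence": 0.006}]},
        {"text": "Is it hot outside?", "top_class": "temperature",
         "classes": [{"class_name": "temperature", "confidence": 1.0},
                     {"class_name": "conditions", "confidence": 0.0}]},
    ],
}

def flatten_record(obj, prefix=""):
    """Hypothetical helper: flatten nested dicts/lists into one flat dict,
    joining keys and list indices with '_'."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}  # leaf value
    flat = {}
    for key, value in items:
        new_key = f"{prefix}_{key}" if prefix else str(key)
        flat.update(flatten_record(value, new_key))
    return flat

df = pd.DataFrame([flatten_record(x) for x in response["collection"]])

# shorten the generated headers (assumes a fixed class order per record)
df = df.rename(columns={
    "classes_0_confidence": "temperature",
    "classes_1_confidence": "conditions",
})
print(df)
```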


