Parsing nested JSON with list comprehension in Python

Question

My data looks like this (this is only an excerpt; there are more objects, and some of them have no additionalData):

{
   "referenceSetCount":1,
   "totalRowCount":4,
   "referenceSets":[
      {
         "name":"table",
         "rowCount":4,
         "_links":{
            "self":{
               "href":"link"
            }
         },
         "referenceDataItems":[
            {
               "col1":"5524",
               "col2":"yyy",
               "col3":1,
               "additionalData":[
                  {
                     "col1":111,
                     "col2":"xxxx",
                     "col3":1,
                     "col4":"18"
                  },
                  {
                     "col1":222,
                     "col2":"2222",
                     "col3":1,
                     "col4":"1"
                  }
               ]
            },
            {
               "col1":"26434",
               "col2":"dfdshere",
               "col3":2,
               "additionalData":[
                  {
                     "col1":34522,
                     "col2":"fsfs",
                     "col3":2,
                     "col4":"18"
                  },
                  {
                     "col1":5444,
                     "col2":"gregrege",
                     "col3":2,
                     "col4":"2"
                  }
               ]
            }
         ]
      }
   ]
}

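I'm trying to iterate with a list comprehension to build a DataFrame from referenceDataItems, including everything under that key plus the contents of additionalData when it is present.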

import json
import urllib.request

import pandas as pd

# "link_to_my_data" is a placeholder for the real endpoint
with urllib.request.urlopen("link_to_my_data") as response:
    api_data = json.loads(response.read())

# Goal: nested loop to get referenceSets + the nested additionalData
data_alt = [v for k, v in api_data.items() if k == 'referenceSets']
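As written, the comprehension only filters the top-level keys, so it returns a one-element list wrapping the whole referenceSets list (the same as `[api_data["referenceSets"]]`). Reaching the individual items takes one `for` clause per nesting level. A minimal sketch, assuming a trimmed copy of the sample data stands in for the API response (the second item is hypothetical and has no additionalData):

```python
# Trimmed copy of the sample payload, standing in for the API response;
# the second item is hypothetical and has no additionalData
api_data = {
    "referenceSets": [
        {
            "name": "table",
            "referenceDataItems": [
                {"col1": "5524", "col2": "yyy", "col3": 1,
                 "additionalData": [{"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"}]},
                {"col1": "999", "col2": "zzz", "col3": 3},
            ],
        }
    ]
}

# The original filter: a one-element list wrapping the whole referenceSets list
data_alt = [v for k, v in api_data.items() if k == "referenceSets"]

# One "for" clause per nesting level: referenceSets -> referenceDataItems
items = [item
         for ref_set in api_data["referenceSets"]
         for item in ref_set["referenceDataItems"]]

# Top-level fields only; the nested additionalData list is left out here
rows = [{k: v for k, v in item.items() if k != "additionalData"} for item in items]
print(rows)
# [{'col1': '5524', 'col2': 'yyy', 'col3': 1}, {'col1': '999', 'col2': 'zzz', 'col3': 3}]
```

The resulting `rows` list can be passed straight to `pd.DataFrame(rows)`; the additionalData entries still need a separate flattening step.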

Expected result:

col1    col2        col3    col1    col2    col3    col4    col1    col2        col3    col4
5524    yyy         1       111     xxxx    1       18      222     2222        1       1
26434   dfdshere    2       34522   fsfs    2       18      5444    gregrege    2       2
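For comparison, a minimal sketch of one way to reach that wide layout with a nested list comprehension. It assumes every item has the top-level fields col1 to col3 and every additionalData entry has col1 to col4. Repeated column names are allowed in pandas but awkward to select, so the sketch suffixes the nested ones with the entry index (col1_0, col1_1, ...); that naming is a choice of this sketch, not part of the question. The third item in the data is hypothetical, added to show an object without additionalData.

```python
import pandas as pd

# Sample payload from the question; the third item is hypothetical and
# has no additionalData, as some real objects do not
api_data = {
    "referenceSets": [
        {
            "name": "table",
            "referenceDataItems": [
                {"col1": "5524", "col2": "yyy", "col3": 1,
                 "additionalData": [
                     {"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"},
                     {"col1": 222, "col2": "2222", "col3": 1, "col4": "1"}]},
                {"col1": "26434", "col2": "dfdshere", "col3": 2,
                 "additionalData": [
                     {"col1": 34522, "col2": "fsfs", "col3": 2, "col4": "18"},
                     {"col1": 5444, "col2": "gregrege", "col3": 2, "col4": "2"}]},
                {"col1": "999", "col2": "zzz", "col3": 3},
            ],
        }
    ]
}

TOP_LEVEL = ["col1", "col2", "col3"]        # assumed top-level fields
NESTED = ["col1", "col2", "col3", "col4"]   # assumed fields of each additionalData entry

# One row per referenceDataItem: its own fields, then the fields of every
# additionalData entry in order (items without the key contribute nothing extra)
rows = [
    [item.get(c) for c in TOP_LEVEL]
    + [extra.get(c) for extra in item.get("additionalData", []) for c in NESTED]
    for ref_set in api_data["referenceSets"]
    for item in ref_set["referenceDataItems"]
]

# Pad shorter rows with None so every row has the same width
width = max(len(r) for r in rows)
rows = [r + [None] * (width - len(r)) for r in rows]

# Suffix nested column names with the entry index to keep them unique
n_extra = (width - len(TOP_LEVEL)) // len(NESTED)
columns = TOP_LEVEL + [f"{c}_{i}" for i in range(n_extra) for c in NESTED]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Padding explicitly in Python keeps the row width predictable instead of relying on how pandas handles ragged lists; the empty cells show up as missing values in the DataFrame.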

Answer 1

After some research I found the following, which gets me almost the data I want; COLUMNS_TO_DROP needs hardly any changes:

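import pandas as pd

# Columns removed once the nested data has been flattened
COLUMNS_TO_DROP = ["additionalData"]

def expand_additional_data(items):
    """Yield each item with its additionalData entries added as extra columns."""
    for item in items:
        # Items without additionalData pass through unchanged
        for av in item.get("additionalData", []):
            # Column name from the entry's col2, value from its col4 (sample-data keys)
            item[av["col2"]] = av["col4"]
        yield item


# api_data is the parsed JSON from the question
for ref_set in api_data["referenceSets"]:
    table_name = ref_set["name"]
    expanded = expand_additional_data(ref_set["referenceDataItems"])
    df = pd.DataFrame(expanded)
    df = df.drop(COLUMNS_TO_DROP, axis=1, errors="ignore")
    print(table_name)
    print(df)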
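The generator above turns each additionalData entry into a single column (named after one field, holding another) and discards the remaining fields; it also modifies the input dicts in place. If every nested field needs to be kept, a sketch that flattens each entry into indexed, prefixed columns and builds the rows with a list comprehension could look like this. The helper name `flatten_item` and the `additionalData_<i>_<field>` naming scheme are hypothetical, not part of the answer.

```python
import pandas as pd

# Sample payload from the question, standing in for api_data
api_data = {
    "referenceSets": [
        {
            "name": "table",
            "referenceDataItems": [
                {"col1": "5524", "col2": "yyy", "col3": 1,
                 "additionalData": [
                     {"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"},
                     {"col1": 222, "col2": "2222", "col3": 1, "col4": "1"}]},
                {"col1": "26434", "col2": "dfdshere", "col3": 2,
                 "additionalData": [
                     {"col1": 34522, "col2": "fsfs", "col3": 2, "col4": "18"},
                     {"col1": 5444, "col2": "gregrege", "col3": 2, "col4": "2"}]},
            ],
        }
    ]
}

def flatten_item(item):
    """Hypothetical helper: top-level fields plus additionalData_<i>_<field> columns."""
    # Copy the top-level fields, leaving the nested list out
    flat = {k: v for k, v in item.items() if k != "additionalData"}
    # Add every field of every additionalData entry under an indexed prefix
    for i, extra in enumerate(item.get("additionalData", [])):
        flat.update({f"additionalData_{i}_{k}": v for k, v in extra.items()})
    return flat

for ref_set in api_data["referenceSets"]:
    # Build the rows with a list comprehension; the input dicts are not modified
    df = pd.DataFrame([flatten_item(item) for item in ref_set["referenceDataItems"]])
    print(ref_set["name"])
    print(df)
```

Because `flatten_item` builds a new dict per item, the same parsed JSON can be reused afterwards, and items lacking additionalData simply end up with missing values in the prefixed columns.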

Tags: python, list-comprehension, dataframe, pandas