Parsing nested JSON with list comprehension in Python
My data looks like this (it's only an excerpt; there are more objects, and some of them have no additionalData):
{
  "referenceSetCount": 1,
  "totalRowCount": 4,
  "referenceSets": [
    {
      "name": "table",
      "rowCount": 4,
      "_links": {
        "self": {
          "href": "link"
        }
      },
      "referenceDataItems": [
        {
          "col1": "5524",
          "col2": "yyy",
          "col3": 1,
          "additionalData": [
            {
              "col1": 111,
              "col2": "xxxx",
              "col3": 1,
              "col4": "18"
            },
            {
              "col1": 222,
              "col2": "2222",
              "col3": 1,
              "col4": "1"
            }
          ]
        },
        {
          "col1": "26434",
          "col2": "dfdshere",
          "col3": 2,
          "additionalData": [
            {
              "col1": 34522,
              "col2": "fsfs",
              "col3": 2,
              "col4": "18"
            },
            {
              "col1": 5444,
              "col2": "gregrege",
              "col3": 2,
              "col4": "2"
            }
          ]
        }
      ]
    }
  ]
}
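Here is one way to see how this structure can be walked with a single comprehension. This is a rough sketch that assumes a trimmed copy of the excerpt above. The third item (col1 "99999") is a hypothetical addition without additionalData, included to mimic the objects that lack it.

```python
import json

# Trimmed copy of the excerpt above. The third item ("99999") is a
# hypothetical addition with no "additionalData" key.
raw = """
{"referenceSets": [{"name": "table", "referenceDataItems": [
  {"col1": "5524", "col2": "yyy", "col3": 1, "additionalData": [
     {"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"},
     {"col1": 222, "col2": "2222", "col3": 1, "col4": "1"}]},
  {"col1": "26434", "col2": "dfdshere", "col3": 2, "additionalData": [
     {"col1": 34522, "col2": "fsfs", "col3": 2, "col4": "18"},
     {"col1": 5444, "col2": "gregrege", "col3": 2, "col4": "2"}]},
  {"col1": "99999", "col2": "zzz", "col3": 3}]}]}
"""
data = json.loads(raw)

# Two "for" clauses walk referenceSets -> referenceDataItems in one pass;
# item.get("additionalData", []) treats a missing key as an empty list.
summary = [
    (item["col1"], len(item.get("additionalData", [])))
    for ref_set in data["referenceSets"]
    for item in ref_set["referenceDataItems"]
]
print(summary)  # [('5524', 2), ('26434', 2), ('99999', 0)]
```

The `for` clauses run left to right, just like the equivalent nested loops. Using `.get` with a default means items without additionalData do not raise a `KeyError`.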
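I'm trying to iterate with a list comprehension to get a DataFrame of referenceDataItems containing everything under that key, including the contents of additionalData when it is present.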
import json
import urllib.request

import pandas as pd

# "link_to_my_data" is a placeholder for the real API URL
with urllib.request.urlopen("link_to_my_data") as response:
    api_data = json.load(response)

# nested loop to get referenceSets + nested additionalData
data_alt = [v for k, v in api_data.items() if k == 'referenceSets']
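The filter-by-key comprehension above keeps every value whose key matches, so it adds an extra list level instead of reaching into the nested data. Below is a small sketch that shows this, using a hypothetical minimal stand-in for `api_data`.

```python
# Hypothetical minimal stand-in for api_data; only the "referenceSets" key matters.
api_data = {
    "referenceSetCount": 1,
    "referenceSets": [{"name": "table", "referenceDataItems": []}],
}

# The comprehension collects matching *values*, so the result is a
# one-element list that wraps the referenceSets list.
data_alt = [v for k, v in api_data.items() if k == "referenceSets"]
print(len(data_alt))                             # 1
print(data_alt[0] is api_data["referenceSets"])  # True

# Indexing the dict directly gives the list of reference sets without the extra level.
ref_sets = api_data["referenceSets"]
print(ref_sets[0]["name"])                       # table
```

Because dict keys are unique, a comprehension that filters on one key can match at most once. Plain indexing (`api_data["referenceSets"]`) is therefore the simpler starting point for the nested iteration.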
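Expected result (one row per referenceDataItem, with the columns of each additionalData entry appended to the right):
col1   col2      col3  col1   col2  col3  col4  col1  col2      col3  col4
5524   yyy       1     111    xxxx  1     18    222   2222      1     1
26434  dfdshere  2     34522  fsfs  2     18    5444  gregrege  2     2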
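Here is a rough sketch of building rows in that wide layout with nested list comprehensions. It is plain Python and assumes every additionalData entry carries exactly col1 to col4. The names `wide_rows`, `BASE_COLS` and `EXTRA_COLS` are hypothetical and only serve this illustration. Items with fewer additionalData entries are padded with `None` so that all rows have the same length.

```python
# Hypothetical column lists; assumes each additionalData entry has col1..col4.
BASE_COLS = ["col1", "col2", "col3"]
EXTRA_COLS = ["col1", "col2", "col3", "col4"]


def wide_rows(ref_sets):
    """Flatten each referenceDataItem plus its additionalData entries into one row."""
    rows = [
        [item[c] for c in BASE_COLS]
        + [extra[c] for extra in item.get("additionalData", []) for c in EXTRA_COLS]
        for ref_set in ref_sets
        for item in ref_set["referenceDataItems"]
    ]
    # Items can have different numbers of additionalData entries: pad short rows.
    width = max((len(r) for r in rows), default=len(BASE_COLS))
    rows = [r + [None] * (width - len(r)) for r in rows]
    # Repeat the extra column names once per additionalData entry in the widest row.
    n_extra = (width - len(BASE_COLS)) // len(EXTRA_COLS)
    columns = BASE_COLS + EXTRA_COLS * n_extra
    return columns, rows


ref_sets = [{"referenceDataItems": [
    {"col1": "5524", "col2": "yyy", "col3": 1, "additionalData": [
        {"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"},
        {"col1": 222, "col2": "2222", "col3": 1, "col4": "1"}]},
    {"col1": "26434", "col2": "dfdshere", "col3": 2, "additionalData": [
        {"col1": 34522, "col2": "fsfs", "col3": 2, "col4": "18"},
        {"col1": 5444, "col2": "gregrege", "col3": 2, "col4": "2"}]},
]}]
columns, rows = wide_rows(ref_sets)
print(rows[0])  # ['5524', 'yyy', 1, 111, 'xxxx', 1, '18', 222, '2222', 1, '1']
```

With pandas installed, `pd.DataFrame(rows, columns=columns)` should then give the table shown above. pandas accepts repeated column labels, but `df["col1"]` will return several columns instead of one. Prefixed names, as in a dict-based flattening, avoid that ambiguity.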
I did some research, and the following code gets me almost the data I want, with only a small change to COLUMNS_TO_DROP:
COLUMNS_TO_DROP = ["additionalData"]

def expand_additional_data(items):
    """Yield each item with its additionalData entries added as extra keys."""
    for item in items:
        for av in item.get("additionalData", []):
            # use the entry's col2 value as the new column name and its col4 value as the cell
            item[av["col2"]] = av["col4"]
        yield item

for ref_set in api_data["referenceSets"]:
    table_name = ref_set["name"]
    expanded = expand_additional_data(ref_set["referenceDataItems"])
    df = pd.DataFrame(expanded)
    df = df.drop(COLUMNS_TO_DROP, axis=1, errors="ignore")
    print(df)
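`expand_additional_data` turns each entry's col2 value into a column name. It also modifies the input dicts in place, which is why the raw additionalData column then has to be dropped. Another option is to copy each item into a flat dict and give each nested field a unique prefix based on its position. Below is a minimal sketch of that approach. The helper `flatten_item` and the `additionalData_<index>_<field>` naming scheme are hypothetical choices, and the item without additionalData is hypothetical too.

```python
def flatten_item(item, key="additionalData"):
    """Return a new flat dict: top-level fields plus prefixed nested fields."""
    # Copy every top-level field except the nested list itself (no COLUMNS_TO_DROP needed).
    flat = {k: v for k, v in item.items() if k != key}
    # Hypothetical naming scheme: "additionalData_0_col1", "additionalData_1_col2", ...
    # so nested fields never collide with the top-level col1..col3.
    flat.update({
        f"{key}_{i}_{k}": v
        for i, extra in enumerate(item.get(key, []))
        for k, v in extra.items()
    })
    return flat


items = [
    {"col1": "5524", "col2": "yyy", "col3": 1, "additionalData": [
        {"col1": 111, "col2": "xxxx", "col3": 1, "col4": "18"},
        {"col1": 222, "col2": "2222", "col3": 1, "col4": "1"}]},
    {"col1": "99999", "col2": "zzz", "col3": 3},  # hypothetical item without additionalData
]
records = [flatten_item(it) for it in items]
print(records[0]["additionalData_1_col2"])  # 2222
print(records[1])  # {'col1': '99999', 'col2': 'zzz', 'col3': 3}
```

Passing `records` to `pd.DataFrame(records)` builds one column per distinct key. Rows that lack a key (such as the item without additionalData) get NaN in those columns, and the input items are left unchanged.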