How to flatten a JSON from an API using pandas
I have a JSON returned from an API, which I append to a list. After that call completes, I need to flatten the data using pandas. I'm not sure how to do this.
Code:
import json
import requests

api_results = []
error_results = []
response = requests.post(target_url, data=doc, headers=login_details)
response_data = json.loads(response.text)
# keep error payloads separate from successful responses
if isinstance(response_data, dict) and 'error' in response_data:
    error_results.append(response_data)
else:
    api_results.append(response_data)
When I call api_results, my data looks like this:
[{"requesturl":"http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4","clientid":"123456789","adjustedsummaryguidelines":{"midrangeallabsence":46,"midrangeclaims":36,"atriskallabsence":374,"atriskclaims":98},"riskassessment":{"score":87.95,"status":"Red (Extreme)","magnitude":"86.65","volatility":"89.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"alert":"yellow","days":185},"cp100":{"alert":"yellow","days":365}},"icdcodes":[{"code":"719.41","name":"Pain in joint, shoulder region","meandurationdays":{"bp":18,"cp95":72,"cp100":93}},{"code":"840.9","name":"Sprains and strains of unspecified site of shoulder and upper arm","meandurationdays":{"bp":10,"cp95":27,"cp100":35}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}
,{"clientid":"987654321","adjustedsummaryguidelines":{"midrangeallabsence":25,"midrangeclaims":42,"atriskallabsence":0,"atriskclaims":194},"riskassessment":{"score":76.85,"status":"Orange (High)","magnitude":"74.44","volatility":"79.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"days":95},"cp100":{"alert":"yellow","days":193}},"icdcodes":[{"code":"724.2","name":"Lumbago","meandurationdays":{"bp":10,"cp95":38,"cp100":50}},{"code":"847.2","name":"Sprain of lumbar","meandurationdays":{"bp":10,"cp95":22,"cp100":29}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}]
I've been using json_normalize, but I know I'm not using this library correctly.
How do I flatten this data?
What I need is this:
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
| clientid|days| alert|days| alert|days|atriskallabsence|atriskclaims|midrangeallabsence|midrangeclaims| alertdesc|alert|applied|magnitude|score| status|volatility| code| bp|cp100|cp95|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
|123456789| 2|yellow| 365|yellow| 185| 374| 98| 46| 36|[Recommend early ...| red| 1| 86.65|87.95|Red (Extreme)| 89.25|719.41| 18| 93| 72|
|123456789| 2|yellow| 365|yellow| 185| 374| 98| 46| 36|[Recommend early ...| red| 1| 86.65|87.95|Red (Extreme)| 89.25| 840.9| 10| 35| 27|
|987654321| 2|yellow| 193| null| 95| 0| 194| 25| 42|[Recommend early ...| red| 1| 74.44|76.85|Orange (High)| 79.25| 724.2| 10| 50| 38|
|987654321| 2|yellow| 193| null| 95| 0| 194| 25| 42|[Recommend early ...| red| 1| 74.44|76.85|Orange (High)| 79.25| 847.2| 10| 29| 22|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
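- Since the desired result has a separate row for each dict under the 'icdcodes' key, the best option is to use pandas.json_normalize.
- First create the main dataframe and use pandas.DataFrame.explode('icdcodes'), which expands the dataframe based on the number of dicts in each 'icdcodes' list, so that every 'clientid' gets the appropriate number of rows.
- Use .json_normalize() on the 'icdcodes' column, whose entries are dicts, some of whose values may themselves be dicts.
- .join the two dataframes and drop the 'icdcodes' column.
- Use pandas.DataFrame.rename() to rename the columns, and pandas.DataFrame.drop() to remove unneeded columns, as needed.
- Also see this answer from SO: Splitting dictionary/list inside a Pandas Column into Separate Columns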
import pandas as pd
# create the initial dataframe from api_results
df = pd.json_normalize(api_results).explode('icdcodes').reset_index(drop=True)
# create a dataframe for only icdcodes, which will expand all the lists of dicts
icdcodes = pd.json_normalize(df.icdcodes)
# join df to icdcodes and drop the icdcodes column
df = df.join(icdcodes).drop(['icdcodes'], axis=1)
# display(df)
requesturl clientid adjustedsummaryguidelines.midrangeallabsence adjustedsummaryguidelines.midrangeclaims adjustedsummaryguidelines.atriskallabsence adjustedsummaryguidelines.atriskclaims riskassessment.score riskassessment.status riskassessment.magnitude riskassessment.volatility adjustedduration.bp.days adjustedduration.cp95.alert adjustedduration.cp95.days adjustedduration.cp100.alert adjustedduration.cp100.days cfactors.legalrep.applied cfactors.legalrep.alert alertdesc.red alertdesc.yellow code name meandurationdays.bp meandurationdays.cp95 meandurationdays.cp100
0 http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4 123456789 46 36 374 98 87.95 Red (Extreme) 86.65 89.25 2 yellow 185 yellow 365 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 719.41 Pain in joint, shoulder region 18 72 93
1 http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4 123456789 46 36 374 98 87.95 Red (Extreme) 86.65 89.25 2 yellow 185 yellow 365 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 840.9 Sprains and strains of unspecified site of shoulder and upper arm 10 27 35
2 NaN 987654321 25 42 0 194 76.85 Orange (High) 74.44 79.25 2 NaN 95 yellow 193 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 724.2 Lumbago 10 38 50
3 NaN 987654321 25 42 0 194 76.85 Orange (High) 74.44 79.25 2 NaN 95 yellow 193 1 red Recommend early intervention and priority medical case management. Consider early intervention and priority medical case management. 847.2 Sprain of lumbar 10 22 29
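The rename and drop step from the last bullet is not shown in the code above. The following is a minimal sketch of it, assuming a trimmed-down stand-in for api_results with the same nesting but fewer keys. The short column names (score, cp95_days, mean_cp95, and so on) and the choice to drop 'name' are illustrative assumptions, not part of the original answer. Unique names are used here because the desired table in the question repeats 'days' and 'alert', which would make the columns ambiguous in pandas.

```python
import pandas as pd

# Trimmed-down stand-in for api_results (assumption: same nesting, fewer keys)
api_results = [
    {
        "clientid": "123456789",
        "riskassessment": {"score": 87.95, "status": "Red (Extreme)"},
        "adjustedduration": {"cp95": {"alert": "yellow", "days": 185}},
        "icdcodes": [
            {"code": "719.41", "name": "Pain in joint, shoulder region",
             "meandurationdays": {"bp": 18, "cp95": 72, "cp100": 93}},
            {"code": "840.9", "name": "Sprains and strains of unspecified site of shoulder and upper arm",
             "meandurationdays": {"bp": 10, "cp95": 27, "cp100": 35}},
        ],
    },
    {
        "clientid": "987654321",
        "riskassessment": {"score": 76.85, "status": "Orange (High)"},
        "adjustedduration": {"cp95": {"days": 95}},  # no 'alert' key here
        "icdcodes": [
            {"code": "724.2", "name": "Lumbago",
             "meandurationdays": {"bp": 10, "cp95": 38, "cp100": 50}},
            {"code": "847.2", "name": "Sprain of lumbar",
             "meandurationdays": {"bp": 10, "cp95": 22, "cp100": 29}},
        ],
    },
]

# Same pipeline as above: flatten, one row per icdcodes entry, then expand the dicts
df = pd.json_normalize(api_results).explode("icdcodes").reset_index(drop=True)
icdcodes = pd.json_normalize(df["icdcodes"].tolist())
df = df.join(icdcodes).drop(columns=["icdcodes"])

# Drop a column that is not needed (illustrative choice)
df = df.drop(columns=["name"])

# Replace the dotted names with shorter, still unique ones (hypothetical names)
df = df.rename(columns={
    "riskassessment.score": "score",
    "riskassessment.status": "status",
    "adjustedduration.cp95.alert": "cp95_alert",
    "adjustedduration.cp95.days": "cp95_days",
    "meandurationdays.bp": "mean_bp",
    "meandurationdays.cp95": "mean_cp95",
    "meandurationdays.cp100": "mean_cp100",
})

print(df)
```

Keys that are missing from a record, such as the cp95 alert for the second client, come through as NaN rather than raising an error.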