How to flatten JSON from an API using pandas

I have JSON returned from an API, which I append to a list. After that call completes, I need to flatten the data using pandas. I'm not sure how to do this.

Code:

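import json
import requests

api_results = []
error_results = []

# target_url, doc and login_details are defined elsewhere
response = requests.post(target_url, data=doc, headers=login_details)
response_data = json.loads(response.text)
# an error response is a dict with an 'error' key; keep it separate
if isinstance(response_data, dict) and 'error' in response_data:
    error_results.append(response_data)
else:
    api_results.append(response_data)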

When I print api_results, my data looks like this:

[{"requesturl":"http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4","clientid":"123456789","adjustedsummaryguidelines":{"midrangeallabsence":46,"midrangeclaims":36,"atriskallabsence":374,"atriskclaims":98},"riskassessment":{"score":87.95,"status":"Red (Extreme)","magnitude":"86.65","volatility":"89.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"alert":"yellow","days":185},"cp100":{"alert":"yellow","days":365}},"icdcodes":[{"code":"719.41","name":"Pain in joint, shoulder region","meandurationdays":{"bp":18,"cp95":72,"cp100":93}},{"code":"840.9","name":"Sprains and strains of unspecified site of shoulder and upper arm","meandurationdays":{"bp":10,"cp95":27,"cp100":35}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}
,{"clientid":"987654321","adjustedsummaryguidelines":{"midrangeallabsence":25,"midrangeclaims":42,"atriskallabsence":0,"atriskclaims":194},"riskassessment":{"score":76.85,"status":"Orange (High)","magnitude":"74.44","volatility":"79.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"days":95},"cp100":{"alert":"yellow","days":193}},"icdcodes":[{"code":"724.2","name":"Lumbago","meandurationdays":{"bp":10,"cp95":38,"cp100":50}},{"code":"847.2","name":"Sprain of lumbar","meandurationdays":{"bp":10,"cp95":22,"cp100":29}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}]

I have been using json_normalize, but I know I'm not using it correctly.

How do I flatten this data?

What I need is this:

+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
| clientid|days| alert|days| alert|days|atriskallabsence|atriskclaims|midrangeallabsence|midrangeclaims|           alertdesc|alert|applied|magnitude|score|       status|volatility|  code| bp|cp100|cp95|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25|719.41| 18|   93|  72|
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25| 840.9| 10|   35|  27|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 724.2| 10|   50|  38|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 847.2| 10|   29|  22|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
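
  • Because the desired result has a separate row for each dict in the 'icdcodes' key, the best option is to use pandas.json_normalize.
  • First create the main dataframe and use pandas.DataFrame.explode('icdcodes'), which expands the dataframe so that each 'clientid' has the appropriate number of rows, based on the number of dicts in 'icdcodes'.
  • Use .json_normalize() on the 'icdcodes' column, which is a list of dicts, where some of the values may also be dicts.
  • .join the two dataframes and drop the 'icdcodes' column.
  • Use pandas.DataFrame.rename() to rename columns, and pandas.DataFrame.drop() to remove unneeded columns, as required.
  • See also this answer from SO: Splitting dictionary/list inside a Pandas Column into Separate Columns
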
import pandas as pd

# create the initial dataframe from api_results
df = pd.json_normalize(api_results).explode('icdcodes').reset_index(drop=True)

# create a dataframe for only icdcodes, which will expand all the lists of dicts
icdcodes = pd.json_normalize(df.icdcodes)

# join df to icdcodes and drop the icdcodes column
df = df.join(icdcodes).drop(['icdcodes'], axis=1)

# display(df)
                                                                                             requesturl   clientid  adjustedsummaryguidelines.midrangeallabsence  adjustedsummaryguidelines.midrangeclaims  adjustedsummaryguidelines.atriskallabsence  adjustedsummaryguidelines.atriskclaims  riskassessment.score riskassessment.status riskassessment.magnitude riskassessment.volatility  adjustedduration.bp.days adjustedduration.cp95.alert  adjustedduration.cp95.days adjustedduration.cp100.alert  adjustedduration.cp100.days cfactors.legalrep.applied cfactors.legalrep.alert                                                       alertdesc.red                                                   alertdesc.yellow    code                                                               name  meandurationdays.bp  meandurationdays.cp95  meandurationdays.cp100
0  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.  719.41                                     Pain in joint, shoulder region                   18                     72                      93
1  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   840.9  Sprains and strains of unspecified site of shoulder and upper arm                   10                     27                      35
2                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   724.2                                                            Lumbago                   10                     38                      50
3                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   847.2                                                   Sprain of lumbar                   10                     22                      29
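
The rename and drop steps mentioned in the list are not shown in the code above. A minimal sketch of how they might look follows, assuming a trimmed-down stand-in for api_results (only a few of the original fields, and a placeholder requesturl) and assuming that only the 'riskassessment.' and 'meandurationdays.' prefixes should be stripped. The helper name strip_prefix and the choice of columns to drop are illustrative, not part of the original answer.

```python
import pandas as pd

# Trimmed-down stand-in for api_results; the real payload has more fields.
api_results = [
    {
        "requesturl": "http://example.invalid/placeholder",
        "clientid": "123456789",
        "riskassessment": {"score": 87.95, "status": "Red (Extreme)"},
        "icdcodes": [
            {"code": "719.41", "name": "Pain in joint, shoulder region",
             "meandurationdays": {"bp": 18, "cp95": 72, "cp100": 93}},
            {"code": "840.9", "name": "Sprains and strains",
             "meandurationdays": {"bp": 10, "cp95": 27, "cp100": 35}},
        ],
    },
    {
        # no "requesturl" here, as in the second record of the question
        "clientid": "987654321",
        "riskassessment": {"score": 76.85, "status": "Orange (High)"},
        "icdcodes": [
            {"code": "724.2", "name": "Lumbago",
             "meandurationdays": {"bp": 10, "cp95": 38, "cp100": 50}},
        ],
    },
]

# Same steps as the answer: one row per dict in 'icdcodes', then flatten them.
df = pd.json_normalize(api_results).explode("icdcodes").reset_index(drop=True)
icdcodes = pd.json_normalize(df["icdcodes"].tolist())
df = df.join(icdcodes).drop(columns=["icdcodes"])

# Drop columns that are absent from the desired table (assumed choice).
df = df.drop(columns=["requesturl", "name"], errors="ignore")


def strip_prefix(col, prefixes=("riskassessment.", "meandurationdays.")):
    """Hypothetical helper: remove a known parent prefix from a column name."""
    for prefix in prefixes:
        if col.startswith(prefix):
            return col[len(prefix):]
    return col


# rename() accepts a function that is applied to every column label.
df = df.rename(columns=strip_prefix)
print(df)
```

Stripping every prefix from the full payload would produce repeated names such as 'days' and 'alert' (as in the table in the question), which makes selecting a column by name ambiguous, so keeping enough of the prefix to tell them apart is probably the safer choice.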