How to flatten a nested JSON into a pandas dataframe

Question

I have some tricky JSON that I want to put into a dataframe.

{'A': {'name': 'A',
  'left_foot': [{'toes': '5'}],
  'right_foot': [{'toes': '4'}]},
 'B': {'name': 'B',
  'left_foot': [{'toes': '3'}],
  'right_foot': [{'toes': '5'}]},
...
}

I don't need the first level with A and B, because it is already part of the name. There will only ever be one left_foot and one right_foot.

The data I want looks like this:

     name  left_foot.toes right_foot.toes
0       A           5           4
1       B           3           5

Using this I can get the feet and toes, but only if I specify data["A"]. Is there an easier way?

EDIT: I have something like this, but I need to specify "A" in the first line.

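import pandas as pd

# `merge` is not shown in the question; this hypothetical stand-in
# combines a list of dicts into a single dict.
def merge(dicts):
    return {k: v for d in dicts for k, v in d.items()}

# `tickers` is the nested dict shown above
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop('left_foot', axis=1).join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop('right_foot', axis=1).join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )).rename(columns={"toes": "right_foot.toes"})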

Answer 1

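Given your data, each top-level key (e.g. 'A' and 'B') is repeated as the value of 'name', so it is easier to call pandas.json_normalize on only the values of the dict.
- The 'left_foot' and 'right_foot' columns then need to be exploded to pull each dict out of its list.
- The last step converts each column of dicts into a dataframe and joins it back to df.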
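It is not necessarily less code, but this should be considerably faster than the multiple apply calls in the current code.
- See this timing analysis comparing apply with pandas.Series against simply using pandas.DataFrame to convert a column.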
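A rough way to check the speed difference on your own machine is to time both conversions directly. This is a minimal sketch, not the linked timing analysis: the benchmark column, its size (2,000 rows) and the function names with_apply and with_constructor are made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: a column of single-key dicts, shaped like
# 'left_foot' after the explode step. The size is an arbitrary choice.
col = pd.Series([{'toes': str(i % 6)} for i in range(2_000)], name='left_foot')


def with_apply(s: pd.Series) -> pd.DataFrame:
    # Builds one pd.Series object per row, then stitches them into a frame.
    return s.apply(pd.Series)


def with_constructor(s: pd.Series) -> pd.DataFrame:
    # Builds a single DataFrame directly from the list of dicts.
    return pd.DataFrame(s.tolist(), index=s.index)


# Sanity check: both approaches yield the same values in the 'toes' column.
assert with_apply(col)['toes'].tolist() == with_constructor(col)['toes'].tolist()

for fn in (with_apply, with_constructor):
    seconds = timeit.timeit(lambda: fn(col), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The constructor version avoids creating a Series object per row, which is where the per-row apply spends most of its time.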
If you run into problems because the columns to be exploded and converted to a dataframe contain NaN (e.g. missing dicts or lists), see
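As one possible workaround (an assumption, not necessarily what the linked page recommends), cells that are not dicts after the explode step, such as NaN from a missing key or from an empty list, can be replaced with empty dicts before building the dataframe, so the resulting toe columns simply contain NaN. The sample data and the helper dicts_to_frame below are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'B' has an empty left_foot list and 'C' has no right_foot key.
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [], 'right_foot': [{'toes': '5'}]},
    'C': {'name': 'C', 'left_foot': [{'toes': '3'}]},
}

# Same first step as the answer: normalize the values, then explode the lists.
# An empty list explodes to NaN, and a missing key is already NaN.
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)


def dicts_to_frame(col: pd.Series, prefix: str) -> pd.DataFrame:
    # Replace anything that is not a dict (e.g. NaN) with {} so the
    # DataFrame constructor fills the missing value with NaN.
    cleaned = [v if isinstance(v, dict) else {} for v in col]
    return pd.DataFrame(cleaned, index=col.index).add_prefix(prefix)


df = df.join(dicts_to_frame(df.pop('left_foot'), 'lf_'))
df = df.join(dicts_to_frame(df.pop('right_foot'), 'rf_'))
print(df)
```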

import pandas as pd

# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}

# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)

# display(df)
#   name      left_foot     right_foot
# 0    A  {'toes': '5'}  {'toes': '4'}
# 1    B  {'toes': '3'}  {'toes': '5'}
# 2    C  {'toes': '5'}  {'toes': '4'}
# 3    D  {'toes': '3'}  {'toes': '5'}

# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})

# display(df)
#   name lf_toes rf_toes
# 0    A       5       4
# 1    B       3       5
# 2    C       5       4
# 3    D       3       5
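If you want the exact column names from the question (left_foot.toes and right_foot.toes), another option is to unwrap each one-element list before normalizing, so pandas.json_normalize flattens the inner dict with its default '.' separator. This sketch relies on the question's statement that there is only ever one left_foot and one right_foot; the helper unwrap_single_item_lists is hypothetical.

```python
import pandas as pd

# Sample data from the question
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}


def unwrap_single_item_lists(record: dict) -> dict:
    # Hypothetical helper: replace each one-element list with its only element,
    # leaving every other value untouched.
    return {k: v[0] if isinstance(v, list) and len(v) == 1 else v
            for k, v in record.items()}


# Nested dicts are flattened into 'left_foot.toes' and 'right_foot.toes'.
df = pd.json_normalize([unwrap_single_item_lists(r) for r in data.values()])
print(df)
```

Lists with more than one element are left as-is, so json_normalize would keep them as list-valued columns; in that case the explode-based approach is the safer choice.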
