How to change JSON data into a DataFrame?

I need help converting JSON data into a DataFrame. Could you show me how to do it?

Example:

JSON data

{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": [
        "kol_id",
        "jnj_id",
        "kol_full_nm",
        "thrc_cd"
      ],
      "filter": {
        "kol_id": "101152",
        "jnj_id": "7124166",
        "thrc_nm": "VIR"
        
      }
    }
}

Desired output:

user_id     password       api_name     columns       filter     filter_value
vmani        ******         KOL          kol_id       kol_id       101152
                                         jnj_id       jnj_id       7124166
                                         kol_full_nm  thrc_nm      VIR
                                         thrc_cd
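  • data will be the JSON.
  • Use pandas.json_normalize to load the JSON into a DataFrame, and drop the unneeded columns.
  • Use pandas.DataFrame.explode to expand the 'body.columns' list into separate rows.
  • Create a separate DataFrame from data['body']['filter'].
  • Use pandas.DataFrame.join to combine the two DataFrames.
  • It is not possible to map every 'filter' to every entry in 'body.columns'.
    • 'thrc_nm' doesn't map to anything in 'body.columns'.
    • 'filter' and 'filter_value' are added as separate columns, ordered as they appear in the JSON, and are unrelated to 'body.columns'.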
import pandas as pd

# load the json data (data is the JSON already parsed into a Python dict)
df = pd.json_normalize(data).drop(columns=['body.filter.kol_id', 'body.filter.jnj_id', 'body.filter.thrc_nm'])

# explode the column
df = df.explode('body.columns').reset_index(drop=True)

# load and clean data[body][filter]
df_filter = pd.DataFrame.from_dict(data['body']['filter'], orient='index').reset_index().rename(columns={'index': 'filter', 0: 'filter_value'})

# join the dataframes
dfj = df.join(df_filter)

# display(dfj)
  user_id password api_name body.api_name body.columns   filter filter_value
0  vmani4    *****      KOL           KOL       kol_id   kol_id       101152
1  vmani4    *****      KOL           KOL       jnj_id   jnj_id      7124166
2  vmani4    *****      KOL           KOL  kol_full_nm  thrc_nm          VIR
3  vmani4    *****      KOL           KOL      thrc_cd      NaN          NaN
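
The snippet above assumes data is already a Python dict. A minimal end-to-end sketch, assuming the JSON arrives as a string (the names json_text and repeated are made up for illustration), could also blank out the repeated values so the frame looks closer to the desired output:

```python
import json

import pandas as pd

# Hypothetical input: the JSON from the question, held in a string.
json_text = """
{
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": ["kol_id", "jnj_id", "kol_full_nm", "thrc_cd"],
      "filter": {"kol_id": "101152", "jnj_id": "7124166", "thrc_nm": "VIR"}
    }
}
"""

data = json.loads(json_text)

# Flatten the JSON, then drop the per-filter columns created by json_normalize.
df = pd.json_normalize(data)
df = df.drop(columns=[c for c in df.columns if c.startswith("body.filter.")])

# One row per entry in body.columns.
df = df.explode("body.columns").reset_index(drop=True)

# Filter names and values as two columns, in JSON order.
df_filter = pd.DataFrame(
    list(data["body"]["filter"].items()), columns=["filter", "filter_value"]
)

# Positional (index-based) join; rows without a filter get NaN.
dfj = df.join(df_filter)

# Cosmetic step: show repeated scalars only on the first row, and
# replace the remaining NaN values with empty strings.
repeated = ["user_id", "password", "api_name", "body.api_name"]
dfj.loc[dfj.duplicated(subset=repeated), repeated] = ""
dfj = dfj.fillna("")

print(dfj)
```

The blanking step is for display only. Once values are replaced with empty strings, the rows can no longer be filtered or grouped by user_id, so it is probably best applied last.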

Option

  • I think it's easier to have each filter as its own column, with the value underneath it.
# load data into a dataframe
df = pd.json_normalize(data)

# explode the column
df = df.explode('body.columns').reset_index(drop=True)

# display(df)
  user_id password api_name body.api_name body.columns body.filter.kol_id body.filter.jnj_id body.filter.thrc_nm
0  vmani4    *****      KOL           KOL       kol_id             101152            7124166                 VIR
1  vmani4    *****      KOL           KOL       jnj_id             101152            7124166                 VIR
2  vmani4    *****      KOL           KOL  kol_full_nm             101152            7124166                 VIR
3  vmani4    *****      KOL           KOL      thrc_cd             101152            7124166                 VIR

I'm not familiar with DataFrames, but I did my best to come up with a solution that produces the output you want.

Code

import pandas as pd
import json

json_data = """ {
    "user_id": "vmani4",
    "password": "*****",
    "api_name": "KOL",
    "body": {
      "api_name": "KOL",
      "columns": [
        "kol_id",
        "jnj_id",
        "kol_full_nm",
        "thrc_cd"
      ],
      "filter": {
        "kol_id": "101152",
        "jnj_id": "7124166",
        "thrc_nm": "VIR"
        
      }
    }
}"""

python_data = json.loads(json_data)

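# Containers for the pieces of the desired output
filter_names = []    # keys of body.filter
filter_values = []   # values of body.filter
first_level = {}     # top-level scalars, each wrapped in a one-item list
for_columns = {}     # the body.columns list

for key, value in python_data.items():
    if isinstance(value, dict):
        # Nested dict ("body"): pull out the columns list and the filter pairs
        for inner_key, inner_value in value.items():
            if inner_key == 'columns':
                for_columns[inner_key] = inner_value
            if isinstance(inner_value, dict):
                filter_names.extend(inner_value.keys())
                filter_values.extend(inner_value.values())
        continue
    first_level[key] = [value]

res = {**first_level, **for_columns,
       'filter': filter_names, 'filter_value': filter_values}

# The lists differ in length, so build one Series per column and concatenate
df = pd.concat([pd.Series(v, name=k) for k, v in res.items()], axis=1)
print(df)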

Output

  user_id password api_name      columns   filter filter_value
0  vmani4    *****      KOL       kol_id   kol_id       101152
1     NaN      NaN      NaN       jnj_id   jnj_id      7124166
2     NaN      NaN      NaN  kol_full_nm  thrc_nm          VIR
3     NaN      NaN      NaN      thrc_cd      NaN          NaN

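Let me briefly explain my code. First I created several lists and dicts. I did this because your desired output contains columns that don't actually exist in your JSON, such as filter_value.

I also loop over the dict items to build another dict that matches the desired output.

Finally, because the lists going into the DataFrame have unequal lengths, I used concat with Series.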
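
To illustrate why concat with Series helps here, a small sketch with made-up values (the dict cols is only for illustration): passing lists of unequal length straight to pd.DataFrame raises a ValueError, while concatenating one Series per column aligns on the index and pads the shorter columns with NaN.

```python
import pandas as pd

# Hypothetical columns of unequal length.
cols = {
    "user_id": ["vmani4"],
    "columns": ["kol_id", "jnj_id", "kol_full_nm"],
}

# pd.DataFrame(cols) would raise ValueError because the lengths differ.
# One Series per column, concatenated side by side, pads with NaN instead.
df = pd.concat([pd.Series(v, name=k) for k, v in cols.items()], axis=1)
print(df)
```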