Convert nested JSON to a DataFrame with columns referencing nested paths

Question

I'm trying to convert nested JSON into a CSV file with three columns: the level-0 key, the branch, and the lowest-level leaf.

For example, given the following JSON:

{
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

I'd like to produce output like this (not the whole tree, just enough to illustrate the problem):

level 0	branch	leaf
protein	protein.meat	chicken
protein	protein.meat	beef
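To pin down the column definitions: if each leaf is described by its full key path from the root, all three columns follow from that path alone. A minimal sketch, assuming the path is available as a list of keys (the helper name `split_path` is hypothetical):

```python
def split_path(path):
    """Map a root-to-leaf key path to (level 0, branch, leaf).

    Hypothetical helper: level 0 is the first key, branch is every key
    except the leaf joined with '.', and leaf is the last key.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least a level-0 key and a leaf")
    return path[0], ".".join(path[:-1]), path[-1]


print(split_path(["protein", "meat", "chicken"]))
# ('protein', 'protein.meat', 'chicken')
print(split_path(["carbs", "bread", "multigrain", "whole wheat"]))
# ('carbs', 'carbs.bread.multigrain', 'whole wheat')
```

Under this reading, a leaf directly below a level-0 key (such as `fat` → `unhealthy`) gets a branch equal to its level-0 key.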

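I tried using json_normalize, but the real file has no path I can use to identify the nested fields, because every dict is unique.

This returns the level-0 fields, but I need them as rows rather than columns. Any help is greatly appreciated.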
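One way to keep json_normalize in the picture is to let it build the dotted paths and then turn the column names into rows. Keys whose value is an empty dict produce no column in json_normalize, so this sketch first replaces each empty-dict leaf with None (the helpers `mark_leaves` and `paths_via_json_normalize` are hypothetical, not part of pandas); each leaf then becomes a column named by its full dotted path. It assumes every value is a dict, as in the sample, and that no key itself contains a '.'.

```python
import pandas as pd


def mark_leaves(node):
    # Hypothetical helper: recurse into non-empty dicts and replace empty
    # dicts (the leaves) with None so json_normalize still emits a column.
    return {k: (mark_leaves(v) if v else None) for k, v in node.items()}


def paths_via_json_normalize(data):
    # Hypothetical helper: one row, one column per leaf, e.g. "protein.meat.chicken".
    flat = pd.json_normalize(mark_leaves(data))
    paths = pd.Series(flat.columns)
    # Split off the last segment: column 0 is the branch, column 1 is the leaf.
    parts = paths.str.rsplit(".", n=1, expand=True)
    return pd.DataFrame({
        "level 0": paths.str.split(".", n=1).str[0],
        "branch": parts[0],
        "leaf": parts[1],
    })
```

For the sample data this is intended to yield the same rows as a hand-written recursion; the trade-off is relying on the '.' separator never appearing inside a key.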

I wrote a function that unnests the JSON by key and value, like this:

import json

with open('path/to/json') as m:
    my_json = json.load(m)


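def unnest_json(data):
    for key, value in data.items():
        # Print each key followed by its (possibly nested) value.
        print(str(key) + '.' + str(value))
        if isinstance(value, dict):
            # Recurse into nested dicts.
            unnest_json(value)
        elif isinstance(value, list):
            # Recurse into dicts inside lists; skip strings and nested lists.
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    unnest_json(val)

unnest_json(my_json)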
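The recursion above already visits every key; instead of printing, it can carry the path from the root downward and emit one row whenever it reaches an empty dict. A minimal sketch of that adaptation, assuming (as in the sample) that leaves are empty dicts; the list handling from the original is left out, and the names `iter_leaf_paths` and `write_csv` are hypothetical. It writes the three columns with the standard csv module:

```python
import csv
import io


def iter_leaf_paths(node, path=()):
    """Hypothetical helper: yield the key path (a tuple) to every leaf under node."""
    for key, value in node.items():
        current = path + (key,)
        if isinstance(value, dict) and value:
            # Non-empty dict: keep descending, extending the path.
            yield from iter_leaf_paths(value, current)
        else:
            # Empty dict (or any non-dict value): treat this key as a leaf.
            yield current


def write_csv(data, out):
    """Hypothetical helper: write level 0 / branch / leaf rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["level 0", "branch", "leaf"])
    for path in iter_leaf_paths(data):
        writer.writerow([path[0], ".".join(path[:-1]), path[-1]])


buf = io.StringIO()
write_csv({"fat": {"healthy": {"avocado": {}}, "unhealthy": {}}}, buf)
print(buf.getvalue())
```

For a real file, the same call would take an open text file instead of the StringIO buffer, e.g. `with open("out.csv", "w", newline="") as f: write_csv(my_json, f)`, where `out.csv` is a placeholder name.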

Answer 1

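Probably not the most concise approach, but I think you can use a recursive function (traverse in the code below) to turn the dictionary into a list of column values per row, and then convert those rows into a pandas DataFrame.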

data = {
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

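def traverse(col_values, dictionary, rows):
    # col_values is [level 0, branch, leaf]; copy it per key so sibling
    # keys do not see each other's changes to the branch.
    for key in dictionary:
        new_col_values = list(col_values)
        if dictionary[key]:
            # Non-empty dict: this key is part of the branch, so descend.
            new_col_values[1] += '.' + key
            traverse(new_col_values, dictionary[key], rows)
        else:
            # Empty dict: this key is a leaf, so emit a finished row.
            new_col_values[2] = key
            rows.append(new_col_values)

rows = []
for key in data:
    # Each top-level key starts with branch == level 0 and no leaf yet.
    traverse([key, str(key), None], data[key], rows)

import pandas as pd

df = pd.DataFrame(rows, columns=["level 0", "branch", "leaf"])
print(df)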
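Python's default recursion limit (1000) is unlikely to matter for a tree like this, but if the real file is very deep, the same traversal can be done with an explicit stack. A sketch of that variant (the name `traverse_iterative` is hypothetical); it pushes children in reverse so rows come out in the same order as traverse, and its result can be passed to pd.DataFrame exactly as above:

```python
def traverse_iterative(data):
    """Hypothetical non-recursive version of traverse: same rows, same order."""
    rows = []
    # Each stack entry: (level-0 key, branch so far, current key, its children).
    stack = []
    for top, children in reversed(list(data.items())):
        for key, sub in reversed(list(children.items())):
            stack.append((top, top, key, sub))
    while stack:
        top, branch, key, sub = stack.pop()
        if sub:
            # Non-empty dict: extend the branch and queue its children.
            new_branch = branch + "." + key
            for child_key, child_sub in reversed(list(sub.items())):
                stack.append((top, new_branch, child_key, child_sub))
        else:
            # Empty dict: key is a leaf, so record a finished row.
            rows.append([top, branch, key])
    return rows
```

From there, `df.to_csv(..., index=False)` would write the CSV the question asks for without the DataFrame index column.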
