Convert nested JSON to Dataframe with columns referencing nested paths

I'm trying to convert nested JSON into a CSV file with three columns: the level-0 key, the branch, and the lowest-level leaf.

For example, given the following JSON:

{
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

I'd like to produce output like this (I haven't written out the whole tree; this is just to illustrate the problem):

level 0   branch         leaf
protein   protein.meat   chicken
protein   protein.meat   beef
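Put another way, each output row depends only on the key path from the root down to one leaf. A minimal sketch of that mapping, assuming the path is already available as a tuple of keys (the helper name path_to_row is hypothetical, not from the question):

```python
def path_to_row(path):
    """Map a root-to-leaf key path to (level 0, branch, leaf).

    level 0 is the top-level key, branch is every key except the leaf
    joined with dots, and leaf is the last key in the path.
    """
    if len(path) < 2:
        raise ValueError("path needs at least a top-level key and a leaf")
    return path[0], ".".join(path[:-1]), path[-1]


print(path_to_row(("protein", "meat", "chicken")))
print(path_to_row(("carbs", "bread", "multigrain", "whole wheat")))
```

Note that a leaf sitting directly under a top-level key, such as "unhealthy" under "fat", gets a branch equal to its level-0 key.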

I tried using json_normalize, but the real file has no common path I could use to identify the nested fields, because every dictionary is unique.

This returns the level-0 fields, but I need them as rows instead of columns. Any help is much appreciated.

I've written a function that unnests the JSON by key, like this:

import json

with open('path/to/json') as m:
    my_json = json.load(m)


def unnest_json(data):
    """Recursively print every key together with its value."""
    for key, value in data.items():
        print(str(key) + '.' + str(value))
        if isinstance(value, dict):
            # Descend into nested objects.
            unnest_json(value)
        elif isinstance(value, list):
            # Only descend into list items that are objects themselves;
            # strings, numbers and nested lists are skipped.
            for val in value:
                if isinstance(val, dict):
                    unnest_json(val)

unnest_json(my_json)
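The function above visits every key but does not remember which keys it passed through, so it cannot report the branch a leaf belongs to. One way to adapt it, sketched here under the assumption that a leaf is any key whose value is an empty dict or a non-dict value, is to pass the tuple of parent keys down the recursion and yield the full path at each leaf. The name iter_leaf_paths and the sample data are illustrative only:

```python
def iter_leaf_paths(data, parents=()):
    """Yield a tuple of keys for every leaf under `data`.

    Assumption: a leaf is a key whose value is an empty dict or
    anything that is not a dict (lists are not descended into).
    """
    for key, value in data.items():
        path = parents + (key,)
        if isinstance(value, dict) and value:
            # Non-empty dict: descend, remembering the key we came through.
            yield from iter_leaf_paths(value, path)
        else:
            yield path


sample = {
    "fat": {
        "healthy": {"avocado": {}},
        "unhealthy": {},
    }
}

for path in iter_leaf_paths(sample):
    # Same row layout as in the question: level 0, branch, leaf.
    print(path[0], ".".join(path[:-1]), path[-1])
```

Because the paths come out as rows, they can be collected into a list and handed to pandas or the csv module instead of being printed.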

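Probably not the cleanest approach, but I think you can use a recursive function (traverse in the code below) to turn the dictionary into a list of rows of column values, and then build a pandas DataFrame from them.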

data = {
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}

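import pandas as pd


def traverse(col_values, dictionary, rows):
    """Append one [level 0, branch, leaf] row per leaf under `dictionary`.

    col_values is [level 0 key, branch path so far, leaf placeholder].
    A key whose value is an empty dict is treated as a leaf.
    """
    for key in dictionary:
        # Copy so sibling keys don't see each other's branch changes.
        new_col_values = list(col_values)
        if dictionary[key]:
            # Non-empty dict: extend the branch path and recurse.
            new_col_values[1] += '.' + key
            traverse(new_col_values, dictionary[key], rows)
        else:
            # Empty dict: this key is a leaf.
            new_col_values[2] = key
            rows.append(new_col_values)

rows = []
for key in data:
    # Each top-level key starts as both the level-0 value and the branch.
    traverse([key, key, None], data[key], rows)

df = pd.DataFrame(rows, columns=["level 0", "branch", "leaf"])
print(df)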
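The question asks for a CSV file rather than a printed DataFrame. With pandas, df.to_csv("output.csv", index=False) writes one. If pandas isn't needed otherwise, a sketch using only the standard csv module could look like this, assuming rows has the [level 0, branch, leaf] shape that traverse builds (the helper write_rows and the sample rows are illustrative):

```python
import csv
import io

HEADER = ["level 0", "branch", "leaf"]


def write_rows(rows, fileobj):
    """Write a header line plus one CSV line per [level 0, branch, leaf] row."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(rows)


rows = [
    ["protein", "protein.meat", "chicken"],
    ["protein", "protein.meat", "beef"],
    ["carbs", "carbs.bread.multigrain", "whole wheat"],
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with newline="" as the csv module documentation recommends, e.g. with open("output.csv", "w", newline="") as f: write_rows(rows, f).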