Convert nested JSON to DataFrame with columns referencing nested paths
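I'm trying to convert a nested JSON into a CSV file with three columns: the level 0 key, the branch, and the lowest-level leaf.
For example, in the JSON below: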
{
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}
I'd like to create output like this (I haven't included the whole tree, just enough to illustrate the problem):
level 0 | branch | leaf |
---|---|---|
protein | protein.meat | chicken |
protein | protein.meat | beef |
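I tried json_normalize, but the actual file has no paths I can use to identify the nested fields, because every dictionary is unique.
It returns the level 0 fields, but I need them as rows rather than columns. Any help is greatly appreciated.
I wrote a function that un-nests the JSON by key and value, like this: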
import json

with open('path/to/json') as m:
    my_json = json.load(m)

def unnest_json(data):
    for key, value in data.items():
        print(str(key)+'.'+str(value))
        if isinstance(value, dict):
            unnest_json(value)
        elif isinstance(value, list):
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    unnest_json(val)

unnest_json(my_json)
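Note that the function above prints each key next to the string form of its entire value, not the path that leads to the key. For comparison, here is a minimal sketch of a variant that carries the parent path down the recursion and returns dotted paths to the leaves. The names `leaf_paths`, `prefix`, and `sample` are hypothetical, and it assumes a leaf is an empty dict, as in the sample JSON.

```python
def leaf_paths(data, prefix=""):
    """Return the dotted path to every leaf in a nested dict."""
    paths = []
    for key, value in data.items():
        # Build the path to this key from the parent's path.
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Non-empty dict: keep descending.
            paths.extend(leaf_paths(value, path))
        else:
            # Empty dict (or any non-dict value): treat the key as a leaf.
            paths.append(path)
    return paths


# Hypothetical, trimmed-down sample in the same shape as the question's JSON.
sample = {
    "protein": {"meat": {"chicken": {}, "beef": {}}},
    "fat": {"unhealthy": {}},
}
print(leaf_paths(sample))
# ['protein.meat.chicken', 'protein.meat.beef', 'fat.unhealthy']
```

Each dotted path still has to be split into the three columns, which is what the recursive approach in the answer does directly.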
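It may not be the most concise approach, but I think you can use a recursive function (traverse in the code below) to turn the dictionary into a list of column values, and then convert those into a pandas DataFrame.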
data = {
    "protein": {
        "meat": {
            "chicken": {},
            "beef": {},
            "pork": {}
        },
        "powder": {
            "^ISOPURE": {},
            "substitute": {}
        }
    },
    "carbs": {
        "_vegetables": {
            "veggies": {
                "lettuce": {},
                "carrots": {},
                "corn": {}
            }
        },
        "bread": {
            "white": {},
            "multigrain": {
                "whole wheat": {}
            },
            "other": {}
        }
    },
    "fat": {
        "healthy": {
            "avocado": {}
        },
        "unhealthy": {}
    }
}
import pandas as pd

def traverse(col_values, dictionary, rows):
    # col_values holds [level 0 key, dotted branch path, leaf]
    for key in dictionary:
        new_col_values = list(col_values)  # copy so sibling keys don't share state
        if dictionary[key]:
            # non-empty dict: extend the branch path and recurse
            new_col_values[1] += '.' + key
            traverse(new_col_values, dictionary[key], rows)
        else:
            # empty dict: this key is a leaf, so record a row
            new_col_values[2] = key
            rows.append(new_col_values)

rows = []
for key in data:
    traverse([key, str(key), None], data[key], rows)

df = pd.DataFrame(rows, columns=["level 0", "branch", "leaf"])
print(df)
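Since the question asks for a CSV file, the rows still need to be written out. With the DataFrame above, `df.to_csv("out.csv", index=False)` should do it (the file name is only a placeholder). If you don't need pandas at all, here is a rough sketch of the same traversal as a generator feeding the standard-library `csv` module. This is a swapped-in alternative, not the code from the answer: `leaf_rows`, `walk`, and `sample` are hypothetical names, and `io.StringIO` stands in for a real file.

```python
import csv
import io


def leaf_rows(tree):
    """Yield (level 0, branch, leaf) for every empty-dict leaf under each top-level key."""
    def walk(level0, branch, subtree):
        for key, child in subtree.items():
            if child:
                # Non-empty dict: extend the dotted branch path and go deeper.
                yield from walk(level0, f"{branch}.{key}", child)
            else:
                # Empty dict: the key is a leaf.
                yield (level0, branch, key)

    for top, subtree in tree.items():
        yield from walk(top, str(top), subtree)


# Hypothetical, trimmed-down sample in the same shape as the question's JSON.
sample = {
    "protein": {"meat": {"chicken": {}, "beef": {}}},
    "fat": {"healthy": {"avocado": {}}, "unhealthy": {}},
}

rows = list(leaf_rows(sample))

# For a real file, use open("out.csv", "w", newline="") instead of StringIO.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["level 0", "branch", "leaf"])
writer.writerows(rows)
print(buffer.getvalue())
```

As with traverse, a leaf that sits directly under a top-level key (such as "unhealthy" under "fat") gets the top-level key itself as its branch, and a top-level key whose dict is empty produces no rows.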