Flatten a nested JSON hierarchy in Python
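First off, I'm by no means a developer, but this task got thrown at me and I'm just lost. This is my first time using Python and my first time coding in over 7 years, and it is not going well.
My JSON is an organisation tree in which every level may have children beneath it.
I need to write a script in Python, in a Jupyter Notebook, that flattens it into this format or something similar, where each new child is a new row.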
level1 | level2 | level3
org1
org1 org2
org1 org2 org3
Here is the JSON:
[{
    "Id": "f035de7f",
    "Name": "Org1",
    "ParentId": null,
    "Children": [{
        "Id": "8c18a70d",
        "Name": "Org2",
        "ParentId": "f035de7f",
        "Children": []
    }, {
        "Id": "b4514099",
        "Name": "Org3",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "8abe58d1",
            "Name": "Org4",
            "Children": []
        }]
    }, {
        "Id": "8e35bdc3",
        "Name": "Org5",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "331fffbf",
            "Name": "Org6",
            "ParentId": "8e35bdc3",
            "Children": [{
                "Id": "3bc3e085",
                "Name": "Org7",
                "ParentId": "331fffbf",
                "Children": []
            }]
        }]
    }]
}]
I've tried all sorts of for loops and searched the internet for days, but I think I'm missing some really basic knowledge to make this work. I would greatly appreciate any help anyone can offer.
Here is what I started with:
for item in orgs_json:
    orgs_json_children = item["Children"]
    orgs_list.append(orgs_json_children)
or
wanted = ['Children', 'Name']
for item in orgs_json[0]:
    details = [X["Name"] for X in orgs_json]
    for key in wanted:
        print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
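You can handle the nested structure with a stack:
- Start with the outermost list, reversed, as the stack, pairing each organisation with an empty tuple that tracks the organisation path.
- Inside the loop (while stack:), take the top element off the stack. Do whatever you need with that organisation, such as recording its name. Produce a row from the organisation path plus the current organisation's name.
- Add all elements under the Children key to the stack, together with the organisation path of their parent.
- Loop until the stack is empty.
The reversal is needed because popping from a stack returns elements in the opposite order to the one they were pushed in. You still want a stack (rather than a queue) for this job, because the output should be depth-first.
This would look like this: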
def flatten_orgs(orgs):
    stack = [(o, ()) for o in reversed(orgs)]  # organisation plus path
    while stack:
        org, path = stack.pop()  # top element
        path += (org['Name'],)   # update path, adding the current name
        yield path               # give this path to the caller
        # add all children to the stack, with the current path
        stack += ((o, path) for o in reversed(org['Children']))
You can then loop over the function above to get all the paths:
>>> for path in flatten_orgs(orgs_json):
...     print(*path, sep='\t')
...
Org1
Org1 Org2
Org1 Org3
Org1 Org3 Org4
Org1 Org5
Org1 Org5 Org6
Org1 Org5 Org6 Org7
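To make the stack-versus-queue point concrete, here is a minimal sketch that runs both traversals over the same small tree. The tree and the two function names are made up for illustration; only the stack version mirrors the answer's approach, and the queue version (built on collections.deque) is shown purely for contrast.

```python
from collections import deque

# A tiny hypothetical tree in the same shape as the question's data.
tree = [{
    "Name": "A",
    "Children": [
        {"Name": "B", "Children": [{"Name": "D", "Children": []}]},
        {"Name": "C", "Children": []},
    ],
}]


def paths_depth_first(orgs):
    """Stack-based traversal: a node's descendants come right after it."""
    stack = [(o, ()) for o in reversed(orgs)]
    while stack:
        org, path = stack.pop()  # take from the end (last in, first out)
        path += (org["Name"],)
        yield path
        stack += ((o, path) for o in reversed(org["Children"]))


def paths_breadth_first(orgs):
    """Queue-based traversal: a whole level is emitted before the next one."""
    queue = deque((o, ()) for o in orgs)
    while queue:
        org, path = queue.popleft()  # take from the front (first in, first out)
        path += (org["Name"],)
        yield path
        queue.extend((o, path) for o in org["Children"])


dfs = list(paths_depth_first(tree))
bfs = list(paths_breadth_first(tree))
print(dfs)  # [('A',), ('A', 'B'), ('A', 'B', 'D'), ('A', 'C')]
print(bfs)  # [('A',), ('A', 'B'), ('A', 'C'), ('A', 'B', 'D')]
```

With the stack, D follows its parent B directly, which is the row order the question asks for. With the queue, C is emitted before D, so children end up separated from their parents.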
You can iterate over your data recursively. prefix is the list of names seen so far, and data is the list of dictionaries you still need to explore.
data = [{
    "Id": "f035de7f",
    "Name": "Org1",
    "ParentId": None,
    "Children": [{
        "Id": "8c18a70d",
        "Name": "Org2",
        "ParentId": "f035de7f",
        "Children": []
    }, {
        "Id": "b4514099",
        "Name": "Org3",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "8abe58d1",
            "Name": "Org4",
            "Children": []
        }],
    }, {
        "Id": "8e35bdc3",
        "Name": "Org5",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "331fffbf",
            "Name": "Org6",
            "ParentId": "8e35bdc3",
            "Children": [{
                "Id": "3bc3e085",
                "Name": "Org7",
                "ParentId": "331fffbf",
                "Children": []
            }],
        }],
    }],
}]
def flatten(data, prefix):
    if not data:
        return [prefix]
    result = []
    for org in data:
        name = org["Name"]
        result.extend(flatten(org["Children"], prefix + [name]))
    return result
print(flatten(data, []))
# [['Org1', 'Org2'], ['Org1', 'Org3', 'Org4'], ['Org1', 'Org5', 'Org6', 'Org7']]
The same, using yield:
def flatten(data, prefix):
    if not data:
        yield prefix
    for org in data:
        name = org["Name"]
        yield from flatten(org["Children"], prefix + [name])
print(list(flatten(data, [])))
If you need all the partial lists as well, the solution is even shorter:
def flatten(data, prefix):
    yield prefix
    for org in data:
        name = org["Name"]
        yield from flatten(org["Children"], prefix + [name])
print(list(flatten(data, [])))
# [[], ['Org1'], ['Org1', 'Org2'], ['Org1', 'Org3'], ['Org1', 'Org3', 'Org4'], ['Org1', 'Org5'], ['Org1', 'Org5', 'Org6'], ['Org1', 'Org5', 'Org6', 'Org7']]
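As a rough end-to-end sketch of how the partial paths could become the level1 | level2 | level3 table from the question: the snippet below parses JSON text with json.loads (which turns null into None, so the literal does not need editing by hand), collects every path, and pads each row to the same width before writing CSV. The helper name all_paths, the shortened JSON sample, and the in-memory buffer are assumptions made for this illustration, not part of the answer above.

```python
import csv
import io
import json

# A shortened copy of the question's JSON, kept as raw text on purpose.
raw = """
[{"Id": "f035de7f", "Name": "Org1", "ParentId": null, "Children": [
    {"Id": "8c18a70d", "Name": "Org2", "ParentId": "f035de7f", "Children": []},
    {"Id": "b4514099", "Name": "Org3", "ParentId": "f035de7f", "Children": [
        {"Id": "8abe58d1", "Name": "Org4", "Children": []}
    ]}
]}]
"""


def all_paths(data, prefix=()):
    """Yield the path to every organisation, parents before children."""
    for org in data:
        path = prefix + (org["Name"],)
        yield path
        yield from all_paths(org["Children"], path)


orgs = json.loads(raw)  # JSON null becomes Python None
rows = list(all_paths(orgs))
depth = max(len(row) for row in rows)  # number of level columns needed

buffer = io.StringIO()  # stands in for a real file opened with open(...)
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([f"level{i}" for i in range(1, depth + 1)])
for row in rows:
    # pad short paths with empty cells so every row has the same width
    writer.writerow(row + ("",) * (depth - len(row)))

csv_text = buffer.getvalue()
print(csv_text)
```

Unlike the generator above, this variant never yields the empty prefix, so no blank row appears in the table. In a notebook, the same padded rows could also be handed to whatever table library is already in use.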
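A recursive JSON tree can have several roots, and its leaves are not required to spell out an empty list of children. For example, here is a tree with two roots, 'a' and 'b', whose nodes carry only a 'level' value, i.e. the node's depth ('children' is optional):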
json_struct = [
    {
        'level': 'a0',
        'children': [
            {'level': 'a0.1', 'children': [{'level': 'a0.1.1', 'children': []}]},
            {'level': 'a0.2', 'children': [
                {'level': 'a0.2.1', 'children': [
                    {'level': 'a0.2.1.1'},
                    {'level': 'a0.2.1.2'},
                    {'level': 'a0.2.1.3'},
                    {'level': 'a0.2.1.4', 'children': [{'level': 'a0.2.1.4.1'}, {'level': 'a0.2.1.4.2'}]}
                ]}
            ]},
            {'level': 'a0.3', 'children': []},
            {'level': 'a0.4', 'children': [{'level': 'a0.4.1'}, {'level': 'a0.4.2', 'children': []}]}
        ]
    },
    {
        'level': 'b0',
        'children': [
            {'level': 'b0.1', 'children': [{'level': 'b0.1.1'}]},
            {'level': 'b0.2', 'children': [
                {'level': 'b0.2.1', 'children': [
                    {'level': 'b0.2.1.1'},
                    {'level': 'b0.2.1.2'},
                    {'level': 'b0.2.1.3', 'children': [{'level': 'b0.2.1.3.1'}, {'level': 'b0.2.1.3.2'}]},
                    {'level': 'b0.2.1.4'}
                ]}
            ]},
            {'level': 'b0.3'}
        ]
    }
]
The code must return each leaf together with the complete branch path down to that leaf:
def flatten_json_tree(nodes, lower_nodes_key='children', path=()):
    if not nodes:  # no lower nodes,
        yield path  # so the path ends in a leaf
    for node in nodes:  # go into each branch of the tree
        level = node['level']  # get the node data
        try:
            lower_nodes = node[lower_nodes_key]  # look for lower nodes
        except KeyError:
            lower_nodes = []  # no lower nodes
        # keep exploring the branch until a leaf is reached
        yield from flatten_json_tree(lower_nodes, lower_nodes_key, path + (level,))

if __name__ == "__main__":
    for path in flatten_json_tree(json_struct):
        leaf = path[-1]
        complete_path = ' -> '.join(path)
        print("LEAF: {:20s} PATH: {}".format(leaf, complete_path))
It prints:
LEAF: a0.1.1               PATH: a0 -> a0.1 -> a0.1.1
LEAF: a0.2.1.1             PATH: a0 -> a0.2 -> a0.2.1 -> a0.2.1.1
LEAF: a0.2.1.2             PATH: a0 -> a0.2 -> a0.2.1 -> a0.2.1.2
LEAF: a0.2.1.3             PATH: a0 -> a0.2 -> a0.2.1 -> a0.2.1.3
LEAF: a0.2.1.4.1           PATH: a0 -> a0.2 -> a0.2.1 -> a0.2.1.4 -> a0.2.1.4.1
LEAF: a0.2.1.4.2           PATH: a0 -> a0.2 -> a0.2.1 -> a0.2.1.4 -> a0.2.1.4.2
LEAF: a0.3                 PATH: a0 -> a0.3
LEAF: a0.4.1               PATH: a0 -> a0.4 -> a0.4.1
LEAF: a0.4.2               PATH: a0 -> a0.4 -> a0.4.2
LEAF: b0.1.1               PATH: b0 -> b0.1 -> b0.1.1
LEAF: b0.2.1.1             PATH: b0 -> b0.2 -> b0.2.1 -> b0.2.1.1
LEAF: b0.2.1.2             PATH: b0 -> b0.2 -> b0.2.1 -> b0.2.1.2
LEAF: b0.2.1.3.1           PATH: b0 -> b0.2 -> b0.2.1 -> b0.2.1.3 -> b0.2.1.3.1
LEAF: b0.2.1.3.2           PATH: b0 -> b0.2 -> b0.2.1 -> b0.2.1.3 -> b0.2.1.3.2
LEAF: b0.2.1.4             PATH: b0 -> b0.2 -> b0.2.1 -> b0.2.1.4
LEAF: b0.3                 PATH: b0 -> b0.3
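Since 'children' is optional, exported JSON may also carry an explicit null for it. The following is a small sketch of one way to tolerate all three spellings of a leaf (missing key, empty list, null). The function name leaf_paths, its parameters, and the sample tree are hypothetical and chosen only for this illustration.

```python
def leaf_paths(nodes, name_key="level", children_key="children", path=()):
    """Yield one tuple per leaf, holding the names from root to leaf."""
    for node in nodes:
        current = path + (node[name_key],)
        # .get() covers a missing key; "or []" also covers null/None
        children = node.get(children_key) or []
        if children:
            yield from leaf_paths(children, name_key, children_key, current)
        else:
            yield current


# Small hypothetical tree mixing the three ways a leaf can be written.
sample = [
    {"level": "a0", "children": [
        {"level": "a0.1"},                    # no 'children' key
        {"level": "a0.2", "children": []},    # empty list
        {"level": "a0.3", "children": None},  # explicit null
        {"level": "a0.4", "children": [{"level": "a0.4.1"}]},
    ]},
]

for p in leaf_paths(sample):
    print(" -> ".join(p))
```

Passing name_key="Name" and children_key="Children" should let the same sketch run over the organisation data from the question.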