Python nested dict with lists (or list with dicts) to list of flat dicts for CSV output
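I tried searching for similar questions, but none of the ones I found does what I need.
I am trying to build a generic function that takes 2 arguments:
- an object structure
- a list of (nested) paths
and converts all of the given paths into a list of flat dicts, suitable for output in CSV format.
For example, if I have a structure like this: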
structure = {
    "configs": [
        {
            "name": "config_name",
            "id": 1,
            "parameters": [
                {
                    "name": "param name",
                    "description": "my description",
                    "type": "mytype",
                },
                {
                    "name": "test",
                    "description": "description 2",
                    "type": "myothertype",
                    "somedata": [
                        'data',
                        'data2'
                    ]
                }
            ]
        },
        {
            "name": "config_name2",
            "id": 2,
            "parameters": [
                {
                    "name": "param name",
                    "description": "my description",
                    "type": "mytype2",
                    "somedata": [
                        'data',
                        'data2'
                    ]
                },
                {
                    "name": "test",
                    "description": "description 2",
                    "type": "myothertype2",
                }
            ]
        }
    ]
}
and pass the following list of paths:
paths = [
    'configs.name',  # notice the list structure is omitted (i.e. it should be 'configs.XXX.name' where XXX is the elem id). This means I want the name entry of every dict that is in the list of configs
    'configs.0.id',  # similar to the above but this time I want the ID only from the first config
    'configs.parameters.type'  # I want the type entry of every parameter of every config
]
From this, the function should generate a list of flat dicts. Each entry in the list corresponds to one row of the CSV, and each flat dict contains all of the selected paths.
For example, in this case I should see:
result = [
    {"configs.name": "config_name", "configs.0.id": 1, "configs.parameters.type": "mytype"},
    {"configs.name": "config_name", "configs.0.id": 1, "configs.parameters.type": "myothertype"},
    {"configs.name": "config_name2", "configs.parameters.type": "mytype2"},
    {"configs.name": "config_name2", "configs.parameters.type": "myothertype2"}
]
It needs to be able to do this for any structure passed in that contains nested dicts and lists.
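You can build a lookup function (get_val) that searches for values according to your rules. This function also accepts a list of valid indices (match), which tells it to traverse only those sublists whose index matches. That way, the search can "learn" from previous searches and return only values located in the sublists that the earlier searches selected: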
structure = {'configs': [{'name': 'config_name', 'id': 1, 'parameters': [{'name': 'param name', 'description': 'my description', 'type': 'mytype'}, {'name': 'test', 'description': 'description 2', 'type': 'myothertype', 'somedata': ['data', 'data2']}]}, {'name': 'config_name2', 'id': 2, 'parameters': [{'name': 'param name', 'description': 'my description', 'type': 'mytype2', 'somedata': ['data', 'data2']}, {'name': 'test', 'description': 'description 2', 'type': 'myothertype2'}]}]}
def get_val(d, rule, match = None, l_matches = []):
    # Yields (list_indices_taken, value) for every value reachable via rule.
    if not rule:
        yield (l_matches, d)
    elif isinstance(d, list):
        if rule[0].isdigit() and (match is None or match[0] == int(rule[0])):
            # Explicit index in the path: follow it only if it agrees with match.
            yield from get_val(d[int(rule[0])], rule[1:], match=match if match is None else match[1:], l_matches=l_matches+[int(rule[0])])
        elif match is None or not rule[0].isdigit():
            # Index omitted: visit every element, or only the one selected by match.
            for i, a in enumerate(d):
                if not match or i == match[0]:
                    yield from get_val(a, rule, match=match if match is None else match[1:], l_matches = l_matches+[i])
    else:
        # Dict: descend by key.
        yield from get_val(d[rule[0]], rule[1:], match = match, l_matches = l_matches)

def evaluate(paths, struct, val = {}, rule = None):
    # Builds one flat dict per combination, resolving the paths in order.
    if not paths:
        yield val
    else:
        k = list(get_val(struct, paths[0].split('.'), match = rule))
        if k:
            for a, b in k:
                yield from evaluate(paths[1:], struct, val={**val, paths[0]:b}, rule = a)
        else:
            # No value for this path under the current match: leave the key out.
            yield from evaluate(paths[1:], struct, val=val, rule = rule)

paths = ['configs.name', 'configs.0.id', 'configs.parameters.type']
print(list(evaluate(paths, structure)))
Output:
[{'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'mytype'},
{'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'myothertype'},
{'configs.name': 'config_name2', 'configs.parameters.type': 'mytype2'},
{'configs.name': 'config_name2', 'configs.parameters.type': 'myothertype2'}]
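Since the goal is CSV output and some rows lack a key (here 'configs.0.id' is absent for the second config), writing the rows could look roughly like the sketch below. It uses the standard library's csv.DictWriter with restval so that missing keys become empty cells. The helper name rows_to_csv and the two hard-coded sample rows are illustrative assumptions, not part of the answer above.

```python
import csv
import io

# Sample rows in the shape produced above (hard-coded here for illustration).
rows = [
    {'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'mytype'},
    {'configs.name': 'config_name2', 'configs.parameters.type': 'mytype2'},
]
paths = ['configs.name', 'configs.0.id', 'configs.parameters.type']

def rows_to_csv(rows, fieldnames):
    """Hypothetical helper: serialize flat dicts to CSV text.

    Keys missing from a row are written as empty cells (restval='').
    """
    buf = io.StringIO(newline='')
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = rows_to_csv(rows, paths)
print(csv_text)
```

Using the path list itself as fieldnames keeps the column order the same as the order of the requested paths.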
Edit: it is best to sort the input paths by the depth of their path in the tree:
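def get_depth(d, path, c = 0):
    # Yields the depth (number of key/index steps) of each value the path reaches.
    if not path:
        yield c
    elif isinstance(d, dict) or path[0].isdigit():
        yield from get_depth(d[path[0] if isinstance(d, dict) else int(path[0])], path[1:], c+1)
    else:
        # List with the index omitted: fan out over every element.
        yield from [i for b in d for i in get_depth(b, path, c)]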
This function finds the depth in the tree at which a path's target value sits. Applying it to the main code:
structure = {'configs': [{'id': 1, 'name': 'declaration', 'parameters': [{'int-param': 0, 'description': 'decription1', 'name': 'name1', 'type': 'mytype1'}, {'int-param': 1, 'description': 'description2', 'list-param': ['param0'], 'name': 'name2', 'type': 'mytype2'}]}]}
paths1 = ['configs.id', 'configs.parameters.name', 'configs.parameters.int-param']
paths2 = ['configs.parameters.name', 'configs.id', 'configs.parameters.int-param']
print(list(evaluate(sorted(paths1, key=lambda x:max(get_depth(structure, x.split('.')))), structure)))
print(list(evaluate(sorted(paths2, key=lambda x:max(get_depth(structure, x.split('.')))), structure)))
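The sorting step can also be wrapped in a small helper, sketched below. The name sort_paths_by_depth and the trimmed-down structure are assumptions made for illustration, and get_depth is repeated (with the list branch written as a loop) only so the snippet runs on its own.

```python
def get_depth(d, path, c=0):
    # Same logic as get_depth above: count key/index steps to each target value.
    if not path:
        yield c
    elif isinstance(d, dict) or path[0].isdigit():
        key = path[0] if isinstance(d, dict) else int(path[0])
        yield from get_depth(d[key], path[1:], c + 1)
    else:
        # List with the index omitted: fan out over every element.
        for item in d:
            yield from get_depth(item, path, c)

def sort_paths_by_depth(structure, paths):
    """Hypothetical helper: order paths so shallower targets are resolved first."""
    return sorted(paths, key=lambda p: max(get_depth(structure, p.split('.'))))

# Trimmed-down structure, for illustration only.
structure = {'configs': [{'id': 1, 'parameters': [
    {'name': 'name1', 'int-param': 0},
    {'name': 'name2', 'int-param': 1},
]}]}
paths2 = ['configs.parameters.name', 'configs.id', 'configs.parameters.int-param']

ordered = sort_paths_by_depth(structure, paths2)
print(ordered)
```

Because sorted is stable, paths of equal depth keep their original relative order. Note that max raises ValueError if a path reaches no value at all (for example, when every list along it is empty), so a real implementation may want to guard against that.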