Extract set of leaf values found in nested dicts and lists excluding None
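I have a nested structure read from YAML. It is composed of nested lists and/or nested dicts, or a mix of both at various levels of nesting. It can be assumed that the structure doesn't contain any recursive objects.
How do I extract just the leaf values from it? Also, I don't want any None values. The leaf values are strings, which is all I care about. Recursion is fine, since the maximum depth of the structure is not large enough to exceed the stack recursion limit. A generator would also be an option.
There are similar questions that deal with flattening lists or dicts, but not a mix of both. Alternatively, when they flatten dicts, they also return the flattened keys, which I don't really need and which risk name collisions.
I tried more_itertools.collapse, but its examples only show it working with nested lists, not with a mix of dicts and lists.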
Sample inputs
struct1 = {
    "k0": None,
    "k1": "v1",
    "k2": ["v0", None, "v1"],
    "k3": ["v0", ["v1", "v2", None, ["v3"], ["v4", "v5"], []]],
    "k4": {"k0": None},
    "k5": {"k1": {"k2": {"k3": "v3", "k4": "v6"}, "k4": {}}},
    "k6": [{}, {"k1": "v7"}, {"k2": "v8", "k3": "v9", "k4": {"k5": {"k6": "v10"}, "k7": {}}}],
    "k7": {
        "k0": [],
        "k1": ["v11"],
        "k2": ["v12", "v13"],
        "k3": ["v14", ["v15"]],
        "k4": [["v16"], ["v17"]],
        "k5": ["v18", ["v19", "v20", ["v21", "v22", []]]],
    },
}
struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
Expected outputs
struct1_leaves = {f"v{i}" for i in range(23)}
struct2_leaves = {f"{s}{s}" for s in "abcdefg"}
Here is a simple reference solution that uses recursion to produce the expected outputs for the sample inputs included in the question.
from typing import Any, Set


def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref:
    values = set()
    if isinstance(struct, dict):
        for sub_struct in struct.values():
            values.update(leaves(sub_struct))
    elif isinstance(struct, list):
        for sub_struct in struct:
            values.update(leaves(sub_struct))
    elif struct is not None:
        values.add(struct)
    return values
Another possibility is to use a generator with recursion:
struct1 = {'k0': None, 'k1': 'v1', 'k2': ['v0', None, 'v1'], 'k3': ['v0', ['v1', 'v2', None, ['v3'], ['v4', 'v5'], []]], 'k4': {'k0': None}, 'k5': {'k1': {'k2': {'k3': 'v3', 'k4': 'v6'}, 'k4': {}}}, 'k6': [{}, {'k1': 'v7'}, {'k2': 'v8', 'k3': 'v9', 'k4': {'k5': {'k6': 'v10'}, 'k7': {}}}], 'k7': {'k0': [], 'k1': ['v11'], 'k2': ['v12', 'v13'], 'k3': ['v14', ['v15']], 'k4': [['v16'], ['v17']], 'k5': ['v18', ['v19', 'v20', ['v21', 'v22', []]]]}}
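def flatten(d):
    # A dict exposes .values(); for a list, fall back to iterating the list itself.
    for i in getattr(d, 'values', lambda: d)():
        if isinstance(i, str):
            yield i
        elif i is not None:
            # Assumes every non-string, non-None item is a nested dict or list.
            yield from flatten(i)

print(set(flatten(struct1)))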
Output:
{'v10', 'v9', 'v8', 'v7', 'v0', 'v18', 'v16', 'v1', 'v21', 'v11', 'v14', 'v15', 'v12', 'v13', 'v4', 'v2', 'v5', 'v20', 'v6', 'v19', 'v3', 'v22', 'v17'}
struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
print(set(flatten(struct2)))
Output:
{'cc', 'ff', 'dd', 'gg', 'bb', 'ee', 'aa'}
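The generator above relies on every leaf being a string: a non-string scalar such as an int would reach the recursive call and fail when iterated. As a minimal sketch, assuming the YAML may also contain other scalar leaves (numbers, booleans), the same idea can be written with explicit type checks. The name `iter_leaves` and the `sample` data are hypothetical and not part of the original answer.

```python
from typing import Any, Iterator


def iter_leaves(struct: Any) -> Iterator[Any]:
    """Yield non-None leaf values from nested dicts and lists."""
    if isinstance(struct, dict):
        # Only the values are of interest; keys are ignored.
        for sub_struct in struct.values():
            yield from iter_leaves(sub_struct)
    elif isinstance(struct, list):
        for sub_struct in struct:
            yield from iter_leaves(sub_struct)
    elif struct is not None:
        # Anything that is not a dict, a list, or None counts as a leaf.
        yield struct


# Hypothetical sample mixing string and integer leaves.
sample = {"a": ["x", None, {"b": 1}], "c": {"d": [], "e": "y"}}
print(sorted(iter_leaves(sample), key=str))  # [1, 'x', 'y']
```

Wrapping the call in `set(...)` gives the same kind of result as the answers in this thread, while keeping the option of consuming the leaves lazily.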
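This is an adaptation of the reference solution that uses an inner function and a single set. It also uses recursion to produce the expected outputs for the sample inputs included in the question. It avoids passing each leaf back up through the entire call stack.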
from typing import Any, Set


def leaves(struct: Any) -> Set[Any]:
    """Return a set of leaf values found in nested dicts and lists excluding None values."""
    # Ref:
    values = set()

    def add_leaves(struct_: Any) -> None:
        if isinstance(struct_, dict):
            for sub_struct in struct_.values():
                add_leaves(sub_struct)
        elif isinstance(struct_, list):
            for sub_struct in struct_:
                add_leaves(sub_struct)
        elif struct_ is not None:
            values.add(struct_)

    add_leaves(struct)
    return values
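If the nesting depth ever did become a concern, the same traversal could presumably be done without recursion at all. The following is a minimal sketch that swaps the call stack for an explicit list used as a stack. The name `leaves_iterative` is hypothetical and this variant is not taken from any of the answers.

```python
from typing import Any, Set


def leaves_iterative(struct: Any) -> Set[Any]:
    """Return the set of non-None leaf values, using an explicit stack."""
    values: Set[Any] = set()
    stack = [struct]  # containers and leaves still waiting to be visited
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            # Push the dict's values; keys are ignored.
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
        elif item is not None:
            values.add(item)
    return values


struct2 = ["aa", "bb", "cc", ["dd", "ee", ["ff", "gg"], None, []]]
assert leaves_iterative(struct2) == {f"{s}{s}" for s in "abcdefg"}
```

Because the result is a set, the order in which items are popped from the stack does not matter.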