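Recursively search for paths in a dictionary containing a sub-string
I am trying to find the fastest way to search a nested dictionary with a regular expression and return the path to every occurrence of the match. I am only interested in values that are strings, not in values of any other type. Recursion is not my strong suit. Here is a sample JSON; suppose I am looking for all absolute paths whose value contains 'blah'.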
d = {'id': 'abcde',
     'key1': 'blah',
     'key2': 'blah blah',
     'nestedlist': [{'id': 'qwerty',
                     'nestednestedlist': [{'id': 'xyz', 'keyA': 'blah blah blah'},
                                          {'id': 'fghi', 'keyZ': 'blah blah blah'}],
                     'anothernestednestedlist': [{'id': 'asdf', 'keyQ': 'blah blah'},
                                                 {'id': 'yuiop', 'keyW': 'blah'}]}]}
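I found the following snippet, but I could not get it to return the paths instead of just printing them. Beyond that, adding "if the value is a string and re.search() matches, append the path to a list" should not be too hard.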
def search_dict(v, prefix=''):
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            search_dict(v2, p2)
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            search_dict(v2, p2)
    else:
        print('{} = {}'.format(prefix, repr(v)))
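You only need to initialize an output list, append the elements found in the current call, and extend it with the results returned by the recursive calls.
Try this: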
def search_dict(v, prefix=''):
    result = []
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            result.extend(search_dict(v2, p2))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            result.extend(search_dict(v2, p2))
    else:
        result.append('{} = {}'.format(prefix, repr(v)))
    return result
Adam.Er8's answer is accurate; I just wanted to answer the question more explicitly:
import re

def search_dict(v, re_term, prefix=''):
    # Accepts a pattern string or an already compiled pattern
    re_term = re.compile(re_term)
    result = []
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            result.extend(search_dict(v2, re_term, prefix=p2))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            result.extend(search_dict(v2, re_term, prefix=p2))
    elif isinstance(v, str) and re.search(re_term, v):
        # Only string values are tested; other leaf types are ignored
        result.append(prefix)
    return result
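As a quick check, here is a minimal sketch that runs this regex version against the sample dictionary from the question. It makes one small change that is my own assumption rather than part of the answer: the compiled pattern is stored in a local named `pattern` and its `search` method is called directly.

```python
import re

def search_dict(v, re_term, prefix=''):
    """Return the path of every string value that matches re_term."""
    # re.compile also accepts an already compiled pattern, so the
    # recursive calls can simply pass the compiled object along.
    pattern = re.compile(re_term)
    result = []
    if isinstance(v, dict):
        for k, v2 in v.items():
            result.extend(search_dict(v2, pattern, "{}['{}']".format(prefix, k)))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            result.extend(search_dict(v2, pattern, "{}[{}]".format(prefix, i)))
    elif isinstance(v, str) and pattern.search(v):
        result.append(prefix)
    return result

# Sample data from the question
d = {'id': 'abcde',
     'key1': 'blah',
     'key2': 'blah blah',
     'nestedlist': [{'id': 'qwerty',
                     'nestednestedlist': [{'id': 'xyz', 'keyA': 'blah blah blah'},
                                          {'id': 'fghi', 'keyZ': 'blah blah blah'}],
                     'anothernestednestedlist': [{'id': 'asdf', 'keyQ': 'blah blah'},
                                                 {'id': 'yuiop', 'keyW': 'blah'}]}]}

paths = search_dict(d, r'blah')
for p in paths:
    print(p)
# ['key1']
# ['key2']
# ['nestedlist'][0]['nestednestedlist'][0]['keyA']
# ['nestedlist'][0]['nestednestedlist'][1]['keyZ']
# ['nestedlist'][0]['anothernestednestedlist'][0]['keyQ']
# ['nestedlist'][0]['anothernestednestedlist'][1]['keyW']
```

Each returned string can be appended to the variable name (for example `d` + path) to form an absolute path expression.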
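Both answers here compute their results eagerly, exhausting the entire input dictionary before returning the first available result (if any). We can use yield from to write a more Pythonic program: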
def search_substr(t = {}, q = ""):
    def loop(t, path):
        if isinstance(t, dict):
            for k, v in t.items():
                yield from loop(v, (*path, k))  # <- recur
        elif isinstance(t, list):
            for k, v in enumerate(t):
                yield from loop(v, (*path, k))  # <- recur
        elif isinstance(t, str):
            if q in t:
                yield path, t  # <- output a match
    yield from loop(t, ())  # <- init

for (path, value) in search_substr(d, "blah"):
    print(path, value)
Result:
('key1',) blah
('key2',) blah blah
('nestedlist', 0, 'nestednestedlist', 0, 'keyA') blah blah blah
('nestedlist', 0, 'nestednestedlist', 1, 'keyZ') blah blah blah
('nestedlist', 0, 'anothernestednestedlist', 0, 'keyQ') blah blah
('nestedlist', 0, 'anothernestednestedlist', 1, 'keyW') blah
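To make the lazy behaviour concrete, the following sketch pulls matches from the generator one at a time instead of looping over all of them. The function is the same `search_substr` as above, except that the default arguments are dropped; the use of `next` and `itertools.islice` is my own illustration, not part of the original answer.

```python
from itertools import islice

def search_substr(t, q):
    """Lazily yield (path, value) for every string value containing q."""
    def loop(t, path):
        if isinstance(t, dict):
            for k, v in t.items():
                yield from loop(v, (*path, k))
        elif isinstance(t, list):
            for k, v in enumerate(t):
                yield from loop(v, (*path, k))
        elif isinstance(t, str):
            if q in t:
                yield path, t
    yield from loop(t, ())

# Sample data from the question
d = {'id': 'abcde',
     'key1': 'blah',
     'key2': 'blah blah',
     'nestedlist': [{'id': 'qwerty',
                     'nestednestedlist': [{'id': 'xyz', 'keyA': 'blah blah blah'},
                                          {'id': 'fghi', 'keyZ': 'blah blah blah'}],
                     'anothernestednestedlist': [{'id': 'asdf', 'keyQ': 'blah blah'},
                                                 {'id': 'yuiop', 'keyW': 'blah'}]}]}

matches = search_substr(d, "blah")   # creating the generator traverses nothing yet
first = next(matches, None)          # walks only as far as the first match
print(first)
# (('key1',), 'blah')

next_two = list(islice(matches, 2))  # resumes where the previous call stopped
print(next_two)
# [(('key2',), 'blah blah'), (('nestedlist', 0, 'nestednestedlist', 0, 'keyA'), 'blah blah blah')]
```

Passing `None` as the second argument to `next` avoids a `StopIteration` when there is no match at all.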
Note that we use q in t to test whether the substring q occurs in the target t. If you really want to use a regular expression for this:
from re import compile

def search_re(t = {}, q = ""):
    def loop(t, re, path):  # <- add re
        if isinstance(t, dict):
            for k, v in t.items():
                yield from loop(v, re, (*path, k))  # <- carry re
        elif isinstance(t, list):
            for k, v in enumerate(t):
                yield from loop(v, re, (*path, k))  # <- carry re
        elif isinstance(t, str):
            if re.search(t):  # <- re.search
                yield path, t
    yield from loop(t, compile(q), ())  # <- compile q
Now we can search using a regular expression:
for (path, value) in search_re(d, r"[abhl]{4}"):
    print(path, value)
Result:
('key1',) blah
('key2',) blah blah
('nestedlist', 0, 'nestednestedlist', 0, 'keyA') blah blah blah
('nestedlist', 0, 'nestednestedlist', 1, 'keyZ') blah blah blah
('nestedlist', 0, 'anothernestednestedlist', 0, 'keyQ') blah blah
('nestedlist', 0, 'anothernestednestedlist', 1, 'keyW') blah
Let's try another search with a different query:
for (path, value) in search_re(d, r"[dfs]{3}"):
    print(path, value)
('nestedlist', 0, 'anothernestednestedlist', 0, 'id') asdf
Finally, when the query matches nothing, search_substr and search_re yield nothing:
print(list(search_re(d, r"zzz")))
# []
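The tuple paths yielded above can also be used to look a value up again. The sketch below assumes a small helper, `get_in`, which is a hypothetical name of my own and not part of the answers; it folds `operator.getitem` over the path with `functools.reduce`. The path used is one of the results printed earlier.

```python
from functools import reduce
from operator import getitem

def get_in(t, path):
    """Follow a tuple of dict keys and list indexes down into t."""
    # reduce applies getitem step by step: t[path[0]][path[1]]...
    return reduce(getitem, path, t)

# Sample data from the question
d = {'id': 'abcde',
     'key1': 'blah',
     'key2': 'blah blah',
     'nestedlist': [{'id': 'qwerty',
                     'nestednestedlist': [{'id': 'xyz', 'keyA': 'blah blah blah'},
                                          {'id': 'fghi', 'keyZ': 'blah blah blah'}],
                     'anothernestednestedlist': [{'id': 'asdf', 'keyQ': 'blah blah'},
                                                 {'id': 'yuiop', 'keyW': 'blah'}]}]}

path = ('nestedlist', 0, 'anothernestednestedlist', 1, 'keyW')
value = get_in(d, path)
print(value)
# blah
```

An empty path returns the structure itself, because `reduce` then yields its initial value unchanged.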