Optimizing a search for strings that match the characters of a substring in any order?
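Suppose I have a list like this:
list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
and a substring
to_find = 'seos'
I want to find the strings in list_of_strings that:
- have the same length as to_find
- have the same characters as to_find (regardless of the order of the characters)
The output for list_of_strings should be ['sseo', 'oess'] (because each one has all the letters from to_find and both have length 4).
I have: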
import itertools
list_of_strings = [string for string in list_of_strings if len(string) == len(to_find)]
result = [string for string in list_of_strings if any("".join(perm) in string for perm in itertools.permutations(to_find))]
To find out how long my code takes to run:
import timeit
timeit.timeit("[string for string in list_of_strings if any(''.join(perm) in string for perm in itertools.permutations(to_find))]",
              setup='import itertools; from __main__ import list_of_strings, to_find', number=100000)
The process takes a while to produce output. I guess this is because of the use of itertools.permutations.
Is there a way to make this code more efficient?
Thanks
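This should work, because Counter creates a dict-like object that counts the characters in each string. The goal is to match the letters and their counts, regardless of their order.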
from collections import Counter
to_find_counter = Counter(to_find)
# go through the list and check if the Counter is the same as the Counter of to_find
[x for x in list_of_strings if Counter(x)==to_find_counter]
['sseo', 'oess']
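For reuse, the same Counter comparison could be wrapped in a small function. This is only a sketch: the name find_anagrams and the extra length check are my own additions, not part of the answer above. The length check is just a cheap filter so that a Counter is only built for strings that could match.

```python
from collections import Counter

def find_anagrams(words, target):
    """Return the words that use exactly the same letters as target (hypothetical helper)."""
    target_counts = Counter(target)  # built once, reused for every comparison
    return [
        word for word in words
        # cheap length check first, so Counter is only built for real candidates
        if len(word) == len(target) and Counter(word) == target_counts
    ]

list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
print(find_anagrams(list_of_strings, 'seos'))  # ['sseo', 'oess']
```

Equal Counters already imply equal lengths, so the length check does not change the result. It only skips some work.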
If order doesn't matter, you can just sort the strings and compare the resulting lists:
list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
to_find = sorted('seos')
matches = [word for word in list_of_strings if sorted(word) == to_find]
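To compare this against the permutation version, the sorting approach can be timed the same way. The following is a rough sketch, assuming a hypothetical helper named find_by_sorting and an arbitrary run count of 10000. It passes a callable to timeit.timeit, so no setup string is needed. Actual timings will depend on the machine and the data.

```python
import timeit

list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']

def find_by_sorting(words, target):
    """Return the words whose sorted letters equal the sorted letters of target."""
    key = sorted(target)  # sort the target once, outside the loop
    return [word for word in words if sorted(word) == key]

matches = find_by_sorting(list_of_strings, 'seos')
print(matches)

# Time repeated calls; number=10000 is an arbitrary choice for illustration
elapsed = timeit.timeit(lambda: find_by_sorting(list_of_strings, 'seos'), number=10000)
print(f"{elapsed:.4f} s for 10000 runs")
```

Sorting each word costs O(k log k) for a word of length k, whereas enumerating permutations of to_find grows with k!, which is likely why the original version slows down so quickly.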