Efficient unordered substring matching

Question

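I want to check whether one string is contained in another, regardless of the order of the characters. For example, if I have the string submarine, I want to be able to detect marines as a match.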

The way I'm currently handling this is with lists:

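def match(x, y):
    # True if y contains every character of x, counting repeats
    x, y = list(x), list(y)
    for i in x:
        try:
            y.remove(i)  # consume one occurrence of i from y
        except ValueError:
            return False  # y has no remaining occurrence of i
    return True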

But this is inefficient when I try to match many combinations.
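For comparison, here is a minimal sketch of the same check done with collections.Counter. This is an alternative technique, not the asker's or the answerer's method, and it assumes Python 2.7+ or 3.x, since Counter does not exist in Python 2.5. Each list.remove call scans the list, so the loop above costs roughly len(x) * len(y) steps. Counting the characters of both strings once makes the check linear.

```python
from collections import Counter

def match(x, y):
    """Return True if every character of x occurs in y at least as often."""
    need = Counter(x)   # character -> how many times x uses it
    have = Counter(y)   # character -> how many times y provides it
    # Counter returns 0 for missing keys; all() stops at the first shortfall
    return all(have[ch] >= n for ch, n in need.items())

print(match("marines", "submarine"))  # True
print(match("mimes", "submarine"))    # False: needs two 'm'
```

If many candidate words are checked against the same y, Counter(y) can be built once and reused instead of being rebuilt on every call.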

I then thought of using regular expressions, but didn't manage to make that work.

Any ideas?

Answer 1

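You can use a character class [SEARCH_WORD], which matches any single character from SEARCH_WORD. Adding the + quantifier after it matches one or more such characters, and surrounding it with \b word boundaries restricts matches to whole words: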

r'\b[submarine]+\b'
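To see what the class actually checks, here is a quick sketch using re.fullmatch (Python 3.4+) on a few hypothetical words. The class matches exactly one character from the set and + repeats it, so any run of those letters passes, regardless of how often each letter appears.

```python
import re

# [submarine] matches ONE character from {s, u, b, m, a, r, i, n, e};
# + repeats it, so any run of those letters is accepted.
pattern = re.compile(r"[submarine]+")

words = ["marines", "sub", "mmmm", "marine!"]
# fullmatch requires the whole word to consist of class characters
results = {w: bool(pattern.fullmatch(w)) for w in words}
print(results)  # {'marines': True, 'sub': True, 'mmmm': True, 'marine!': False}
```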

See the regex demo and the IDEONE demo:

import re
s = "I have a string submarine I want to be able to detect marines as a match"
kw = "submarine"
r = re.compile(r"\b[{0}]+\b".format(kw))
print(r.findall(s))
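The character class does not enforce letter counts: for submarine it also accepts words such as mimes (two m's) or bananas (three a's). One way to keep the regex as a cheap pre-filter and still respect counts is to check each candidate with collections.Counter. This is a minimal sketch: unordered_matches is a hypothetical helper name, kw is assumed to contain only word characters (as with the \b pattern above), and Counter requires Python 2.7+.

```python
import re
from collections import Counter

def unordered_matches(text, kw):
    """Hypothetical helper: words in text that fit inside kw's letter counts."""
    # Step 1: the character-class regex finds candidate words
    candidates = re.findall(r"\b[{0}]+\b".format(kw), text)
    available = Counter(kw)
    # Step 2: keep candidates that use no letter more often than kw does
    return [w for w in candidates
            if all(available[ch] >= n for ch, n in Counter(w).items())]

s = "submarine marines mines mimes bananas"
print(re.findall(r"\b[submarine]+\b", s))  # ['submarine', 'marines', 'mines', 'mimes', 'bananas']
print(unordered_matches(s, "submarine"))   # ['submarine', 'marines', 'mines']
```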

Note: if the search word can contain non-word characters, in particular ^, ], \ or -, escape it with re.escape and use r"(?<!\w)[{0}]+(?!\w)".format(re.escape("submarine")).

import re
s = "I have a string ^submarine I want to be able to detect ^marines as a match"
kw = "^submarine"
r = re.compile(r"(?<!\w)[{0}]+(?!\w)".format(re.escape(kw)))
print(r.findall(s))

See the IDEONE demo.
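Why lookarounds rather than \b here: \b only matches between a word character and a non-word character, so it cannot match before a ^ that follows a space, and the ^ gets dropped from the match. A small sketch comparing the two patterns on a hypothetical sample string:

```python
import re

s = "detect ^marines here"
kw = "^submarine"
cls = re.escape(kw)  # escape '^' so it is a literal inside [...]

# \b needs a word/non-word transition; space followed by '^' is not one,
# so the match can only start at 'm'
with_b = re.findall(r"\b[{0}]+\b".format(cls), s)
# (?<!\w) / (?!\w) only require "no word character" on each side
with_look = re.findall(r"(?<!\w)[{0}]+(?!\w)".format(cls), s)

print(with_b)     # ['marines']
print(with_look)  # ['^marines']
```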

Tags: python, regex, pattern-matching, python-2.5