Python regex: finding words separated by other words

Question

Is there a way to use re.findall or another regex method to count the occurrences of words in a specified order, separated by any number of other words?

Here is a "brute force" implementation:

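def search_query(query, page):
    # Count how many times the words in query appear, in order, in page.
    count = i = 0
    for word in page.split():
        if word == query[i]:
            i += 1
        if i == len(query):
            count += 1
            i = 0  # reset so later occurrences are counted too
    print(count)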

search_query(['hello','kilojoules'],'hello my good friend kilojoules')
1

For example, when the query is hello kilojoules, I want hello my good friend kilojoules to count as one instance of my query, but kilojoules is my good friend should not count.

Here is my naive attempt at a suitable regex: re.findall('hello\s\Skilojoules','hello my friend kilojoules'). It does not work. I thought it would, because I read the pattern as "find all instances of hello and kilojoules separated by white space or blank space".

Answer 1

I had success with re.findall('hello.*?kilojoules','a happy hello my amigo kilojoules now goodbye'), following stribizhev's suggestion.
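To turn that call into a count, the length of the returned list can be used. The snippet below is a small sketch of that idea; the variable names are only illustrative.

```python
import re

text = 'a happy hello my amigo kilojoules now goodbye'

# .*? lazily matches as few characters as possible between the two words.
matches = re.findall(r'hello.*?kilojoules', text)
count = len(matches)
print(matches, count)

# Words in the wrong order produce no match at all.
reversed_matches = re.findall(r'hello.*?kilojoules', 'kilojoules is my good friend hello')
print(reversed_matches)
```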

Answer 2

Let me clarify:

(?s)\bhello\b.*?\bkilojoules\b

This regex means: match the whole word hello, then as few characters as possible of any kind (including spaces and newlines), then the whole word kilojoules.
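The effect of the (?s) flag and the \b word boundaries can be checked with a short experiment. This is only a sketch; the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r'(?s)\bhello\b.*?\bkilojoules\b')

# (?s) lets . match newlines, so the gap between the words may span lines.
multi_line = 'hello my good\nfriend kilojoules'
with_dotall = pattern.findall(multi_line)

# Without (?s), . stops at the newline and nothing matches.
without_dotall = re.findall(r'\bhello\b.*?\bkilojoules\b', multi_line)

# \b rejects hello when it is only part of a longer word such as othello.
partial_word = pattern.findall('othello likes kilojoules')

print(with_dotall, without_dotall, partial_word)
```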

If your text has no newlines and you do not care about whole-word matching, use

hello.*?kilojoules
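Either pattern can be generalized to a query list like the one in the question. The helper below is a hypothetical sketch (count_in_order and its whole_words parameter are not from the original posts); it assumes query is a non-empty list of ordinary words and counts non-overlapping matches.

```python
import re

def count_in_order(query, page, whole_words=True):
    """Count non-overlapping, in-order occurrences of the query words in page."""
    # re.escape guards against query words containing regex metacharacters.
    words = [re.escape(word) for word in query]
    if whole_words:
        words = [r'\b' + word + r'\b' for word in words]
    # Join with a lazy gap; re.DOTALL plays the role of the inline (?s) flag.
    pattern = '.*?'.join(words)
    return len(re.findall(pattern, page, flags=re.DOTALL))

print(count_in_order(['hello', 'kilojoules'], 'hello my good friend kilojoules'))
print(count_in_order(['hello', 'kilojoules'], 'kilojoules is my good friend'))
```

The first call should report one occurrence and the second none, matching the behavior asked for in the question.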

Note that \s\S is just one whitespace character followed by one non-whitespace character. So hello\s\Skilojoules can match hello bkilojoules, but not hello kilojoules.
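A quick check of that claim, as a minimal sketch using the strings mentioned above:

```python
import re

pattern = r'hello\s\Skilojoules'

# \s consumes the space and \S consumes the b, so this matches.
one_extra_char = re.findall(pattern, 'hello bkilojoules')

# Here \S would have to consume the k of kilojoules, so the literal part fails.
plain_space = re.findall(pattern, 'hello kilojoules')

# Whole words in between cannot be covered by just two characters.
words_between = re.findall(pattern, 'hello my friend kilojoules')

print(one_extra_char, plain_space, words_between)
```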
