Searching for a whole word that contains leading or trailing special characters like - and = using regex in Python

Question

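I am trying to find the position of a string (word) in a sentence, using the function below. It works perfectly for most words, but for the string GLC-SX-MM= in the sentence I have a lot of GLC-SX-MM= in my inventory list I cannot get a match. I tried escaping - and = but that does not work. Any ideas? I cannot split the sentence on spaces, because sometimes I have compound words separated by a space.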

import re

def get_start_end(self, sentence, key):
    # key is interpolated unescaped and wrapped in \b word boundaries
    r = re.compile(r'\b(%s)\b' % key, re.I)
    m = r.search(sentence)
    start = m.start()
    end = m.end()
    return start, end

Answer 1

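When searching for a literal string you need to escape the key, and you should use the unambiguous (?<!\w) and (?!\w) boundaries. \b only matches between a word character and a non-word character, so it cannot match after = when a space follows: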

import re

def get_start_end(self, sentence, key):
    # re.escape makes the key literal; the lookarounds replace \b
    r = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = r.search(sentence)
    start = m.start()
    end = m.end()
    return start, end

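r'(?<!\w){}(?!\w)'.format(re.escape(key)) builds a regex such as (?<!\w)abc\.def\=(?!\w) from the keyword abc.def= (on Python 3.7 and later, re.escape leaves = unescaped, so the pattern reads (?<!\w)abc\.def=(?!\w); both forms match the same text). (?<!\w) fails any match where a word character sits immediately to the left of the keyword, and (?!\w) fails any match where a word character sits immediately to the right of it.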
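To see the lookaround boundaries at work on the sentence from the question, here is a minimal sketch. The name find_span and the choice to return None when nothing matches are assumptions made for this illustration; they are not part of the original answer.

```python
import re

def find_span(sentence, key):
    """Return (start, end) of key as a whole word in sentence, or None.

    Hypothetical helper for illustration; not from the original answer.
    """
    # re.escape treats the key as literal text; the lookarounds require
    # that no word character sits directly before or after the key.
    pattern = re.compile(r'(?<!\w){}(?!\w)'.format(re.escape(key)), re.I)
    m = pattern.search(sentence)
    return (m.start(), m.end()) if m else None

sentence = "I have a lot of GLC-SX-MM= in my inventory list"

print(find_span(sentence, "GLC-SX-MM="))  # (16, 26)
print(find_span(sentence, "lot of"))      # (9, 15), a key containing a space
print(find_span(sentence, "inventor"))    # None, because "y" follows the key
```

Returning None instead of calling m.start() on a failed search avoids an AttributeError when the key is absent.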

Answer 2

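This is not an actual answer, but it helps with solving the problem.

You can print the dynamically built pattern in order to debug it.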

import re

def get_start_end(sentence, key):
    r = re.compile(r'\b(%s)\b' % key, re.I)
    print(r.pattern)

sentence = "foo-bar is not foo=bar"

get_start_end(sentence, 'o-')
get_start_end(sentence, 'o=')

\b(o-)\b
\b(o=)\b

Then you can try matching the pattern by hand, for example at https://regex101.com/, to see whether it matches.
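The same check can also be run locally. The sketch below is an assumption-laden illustration (the labels and the candidates dictionary are made up for this example): it tries the key from the question with each boundary variant in turn, which shows that the trailing \b is the part that fails.

```python
import re

sentence = "I have a lot of GLC-SX-MM= in my inventory list"
key = "GLC-SX-MM="

# Hypothetical set of boundary variants to compare; {} receives the escaped key.
candidates = {
    "both": r'\b{}\b',
    "leading": r'\b{}',
    "trailing": r'{}\b',
    "lookarounds": r'(?<!\w){}(?!\w)',
}

results = {}
for label, template in candidates.items():
    pattern = template.format(re.escape(key))
    # Record whether this variant finds the key anywhere in the sentence.
    results[label] = re.search(pattern, sentence, re.I) is not None
    print(label, pattern, results[label])

# "both" and "trailing" report False: "=" and the following space are both
# non-word characters, so there is no \b position between them.
# "leading" and "lookarounds" report True.
```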
