RegEx for matching strings with spaces and words

Question

I have the following string:

the quick brown fox abc(1)(x)

Using the following regex:

(?i)(\s{1})(abc\(1\)\([x|y]\))

The output is:

abc(1)(x)

This is expected. However, I can't seem to do either of the following:

Extract more than one space using \W, \w, \d, \D, etc.
Combine quantifiers to add more spaces.
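For reference, here is a minimal sketch of how the original pattern behaves in Python. The variable names are only illustrative. It shows that \s{1} consumes exactly one whitespace character (so it cannot reach earlier words), and that [x|y] accepts a literal "|" as well as "x" and "y".

```python
import re

text = "the quick brown fox abc(1)(x)"
original = re.compile(r"(?i)(\s{1})(abc\(1\)\([x|y]\))")

m = original.search(text)
whole = m.group(0)    # the full match, including the single leading whitespace
target = m.group(2)   # the second capturing group, "abc(1)(x)"

# Inside a character class "|" is a literal, so [x|y] also accepts a pipe.
pipe_match = original.search(" abc(1)(|)")

print(repr(whole), repr(target), pipe_match is not None)
```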

I would like the following output:

the quick brown fox abc(1)(x)

Starting from the main lookup "abc(1)(x)", I want up to 5 words on either side of it. My assumption is that spaces delimit a word.

Edit 1:

The 5 words on either side are not known in advance. The string might be:

cat with a black hat is abc(1)(x) the quick brown fox jumps over the lazy dog.

In that case, the desired output would be:

with a black hat is abc(1)(x) the quick brown fox jumps

Edit 2:

Edited the expected output in the first example and added "up to" 5 words.

Answer 1

(?:[0-9A-Za-z_]+[^0-9A-Za-z_]+){0,5}abc\(1\)\([xy]\)(?:[^0-9A-Za-z_]+[0-9A-Za-z_]+){0,5}

Note that I have changed \w+ to [0-9A-Za-z_]+ and \W+ to [^0-9A-Za-z_]+, because depending on your locale/Unicode settings, \W and \w might not behave the way you expect in Python.
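To illustrate the point about \w, here is a small sketch assuming Python 3 and str (not bytes) patterns, where \w is Unicode-aware by default. The sample word is made up for the demonstration.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed e-acute

# Default str patterns in Python 3: \w also matches non-ASCII letters.
unicode_words = re.findall(r"\w+", word)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", word, flags=re.ASCII)

# The explicit class behaves the same regardless of flags.
explicit_words = re.findall(r"[0-9A-Za-z_]+", word)

print(unicode_words, ascii_words, explicit_words)
```

Whether the Unicode-aware behavior is desirable depends on the input. The explicit character class simply makes the choice visible in the pattern itself.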

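Also note that I am not looking specifically for spaces, just "non-word characters"; this will probably handle edge cases such as quote characters better. Either way, this should get you most of the way there.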
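As a quick check, the pattern above can be run against both strings from the question. This is only a sketch; the names context_re, first and second are illustrative.

```python
import re

context_re = re.compile(
    r"(?:[0-9A-Za-z_]+[^0-9A-Za-z_]+){0,5}"   # up to 5 words before the target
    r"abc\(1\)\([xy]\)"                       # the target itself
    r"(?:[^0-9A-Za-z_]+[0-9A-Za-z_]+){0,5}"   # up to 5 words after the target
)

first = "the quick brown fox abc(1)(x)"
second = ("cat with a black hat is abc(1)(x) "
          "the quick brown fox jumps over the lazy dog.")

first_match = context_re.search(first).group(0)
second_match = context_re.search(second).group(0)

print(first_match)
print(second_match)
```

In the first string only four words precede the target, so the {0,5} quantifier settles on four repetitions. In the second string the search skips "cat" because six words would be needed to reach the target from there.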

By the way: you called this "lookaround", but it actually has nothing to do with the regex feature known as lookaround.
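For contrast, this is roughly what an actual lookaround looks like. It is a sketch that is not needed for the solution above: a lookbehind asserts that whitespace precedes the target without including it in the match.

```python
import re

text = "the quick brown fox abc(1)(x)"

# Ordinary pattern: the whitespace is consumed and is part of the match.
consuming = re.search(r"\sabc\(1\)\([xy]\)", text).group(0)

# Lookbehind: the whitespace is only asserted, not consumed.
lookbehind = re.search(r"(?<=\s)abc\(1\)\([xy]\)", text).group(0)

print(repr(consuming), repr(lookbehind))
```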

Answer 2

If I understand your requirement correctly, you want to do something like this:

(?:\w+[ ]){0,5}(abc\(1\)\([xy]\))(?:[ ]\w+){0,5}

Breakdown:

(?:               # Start of a non-capturing group.
    \w+           # Any word character repeated one or more times (basically, a word).
    [ ]           # Matches a space character literally.
)                 # End of the non-capturing group.
{0,5}             # Match the previous group between 0 and 5 times.
(                 # Start of the first capturing group.
    abc\(1\)      # Matches "abc(1)" literally.
    \([xy]\)      # Matches "(x)" or "(y)". You don't need "|" inside a character class.
)                 # End of the capturing group.
(?:[ ]\w+){0,5}   # Same as the non-capturing group above but the space is before the word.
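The breakdown above maps almost directly onto Python's re.VERBOSE mode, where whitespace and # comments inside the pattern are ignored. Writing the space as [ ] keeps it literal in that mode. A minimal sketch, with illustrative variable names:

```python
import re

pattern = re.compile(r"""
    (?:               # Start of a non-capturing group.
        \w+           # One or more word characters (basically, a word).
        [ ]           # A literal space; kept even in verbose mode.
    ){0,5}            # Repeat the group between 0 and 5 times.
    (                 # Start of the first capturing group.
        abc\(1\)      # Matches "abc(1)" literally.
        \([xy]\)      # Matches "(x)" or "(y)".
    )                 # End of the capturing group.
    (?:[ ]\w+){0,5}   # Up to 5 words after the target, each preceded by a space.
""", re.VERBOSE)

first = "the quick brown fox abc(1)(x)"
second = ("cat with a black hat is abc(1)(x) "
          "the quick brown fox jumps over the lazy dog.")

m1 = pattern.search(first)
m2 = pattern.search(second)

print(m1.group(0))   # whole match with surrounding words
print(m2.group(0))
print(m2.group(1))   # only the target
```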

Notes:

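To make the pattern case-insensitive, you can either start it with (?i), as you are already doing, or use the re.IGNORECASE flag.
If you want to support words that are not separated by a space, you can replace [ ] with \W+ (one or more non-word characters) or with a character class that includes all the punctuation characters you want to support (e.g., [.,;?! ]).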
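Both notes can be sketched together as follows. The sample sentence is made up for illustration, and the variable names are not from the original answer.

```python
import re

text = 'He said, "ABC(1)(Y)"; then left.'

# Space-only separators, case-insensitive via the flag.
space_only = re.compile(
    r"(?:\w+[ ]){0,5}(abc\(1\)\([xy]\))(?:[ ]\w+){0,5}", re.IGNORECASE
)

# \W+ as separator: punctuation and quotes also count as word boundaries.
any_separator = re.compile(
    r"(?:\w+\W+){0,5}(abc\(1\)\([xy]\))(?:\W+\w+){0,5}", re.IGNORECASE
)

strict = space_only.search(text).group(0)
loose = any_separator.search(text).group(0)

print(strict)
print(loose)
```

With spaces only, the quotes around the target stop the context from being picked up, so just the target is returned. With \W+ the surrounding words are included.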

python

regex

regex-group

regex-lookarounds