Regex for matching a word, unless the previous line ends with a word

Question

I have a text with many sentences, separated by newlines and arbitrary whitespace:

Some thing.
  Some other text.
 Some line.
   Some additional text.
Some stuff.
    Some additional text.
Some additional text.

How can I match the word Some only when the previous line does not end with thing or stuff?

For the example above, I would match these words:

Some thing.           
  Some other text.          <-- skip, previous line ends with "thing."
 [Some] line.
   [Some] additional text.  
[Some] stuff.
    Some additional text.   <-- skip, previous line ends with "stuff."
[Some] additional text.

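I tried (?<!thing\.|stuff\.)[\r\n\s]+Some, but I don't know how to include whitespace + newlines in the negative lookbehind. I found some examples that use \K to allow "variable length" matching, but I clearly don't understand how \K works, because I can't get it to match anything.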

Answer 1

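You can use the PCRE verbs (*SKIP)(*F) to make the known unwanted matches fail, and put the match you want on the other side of an alternation: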

(?:thing|stuff)\.\R\s*\w+(*SKIP)(*F)|\bSome\b

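Here, (?:thing|stuff)\.\R\s*\w+(*SKIP)(*F) matches the first word of every line whose previous line ends with thing. or stuff. (\R matches any newline sequence), then forces that match to fail and skips past it. The right-hand side of the alternation then gives us the matches we want.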
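Python's built-in re module supports neither the (*SKIP)(*F) verbs nor \R, so the sketch below is not the PCRE pattern itself but an emulation of the same idea, assuming Python as the host language: the unwanted branch is consumed without a capture group, the wanted Some is captured in group 1, and matches with an empty group 1 are thrown away afterwards. \r?\n stands in for \R; TEXT and kept_lines are illustrative names.

```python
import re

# The sample text from the question.
TEXT = (
    "Some thing.\n"
    "  Some other text.\n"
    " Some line.\n"
    "   Some additional text.\n"
    "Some stuff.\n"
    "    Some additional text.\n"
    "Some additional text."
)

# Left branch: the unwanted case (first word of a line whose previous line
# ends in "thing." or "stuff."). It is consumed but captures nothing, which
# plays the role of (*SKIP)(*F). \r?\n stands in for PCRE's \R.
# Right branch: the wanted word, captured in group 1.
PATTERN = re.compile(r"(?:thing|stuff)\.\r?\n\s*\w+|\b(Some)\b")


def kept_lines(text: str) -> list[int]:
    """Return the 1-based line numbers of every 'Some' that is kept."""
    lines = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:  # drop the sacrificial left-branch matches
            lines.append(text.count("\n", 0, m.start(1)) + 1)
    return lines


print(kept_lines(TEXT))
```

On the sample text this prints [1, 3, 4, 5, 7]. The PCRE version should select the same words, since (*SKIP) also makes the search resume after the consumed text; the difference is only that PCRE never reports the skipped matches at all.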

Answer 2

You can use a "sacrificial match" in a non-capturing group to match what you don't want, and then let what you do want match in a capturing group:

/(?:^\s*Some.*(?:thing\.|stuff\.)\s*^\s*Some)|(^\s*Some)/m

Or, if you want the first and fourth (as noted in the comments, your example is inconsistent...):

/(?:(?:thing\.|stuff\.)\s*Some)|(^\s*Some)/m

Or, to skip the first Some and include the fourth:

/(?:(?:thing\.|stuff\.)\s*Some)|((?<=\n)\s*Some)/m

This method works in most regex flavors.
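As a rough check of that claim in one more flavor, the sketch below runs the three patterns above through Python's built-in re module (re.MULTILINE standing in for the /m flag) on the question's sample text and reports which lines the capturing group keeps. PATTERNS and kept_lines are illustrative names, not part of the answer.

```python
import re

# The sample text from the question.
TEXT = (
    "Some thing.\n"
    "  Some other text.\n"
    " Some line.\n"
    "   Some additional text.\n"
    "Some stuff.\n"
    "    Some additional text.\n"
    "Some additional text."
)

# The three patterns from this answer, in order; re.MULTILINE replaces /m.
PATTERNS = [
    r"(?:^\s*Some.*(?:thing\.|stuff\.)\s*^\s*Some)|(^\s*Some)",
    r"(?:(?:thing\.|stuff\.)\s*Some)|(^\s*Some)",
    r"(?:(?:thing\.|stuff\.)\s*Some)|((?<=\n)\s*Some)",
]


def kept_lines(pattern: str, text: str) -> list[int]:
    """Return 1-based line numbers where the capturing group matched."""
    lines = []
    for m in re.finditer(pattern, text, re.MULTILINE):
        # Sacrificial matches come from the non-capturing branch,
        # so group 1 is None for them and they are ignored.
        if m.group(1) is not None:
            lines.append(text.count("\n", 0, m.end(1)) + 1)
    return lines


for pattern in PATTERNS:
    print(kept_lines(pattern, TEXT))
```

On the sample text this should print [3, 4, 7] for the first pattern, [1, 3, 4, 5, 7] for the second and [3, 4, 5, 7] for the third, the last being the selection marked in the question. Group 1 also includes any leading whitespace before Some.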

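A negative lookbehind is a problem here because many flavors (PCRE among them) require a lookbehind to be fixed-width, and the \s* you describe is not fixed-width.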
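A short sketch of both problems, assuming Python's built-in re module as the flavor (it requires fixed-width lookbehinds; other engines differ). The first check shows that moving \s* inside the lookbehind is rejected outright. The second shows why the question's fixed-width attempt still misbehaves: the lookbehind is only tested where a match starts, so when the position right after thing. is rejected, the engine simply starts one character later, inside the whitespace, where the lookbehind passes. TEXT, compiles and attempt_lines are illustrative names.

```python
import re

# The sample text from the question.
TEXT = (
    "Some thing.\n"
    "  Some other text.\n"
    " Some line.\n"
    "   Some additional text.\n"
    "Some stuff.\n"
    "    Some additional text.\n"
    "Some additional text."
)

# Variable-width lookbehind: \s* has no fixed length, so re rejects it.
VARIABLE = r"(?<!(?:thing|stuff)\.\s*)Some"
# The question's attempt: both alternatives are 6 characters wide, so it compiles.
ATTEMPT = r"(?<!thing\.|stuff\.)[\r\n\s]+Some"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:  # raised, for example, for a variable-width lookbehind
        return False
    return True


def attempt_lines(text: str) -> list[int]:
    """Return the 1-based line numbers of every 'Some' matched by ATTEMPT."""
    return [text.count("\n", 0, m.end()) + 1 for m in re.finditer(ATTEMPT, text)]


print(compiles(VARIABLE), compiles(ATTEMPT))
print(attempt_lines(TEXT))
```

On the sample text, attempt_lines returns [2, 3, 4, 5, 6, 7]: lines 2 and 6, which should be skipped, are matched anyway, which is why the answers above consume the unwanted case explicitly instead.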
