Regex stop at the beginning of next match in multiline text

Question

ID_FIRST

After each id come one or more
lines with diverse text

ID_SECOND

The pattern repeats many times

ID_THIRD

That's the end but could be larger

I want to extract each ID_* together with the text below it, up to the next ID_.

It looks simple:

(ID_.+)([\s\S]+)

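I've tried several combinations of greediness and flags, but the pattern either captures all the text to the end or stops right at the ID_. I think I'm missing something basic.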

https://regex101.com/r/Ruy44M/1

Answer 1

[\s\S] also matches a newline, so [\s\S]+ runs all the way to the end of the string. Instead, match ID_ followed by 1+ characters and capture that in group 1.
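A minimal sketch of the two failure modes described in the question. The lazy variant [\s\S]+? is an assumption about what "stops at ID_" referred to, and the sample is a shortened copy of the question's text:

```javascript
// Shortened copy of the sample text from the question.
const sample = [
  "ID_FIRST",
  "",
  "After each id come one or more",
  "lines with diverse text",
  "",
  "ID_SECOND",
  "",
  "The pattern repeats many times",
].join("\n");

// Greedy: [\s\S]+ also consumes newlines, so the first match
// swallows everything up to the end of the string.
const greedyMatches = [...sample.matchAll(/(ID_.+)([\s\S]+)/g)];

// Lazy (assumed attempt): [\s\S]+? is satisfied by a single character,
// so group 2 is only the newline right after the ID_ line.
const lazyMatches = [...sample.matchAll(/(ID_.+)([\s\S]+?)/g)];

console.log(greedyMatches.length);             // 1
console.log(greedyMatches[0][0] === sample);   // true
console.log(lazyMatches.length);               // 2
console.log(JSON.stringify(lazyMatches[0][2])); // "\n"
```

Neither quantifier knows where the next ID_ begins, which is why the pattern needs an explicit check at the start of each following line.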

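Then, in group 2, capture a repeated pattern that matches a newline and uses a negative lookahead (?! to first check that the next line does not start with ID_: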

(ID_.+)((?:\n(?!ID_).*)*)

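Explanation

(ID_.+) Capture group 1 - match ID_, then match any character except a newline 1+ times
( Capture group 2
- (?: Non-capturing group
  - \n(?!ID_).* Match a newline and assert that what is directly to the right is not ID_. If that assertion holds, match any character except a newline 0+ times
- )* Close the non-capturing group and repeat it 0+ times
) Close the capture group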

Regex demo

For example:

const regex = /(ID_.+)((?:\n(?!ID_).*)*)/gm;
const str = `ID_FIRST

After each id come one or more
lines with diverse text

ID_SECOND

The pattern repeats many times

ID_THIRD

That's the end but could be larger`;
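let m;
// With the g flag, each exec() call resumes from regex.lastIndex.
while ((m = regex.exec(str)) !== null) {
  // Guard against an infinite loop on a zero-length match.
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }

  console.log("ID: " + m[1]);   // the ID_ line
  console.log("Text: " + m[2]); // the lines that follow, up to the next ID_
}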
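If the goal is a list of records rather than console output, the same pattern can be used with String.prototype.matchAll. This is only a sketch: parseSections is a hypothetical helper name, trimming the surrounding blank lines is an assumption about the desired result, and the input is assumed to use \n line endings.

```javascript
// Hypothetical helper: turns the text into [{ id, text }, ...].
function parseSections(input) {
  // Same pattern as above; matchAll requires the g flag.
  const pattern = /(ID_.+)((?:\n(?!ID_).*)*)/g;
  return Array.from(input.matchAll(pattern), (m) => ({
    id: m[1],
    // Group 2 starts and ends with blank lines in the sample; trim them.
    text: m[2].trim(),
  }));
}

const sections = parseSections(
  "ID_FIRST\n\nAfter each id come one or more\nlines with diverse text\n\n" +
    "ID_SECOND\n\nThe pattern repeats many times"
);

console.log(sections.length);  // 2
console.log(sections[0].id);   // ID_FIRST
console.log(sections[1].text); // The pattern repeats many times
```

With \r\n line endings the \n in the pattern would not match directly after the ID_ line, because . does not consume \r in JavaScript, so such input would probably need normalizing first.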


javascript

regex

multiline