Match multiple lines using Regex

I have the following text.

^0001   HeadOne


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been theindustry's standard dummy text ever since the 1500s, when an unknown printer took a galley of typeand scrambled it to make a type specimen book.

^0002   HeadTwo


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been theindustry's standard dummy text ever since the 1500s, when an unknown printer took a galley of typeand scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been theindustry's standard dummy text ever since the 1500s, when an unknown printer took a galley of typeand scrambled it to make a type specimen book.


^004    HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0004   HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been theindustry's standard dummy text ever since the 1500s, when an unknown printer took a galley of typeand scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been theindustry's standard dummy text ever since the 1500s, when an unknown printer took a galley of typeand scrambled it to make a type specimen book.

Below is the regex I am using to search it.

@@([\n\r\s]*)(.*)([\n\r\s]+)\^

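But this only captures the ^0001 and ^0003 sections, because they contain a single paragraph, whereas my text also has sections with multiple paragraphs.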
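
The behaviour can be reproduced outside the editor. The sketch below is only an illustration: it uses Python's re module as a stand-in (an assumption, since VS Code and Notepad++ use different regex engines) and a shortened, made-up version of the sample text. Because `.` does not cross a line break by default, `(.*)` holds a single line, and the match fails as soon as a second paragraph sits between `@@` and the next `^`.

```python
import re

# Shortened stand-in for the sample text (hypothetical content):
# section 0001 has one paragraph, section 0002 has two.
text = (
    "^0001   HeadOne\n\n\n@@\n"
    "First paragraph of section one.\n\n"
    "^0002   HeadTwo\n\n\n@@\n"
    "First paragraph of section two.\n\n"
    "Second paragraph of section two.\n\n\n"
    "^0003   HeadThree\n"
)

# The pattern from the question, unchanged.
original = re.compile(r"@@([\n\r\s]*)(.*)([\n\r\s]+)\^")

# Group 2 is the single line captured by (.*).
bodies = [m.group(2) for m in original.finditer(text)]
print(bodies)  # only the one-paragraph section is found
```

Only the section with a single paragraph is returned; the two-paragraph section produces no match at all.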

I am using VS Code. Can anyone tell me how to capture such multi-paragraph strings with a regex in VS Code or Notepad++ (NPP)?

Thanks

I put your input data into /tmp/test and got the following result using Perl syntax:

grep -Pzo "@@(?:\s*\n)+((?:.*\s*\n)+)(?:\^.*)*\n+" /tmp/test

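This should put the paragraphs that do not start with ^ into $1. You may need to add \r back into it to make it match perfectly.

One odd thing about VSCode regex is that \s does not match all line break characters. You need to use [\s\r] to match all of them.

With that in mind, you want to match all substrings that start with @@ and then stretch up to a ^ at the start of a line, or up to the end of the string.

I suggest: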

@@.*(?:[\n\r]+(?!\s*\^).*)*
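
As a quick sanity check, the pattern can be tried outside the editor. The sketch below uses Python's re module, which is an assumption on my part (it is not the VS Code engine), and a shortened, made-up stand-in for the sample text.

```python
import re

# Shortened stand-in for the sample text (hypothetical content).
text = (
    "^0001   HeadOne\n\n\n@@\n"
    "First paragraph of section one.\n\n"
    "^0002   HeadTwo\n\n\n@@\n"
    "First paragraph of section two.\n\n"
    "Second paragraph of section two.\n\n\n"
    "^0003   HeadThree\n"
)

# The suggested pattern: keep taking lines until the next line would
# start with optional whitespace followed by ^.
pattern = re.compile(r"@@.*(?:[\n\r]+(?!\s*\^).*)*")

# No capturing groups, so findall returns the full matches.
blocks = pattern.findall(text)
for block in blocks:
    print(repr(block))
```

Both sections are matched here, and the second match contains both of its paragraphs. The blank lines in front of the next ^ header are left out, because the lookahead rejects them.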

Note: To match @@ only at the start of a line, add ^ at the start of the pattern: ^@@.*(?:[\n\r]+(?!\s*\^).*)*

Note 2: Starting with VSCode 1.29, lookaheads have to be enabled explicitly before they can be used in your regex patterns.

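Details

  • ^ - start of a line
  • @@ - a literal @@
  • .* - the rest of the line (0+ characters other than line break characters)
  • (?:[\n\r]+(?!\s*\^).*)* - 0 or more consecutive occurrences of:
    • [\n\r]+(?!\s*\^) - one or more line break characters not followed by 0+ whitespace characters and then a ^ char
    • .* - the rest of the line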
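
The breakdown above can be written out as a commented pattern. This is only a sketch in Python's re module (an assumption; it is not the VS Code engine), and the sample string is made up to show what the leading ^ changes when @@ also appears in the middle of a line.

```python
import re

anchored = re.compile(
    r"""
    ^@@              # literal @@ at the start of a line (needs re.MULTILINE)
    .*               # rest of that line
    (?:              # zero or more continuation lines:
        [\n\r]+      #   one or more line break characters
        (?!\s*\^)    #   what follows must not be optional whitespace and ^
        .*           #   rest of the continuation line
    )*
    """,
    re.MULTILINE | re.VERBOSE,
)
unanchored = re.compile(r"@@.*(?:[\n\r]+(?!\s*\^).*)*")

# Hypothetical input: the second line mentions @@ in the middle of a line.
text = (
    "^0001   HeadOne\n"
    "note: the @@ marker follows\n"
    "@@\n"
    "Body line one.\n"
    "\n"
    "Body line two.\n"
    "^0002   HeadTwo\n"
)

anchored_hits = anchored.findall(text)
unanchored_hits = unanchored.findall(text)
print(anchored_hits)
print(unanchored_hits)
```

With the anchor, the match starts at the @@ line itself. Without it, the match already starts at the @@ inside the note line.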

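In Notepad++, use ^@@.*(?:\R(?!\h*\^).*)*, where \R matches a line break and \h* matches 0 or more horizontal whitespace characters (it can be dropped if ^ is always the first character on a delimiting line).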
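
To see what \h* buys you, here is a rough emulation in Python. Python's re module supports neither \R nor \h, so the sketch substitutes narrower approximations, and all names in it (LINE_BREAK, H_SPACE, NOT_EOL, section_pattern) are hypothetical helpers, not anything from Notepad++. The input is made up: it uses CRLF line endings and a tab-indented ^ header.

```python
import re

# Approximations (assumptions): LINE_BREAK covers only CRLF, LF and a lone CR,
# H_SPACE covers only space and tab; the editor's \R and \h match more.
LINE_BREAK = r"(?:\r\n|\n|\r(?!\n))"
H_SPACE = r"[ \t]"
NOT_EOL = r"[^\r\n]"  # stands in for "." so that a CR is never swallowed


def section_pattern(allow_indent: bool) -> re.Pattern:
    """Build the section pattern with or without the horizontal-space guard."""
    guard = rf"(?!{H_SPACE}*\^)" if allow_indent else r"(?!\^)"
    return re.compile(
        rf"^@@{NOT_EOL}*(?:{LINE_BREAK}{guard}{NOT_EOL}*)*", re.MULTILINE
    )


# Hypothetical input: CRLF line endings, second header indented with a tab.
text = "^0001\tHeadOne\r\n@@\r\nBody one.\r\n\r\nBody two.\r\n\t^0002\tHeadTwo\r\n"

with_guard = section_pattern(allow_indent=True).findall(text)
without_guard = section_pattern(allow_indent=False).findall(text)
print(with_guard)
print(without_guard)
```

With the horizontal-space guard the match stops before the indented header. Without it, the indented header line is treated as ordinary content and ends up inside the match.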