Regex to select several lines until two consecutive newlines not working on Mac

Question

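I need to extract several lines of text (the number of lines varies, and the document is 500 MB) between a line that starts with Query # and two consecutive carriage returns. This is being done on a Mac. For example, the document format is: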

Query #1: 020.1-Bni_its1_2019_envio1set1 

lines I need to extract


Alignments (the following lines I don't need)

xyz
xyx

Query #2: This and the following lines I need. And so on.

There are always exactly two carriage returns before the word "Alignments". So basically I need all the lines from Query #.: up to Alignments.

I tried the following regex, but I only get back the first line.

ggrep -P 'Query #.*?(?:[\r\n]{2}|\Z)'

I have tested many iterations of the regex at regex101 but have not found the answer yet.

The expected output is:

Query #1.   Text.

Lines I need to extract

Query #2: This and following lines I need.

Lines I need.

Query #....

Thanks in advance for any pointers.

Answer 1

With pcregrep, you can use

pcregrep -oM 'Query #.*(?:\R(?!\R{2}).*)*' file.txt > results.txt

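Here,

o - outputs only the matched text
M - enables matching across lines (puts line endings into the "pattern space")
Query #.*(?:\R(?!\R{2}).*)* matches
- Query # - literal text
- .* - the rest of the line
- (?:\R(?!\R{2}).*)* - zero or more repetitions of a line break sequence (\R) that is not immediately followed by two line break sequences ((?!\R{2})), and then the rest of that line.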


Answer 2

From https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Using any awk in any shell on every Unix box:

$ awk '/^Query #/{f=1} /^Alignments/{f=0} f' file
Query #1: 020.1-Bni_its1_2019_envio1set1

lines I need to extract


Query #2: This and the following lines I need. And so on.

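You didn't show the expected output in your question, so I'm not sure whether the above is the output you want, but if it isn't, then whatever you do want is a trivial change.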
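A possible variation on the awk command, as a sketch: end each block at the two consecutive blank lines instead of at the word Alignments, and drop those trailing blank lines. The function name extract_queries and the handling of \r\n line endings are assumptions added for illustration; they are not part of the command above.

```shell
# Hypothetical helper: reads the named files, or stdin when none are given.
extract_queries() {
    awk '
        { sub(/\r$/, "") }                  # tolerate CRLF line endings
        /^Query #/ { f = 1; pending = 0 }   # a new block starts here
        !f { next }                         # outside a block: skip the line
        /^$/ {                              # blank line inside a block
            if (++pending == 2) {           # second blank in a row ends the block
                f = 0
                pending = 0
            }
            next
        }
        {                                   # non-blank line inside a block
            if (pending) print ""           # emit the single blank line held back
            pending = 0
            print
        }
    ' "$@"
}
```

It could be called as extract_queries file > results.txt. A single blank line inside a block is kept, while the pair of blank lines that closes the block is not printed.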


regex

macos

grep

text-extraction