pdfgrep pattern to include/exclude linebreak

Question

pdfgrep is similar to grep, except that it works on pages instead of lines. How do I write a regex that includes or excludes linebreaks?

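I want to find a, followed by any number of characters except newlines, followed by b. But pdfgrep 'a[^\n]*b' doesn't work, and pdfgrep 'a.*b' returns results that span multiple lines. (I checked the output with xxd to confirm that these linebreaks really are \x0A.)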

Answer 1

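By default, pdfgrep uses the POSIX-compatible regex flavor, in which . matches any character, including the newline character. Also, inside a POSIX bracket expression the backslash has no special meaning, so [^\n] means "any character except \ and n", not "any character except newline".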

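Fortunately, pdfgrep also supports the PCRE regex flavor via the -P flag. In PCRE, . matches any character except a newline, and [^\n] is understood as "any character except newline".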

So you can use:

pdfgrep -P 'a.*b'
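To see the two behaviors side by side without needing a PDF, here is a minimal sketch using Python's re module. This is an analogy, not what pdfgrep runs internally: Python's re, like PCRE, does not let . match a newline by default, and the re.DOTALL flag is used here to mimic the POSIX behavior described above. The sample page text and the helper find_all are hypothetical.

```python
import re
from typing import List

# Hypothetical page text: pdfgrep matches against a whole page,
# so the string being searched can contain newline characters.
page = "ax\nyb\naxb"


def find_all(pattern: str, text: str, dot_matches_newline: bool) -> List[str]:
    """Return all non-overlapping matches of pattern in text.

    dot_matches_newline=True uses re.DOTALL to mimic the POSIX behavior
    (where '.' also matches '\n'); False mimics the PCRE default, where
    '.' stops at a newline.
    """
    flags = re.DOTALL if dot_matches_newline else 0
    return re.findall(pattern, text, flags)


# POSIX-like: the greedy '.*' runs across newlines to the last 'b'.
print(find_all(r"a.*b", page, dot_matches_newline=True))   # ['ax\nyb\naxb']
# PCRE-like: a match cannot cross a newline, so only 'axb' is found.
print(find_all(r"a.*b", page, dot_matches_newline=False))  # ['axb']
# An explicit negated class excludes newlines regardless of DOTALL.
print(find_all(r"a[^\n]*b", page, dot_matches_newline=True))  # ['axb']
```

The last call shows why 'a[^\n]*b' is a reasonable alternative once you are using a PCRE-style engine: the newline exclusion is spelled out in the pattern itself instead of depending on how . is defined.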
