Delete blank line before a pattern. What's wrong? Currently using Perl but open to sed/AWK

Question

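In a long document, I want to selectively delete the newlines that come immediately before the exact string \begin{enumerate*}, preferably with a one-liner in bash or zsh.

That is, I want to convert test.tex: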

Text in paragraphs.

More text

\begin{enumerate*} \item thing

to

Text in paragraphs.

More text \begin{enumerate*} \item thing

One-liners like

cat test.tex | perl -p -e 's/\n(?=(\begin\{enumerate\*\}))/ /'

or

cat test.tex | perl -p -e 's/\n\begin\{enumerate\*\}/\begin{enumerate*}/'

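seem like they should work, but I must be missing something, because they change nothing.

I also obviously don't need a regex here. If there is a way to do this with exact string matching instead of a regex, I would prefer that. For example, in R I could use sub("\n\begin{enumerate*}", "\begin{enumerate*}", fixed = TRUE).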

Answer 1

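perl -p processes a file line by line, so a regex that starts with \n and continues onto the next line never sees both parts at once. You can't expect this regex to match.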

I would recommend

perl -e '$text = join "", <>; $text =~ s/your_regex_here//; print $text' test.txt

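Note that this loads the entire file into memory.

Also, if you want to modify the file in place, you can't just redirect with > test.txt, because the shell truncates the file before Perl reads it; see this question.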

Answer 2

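You can use the -0 (digit zero) switch in Perl to specify the input record separator. By convention, -0777 is used to read the whole file at once.

You also need to watch out for regex metacharacters in the search string. Characters like *, {, } and \ mean something special in a regex pattern, and you should escape them, usually with the \Q ... \E construct.

With those points in mind, this should work for you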

perl -0777 -pe' s/\n+(?=\Q\begin{enumerate*}\E)/ / ' myfile

Answer 3

I found a solution using sed (number 25 on this page) that doesn't read the whole file into memory:

sed -i bak -n '/^\\begin{enumerate\*}/{x;d;};1h;1!{x;p;};${x;p;}' test.tex

The downside is that this doesn't actually join the two lines; instead, it produces

Text in paragraphs.

More text
\begin{enumerate*} \item thing

which is good enough for my needs (LaTeX treats a single newline the same as a regular space).
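
If the lines do need to be joined without slurping the file, an awk script is one possibility. This is only a sketch, not part of the sed solution above: the variable names held, have and blanks and the file name sample.tex are made up, and it uses index() for a fixed-string prefix test instead of a regex. It holds back the previous line and any blank lines after it until it knows whether the next non-blank line starts with \begin{enumerate*}.

```shell
# Recreate the sample from the question (sample.tex is a hypothetical name).
workdir=$(mktemp -d)
printf 'Text in paragraphs.\n\nMore text\n\n\\begin{enumerate*} \\item thing\n' > "$workdir/sample.tex"

joined=$(awk '
  # Line starts with the exact string: append it to the held line, drop held blanks.
  index($0, "\\begin{enumerate*}") == 1 && have {
      held = held " " $0; blanks = 0; next
  }
  # Blank line: count it but do not print it yet.
  $0 == "" && have { blanks++; next }
  # Any other line: flush the held line and its blank lines, then hold this one.
  {
      if (have) { print held; for (i = 0; i < blanks; i++) print "" }
      held = $0; have = 1; blanks = 0
  }
  END { if (have) { print held; for (i = 0; i < blanks; i++) print "" } }
' "$workdir/sample.tex")
echo "$joined"
```

Only one held line and a counter are kept in memory at any time, so this should scale to long documents.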
