Remove leading bytes from a file, then write the remaining bytes to output

Question

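I need to find the first occurrence of a substring in a (text) file, drop/cut the leading bytes, and write the remaining bytes to a new file. I tried sed, awk, and cut, but got lost because the results were not good. It sounds simple. This should work in a .sh command-line script.

The input file may contain line breaks or have everything on a single line, so finding the <?xml tag has to work at the character or byte level. The leading bytes are random and of arbitrary length.

Input file: something I want to drop<?xml............to the end of file</root>

Output file: <?xml............to the end of file</root>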

Answer 1

With perl:

perl -0777 -pe 's/.*?(?=<\?xml)//s' ip.txt

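-0777 causes the whole file to be read as a single string. The s flag allows . to match newline characters. The non-greedy .*? stops at the first position where the lookahead (?=<\?xml) matches <?xml, so every character before that string is deleted while <?xml itself is kept. If <?xml does not occur at all, nothing matches and the input passes through unchanged.

To save the changes in place, use perl -0777 -i -pe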

Answer 2

sed -n '/.*<?xml/,${s//<?xml/;p}' file

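From the line containing <?xml through the last line ($), strip the leading text and print. The empty pattern in s// reuses the address regex, and .* is greedy, so this assumes <?xml occurs only once in the file.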
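A quick way to check the sed command against both input shapes from the question (everything on one line, and junk spread over several lines). This is only a sketch: the file names and sample content are invented, and GNU sed is assumed, since other sed implementations may want a ; before the closing }.

```shell
#!/bin/sh
seddir=$(mktemp -d)

# Case 1: prefix and XML all on a single line.
printf 'something I want to drop<?xml version="1.0"?><root>x</root>\n' > "$seddir/one.txt"
# Case 2: junk on several lines before the declaration.
printf 'junk\nmore junk<?xml version="1.0"?>\n<root>x</root>\n' > "$seddir/multi.txt"

for f in "$seddir/one.txt" "$seddir/multi.txt"; do
    # Print from the <?xml line to the end, stripping the text before <?xml.
    sed -n '/.*<?xml/,${s//<?xml/;p}' "$f" > "$f.out"
    cat "$f.out"
done
```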
