In Perl, how to match two consecutive Carriage Returns?

Question

Hi, fellow Stack Overflow users,

I'm on Windows. I have a data file, but something went wrong and (I don't know why) every "Carriage Return + New Line" pair turned into "Carriage Return + Carriage Return + New Line". (190128 edit:) For example:

[Screenshot: the file viewed as plain text]

[Screenshot: the same file viewed in hex mode]

For practical reasons, I need to remove the redundant "0D" from doubled "0D"s such as ".... 30 30 0D 0D 0A 30 30 ....", turning it into ".... 30 30 0D 0A 30 30 ....".

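190129 edit: Also, so that my problem can be reproduced, I have uploaded my data file to GitHub (download and unzip it before use; in a binary/hex editor you can see 0D 0D 0A on the first line): https://github.com/katyusza/hello_world/blob/master/ram_init.zip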

I used the following Perl script to remove the extra Carriage Return, but to my surprise my regex does not work at all!! My entire code is (190129 edit: pasted the whole Perl script here):

use warnings            ;
use strict              ;
use File::Basename      ;

#-----------------------------------------------------------
# command line handling, file open \ create
#-----------------------------------------------------------

# Capture input filename from command line:
my $input_fn = $ARGV[0] or
die "Should provide input file name at command line!\n";

# Parse input file name, and generate output file name:
my ($iname, $ipath, $isuffix) = fileparse($input_fn, qr/\.[^.]*/);
my $output_fn = $iname."_pruneNonPrintable".$isuffix;

# Open input file:
open (my $FIN, "<", $input_fn) or die "Open file error $!\n";

# Create output file:
open (my $FO, ">", $output_fn) or die "Create file error $!\n";


#-----------------------------------------------------------
# Read input file, search & replace, write to output
#-----------------------------------------------------------

# Read all lines in one go:
$/ = undef;

# Read entire file into variable:
my $prune_txt = <$FIN> ;

# Do match & replace:
 $prune_txt =~ s/\x0D\x0D/\x0D/g;          # does NOT work.
# $prune_txt =~ s/\x0d\x0d/\x30/g;          # does NOT work.
# $prune_txt =~ s/\x30\x0d/\x0d/g;          # works.
# $prune_txt =~ s/\x0d\x0d\x0a/\x0d\x0a/gs; # does NOT work.

# Write the result to the output file:
print $FO $prune_txt  ;

# Close files:
close($FIN)     ;
close($FO)      ;

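I have tried everything I can think of to match the two consecutive Carriage Returns, but failed. Can anyone point out my mistake, or tell me the right way to do this? Thanks in advance!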

Answer 1

Your first regex seems to work fine for me, which means some other part of your code is probably at fault. Please provide a Minimal, Complete, and Verifiable Example, including sample input data.

$ perl -wMstrict -e 'print "Foo\r\r\nBar\r\r\n"' >test.txt
$ hexdump -C test.txt 
00000000  46 6f 6f 0d 0d 0a 42 61  72 0d 0d 0a              |Foo...Bar...|
0000000c
$ cat test.pl 
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dump;

my $filename = 'test.txt';
open my $fh, '<:raw:encoding(ASCII)', $filename or die "$filename: $!";
my $prune_txt = do { local $/; <$fh> }; # slurp file
close $fh;

dd $prune_txt;
$prune_txt =~ s/\x0D\x0D/\x0D/g;
dd $prune_txt;

$ perl test.pl
"Foo\r\r\nBar\r\r\n"
"Foo\r\nBar\r\n"

By the way, it's not entirely clear to me which encoding your file uses. You may need to adjust the :encoding(...) layer in the example above accordingly.

Answer 2

On Windows, file handles have a :crlf layer by default.

On read, this layer converts CR LF to LF.
On write, this layer converts LF to CR LF.
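You can watch this happen on any platform by pushing the :crlf layer explicitly. The following is a minimal sketch, assuming a temporary file (from the core File::Temp module) stands in for your data file and using a small hypothetical hex_bytes helper for display: it writes 30 30 0D 0D 0A twice, reads it back through :crlf, applies the substitution from the question, and writes the result back through :crlf.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical display helper: "00\r\n" -> "30 30 0D 0A".
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

# Stand-in for the broken data file: "00" CR CR LF, twice.
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
binmode $src_fh;    # write the bytes untranslated
print {$src_fh} "00\r\r\n00\r\r\n";
close $src_fh or die "close: $!";

# Read through :crlf, which is what a plain '<' gives you on Windows.
open my $in, '<:crlf', $src_name or die "$src_name: $!";
my $text = do { local $/; <$in> };
close $in;
print 'read via :crlf: ', hex_bytes($text), "\n";

# Each CR LF pair was read as LF, leaving one CR per line: nothing to match.
(my $pruned = $text) =~ s/\x0D\x0D/\x0D/g;
print 'after s///:     ', hex_bytes($pruned), "\n";

# Writing through :crlf turns every LF back into CR LF.
my ($dst_fh, $dst_name) = tempfile(UNLINK => 1);
binmode $dst_fh, ':crlf';
print {$dst_fh} $pruned;
close $dst_fh or die "close: $!";

# Read the output back untranslated to see what is really on disk.
open my $raw, '<:raw', $dst_name or die "$dst_name: $!";
my $written = do { local $/; <$raw> };
close $raw;
print 'bytes on disk:  ', hex_bytes($written), "\n";
```

It should print 30 30 0D 0A 30 30 0D 0A on the first two lines and 30 30 0D 0D 0A 30 30 0D 0D 0A on the last: by the time the regex runs there is only one CR per line, and the write adds the second one back, so the output file looks unchanged. This is also why the s/\x30\x0d/\x0d/g variant in the question appears to work: 30 0D is still there after reading.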

Solution 1: Compensate for the :crlf layer.

Use this solution if you want to end up with the line endings native to the system.

# ... read ...      # CR CR LF ⇒ CR LF
s/\r+\n/\n/g;       # CR LF    ⇒ LF
# ... write ...     # LF       ⇒ CR LF
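A self-contained sketch of Solution 1. Because :crlf is only the default on Windows, this version asks for it explicitly with <:crlf and >:crlf so it behaves the same elsewhere; on Windows, plain < and > should be equivalent. The temporary files (core File::Temp) stand in for the real input and output, and hex_bytes is a hypothetical display helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical display helper: "00\r\n" -> "30 30 0D 0A".
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

# Stand-in for the broken input file: "00" CR CR LF, twice.
my ($in_fh, $in_name) = tempfile(UNLINK => 1);
binmode $in_fh;
print {$in_fh} "00\r\r\n00\r\r\n";
close $in_fh or die "close: $!";

# Read through :crlf: CR CR LF arrives as CR LF.
open my $fin, '<:crlf', $in_name or die "$in_name: $!";
my $text = do { local $/; <$fin> };
close $fin;

# Drop every CR that precedes an LF, leaving a bare LF.
$text =~ s/\r+\n/\n/g;

# Write through :crlf: each LF goes out as CR LF.
my ($out_fh, $out_name) = tempfile(UNLINK => 1);
close $out_fh;
open my $fout, '>:crlf', $out_name or die "$out_name: $!";
print {$fout} $text;
close $fout or die "close: $!";

# Read the file back untranslated and show its bytes.
open my $check, '<:raw', $out_name or die "$out_name: $!";
my $bytes = do { local $/; <$check> };
close $check;
print hex_bytes($bytes), "\n";
```

It should print 30 30 0D 0A 30 30 0D 0A. The leftover CR has to be removed before writing; otherwise the :crlf layer would double it again on output.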

Solution 2: Remove the :crlf layer.

Use this solution if you want to end up with CR LF unconditionally.

Use <:raw and >:raw instead of < and > as the modes.

# ... read ...      # CR CR LF ⇒ CR CR LF
s/\r*\n/\r\n/g;     # CR CR LF ⇒ CR LF
# ... write ...     # CR LF    ⇒ CR LF
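A matching sketch of Solution 2, under the same assumptions (temporary files from core File::Temp as stand-ins, hypothetical hex_bytes helper). With :raw on both handles, the regex sees the bytes exactly as they are on disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical display helper: "00\r\n" -> "30 30 0D 0A".
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

# Stand-in for the broken input file: "00" CR CR LF, twice.
my ($in_fh, $in_name) = tempfile(UNLINK => 1);
binmode $in_fh;
print {$in_fh} "00\r\r\n00\r\r\n";
close $in_fh or die "close: $!";

# :raw turns off CR LF translation, so every CR reaches the regex.
open my $fin, '<:raw', $in_name or die "$in_name: $!";
my $text = do { local $/; <$fin> };
close $fin;

# Zero or more CRs followed by LF become exactly one CR LF.
$text =~ s/\r*\n/\r\n/g;

# :raw on output as well, so nothing is added on the way out.
my ($out_fh, $out_name) = tempfile(UNLINK => 1);
close $out_fh;
open my $fout, '>:raw', $out_name or die "$out_name: $!";
print {$fout} $text;
close $fout or die "close: $!";

# Read the file back untranslated and show its bytes.
open my $check, '<:raw', $out_name or die "$out_name: $!";
my $bytes = do { local $/; <$check> };
close $check;
print hex_bytes($bytes), "\n";
```

This should also print 30 30 0D 0A 30 30 0D 0A, but here the CR LF comes from the replacement string rather than from a layer. Using \r* instead of \r+ means a bare LF is normalized to CR LF as well.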

Tags: regex, windows, perl, match, carriage-return