How can I translate non-printable ASCII chars to readable text with Perl?
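I am trying to test some probes connected via USB to a Linux device, using Perl 5.28 on Linux (Debian 8). When I read out the probes' large file buffers, non-readable ASCII characters such as \0 or \x02 often show up. I want to translate these characters into readable tagged text. I wrote a small subroutine, but testing every entry of a large translation list one by one seems clumsy to me. Is there a better way?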
Example script
#!/usr/bin/env perl -w
# test-escape.pl --- test non-readable chars
use strict;

sub escBuf {
    my $buf = shift;
    # Count NUL and CR characters before anything is replaced
    my @numNul = $buf =~ /\0/g;
    my @numCR  = $buf =~ /\r/g;
    $buf =~ s/\r/\n/g;
    $buf =~ s/\x00/<NUL>/g;
    $buf =~ s/\x01/<SOH>/g;
    $buf =~ s/\x02/<STX>/g;
    $buf =~ s/\x03/<ETX>/g;
    $buf =~ s/\x04/<EOT>/g;
    $buf =~ s/\x05/<ENQ>/g;
    $buf =~ s/\x06/<ACK>/g;
    $buf =~ s/\x07/<BEL>/g;
    $buf =~ s/\x08/<BS>/g;
    $buf =~ s/\x0B/<VT>/g;
    $buf =~ s/\x0C/<FF>/g;
    $buf =~ s/\x0E/<SO>/g;
    $buf =~ s/\x0F/<SI>/g;
    my $numNUL = @numNul;
    my $numCR  = @numCR;
    return ($buf, $numNUL, $numCR);
}

# Buffer example
my $buffer = "\x01\r\x02This is a test with\r\n ".
    "sometimes qiurks \0 inside \x0C stuff \0 and regular \x03\r\x04";
# Translate output
my ($out, $numNUL, $numCR) = &escBuf($buffer);
# Not printed correctly due to \0
# print "ORG.TEXT: '$buffer' \n\n";
# Result of the translation
print "ESC.TEXT: '$out' \n\n";
print "NUM.NUL: $numNUL\n";
print "NUM.CR: $numCR\n\n";
Result
/usr/bin/env perl -w "test-escape.pl"
ESC.TEXT: '<SOH>
<STX>This is a test with
sometimes qiurks <NUL> inside <FF> stuff <NUL> and regular <ETX>
<EOT>'
NUM.NUL: 2
NUM.CR: 3
Edit: code adopting the proposed solution
#!/usr/bin/env perl -w
# test-escape.pl --- test non-readable chars
use strict;

# Dictionary of non-printable characters
my %NONE_ASC_DICT = (
    "\x00" => "NUL", "\x01" => "SOH", "\x02" => "STX", "\x03" => "ETX",
    "\x04" => "EOT", "\x05" => "ENQ", "\x06" => "ACK", "\x07" => "BEL",
    "\x08" => "BS",
    # Essential for parsing, so not translated: "\x09" => "TAB", "\x0a" => "LF"
    "\x0b" => "VT", "\x0c" => "FF", "\x0d" => "CR",
    "\x0e" => "SO", "\x0f" => "SI",
    "\x10" => "DLE",
    "\x11" => "DC1", "\x12" => "DC2", "\x13" => "DC3", "\x14" => "DC4",
    "\x15" => "NAK", "\x16" => "SYN", "\x17" => "ETB", "\x18" => "CAN",
    "\x19" => "EM", "\x1A" => "SUB", "\x1B" => "ESC", "\x1C" => "FS",
    "\x1D" => "GS", "\x1E" => "RS", "\x1F" => "US", "\x7F" => "DEL",
);

# Character class built from the dictionary keys, and the precompiled regex
my $NONE_ASC_CLASS = join "", map quotemeta, keys(%NONE_ASC_DICT);
my $NONE_ASC_REGEX = qr/([$NONE_ASC_CLASS])/;

# Translator subroutine
sub escBuffer {
    my ($buf, $dict, $regex, $prefix, $suffix) = @_;
    # Set default prefix and suffix strings if not present
    $prefix //= '<'; $suffix //= '>';
    # Count the real quirks
    my @numNUL = $buf =~ /\0/g;
    my $numNUL = @numNUL;
    # Clean up mixed UNIX / DOS context
    $buf =~ s/\r\n/\n/g;
    $buf =~ s/\r/\n/g; # translate all remaining \r to \n
    # Calc resulting number of lines
    my @numLF = $buf =~ /\n/g;
    my $numLF = @numLF;
    # Translate the remaining non-printables; $1 is the matched character
    $buf =~ s/$regex/ $prefix.$dict->{$1}.$suffix /eg;
    # Result set: translated buffer, count of quirks, count of lines
    return ($buf, $numNUL, $numLF);
}

# Buffer example
my $buffer = "\x01\r\x02This is a test with\r\n ".
    "sometimes qiurks \0 inside \x0C stuff \0 and regular \x03\r\x04";
# Translate output
my ($out, $numNUL, $numLF) = &escBuffer
    ($buffer, \%NONE_ASC_DICT, $NONE_ASC_REGEX);
# Result of the translation
print "ESC.TEXT: '$out' \n\n";
print "NUM.NUL: $numNUL\n";
print "NUM.LF: $numLF\n\n";
Use a table.
Setup:
my %map = (
    "\x00" => "<NUL>",
    ...,
);
my $class = join "", map quotemeta, keys(%map);
my $re = qr/([$class])/;
Substitution:
s/$re/$map{$1}/g
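For illustration, here is a minimal self-contained sketch of the table approach. The table is deliberately shortened to a handful of entries, and the subroutine name escape_ctrl and the sample buffer are assumptions made for this example only; they are not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Shortened, illustrative lookup table: control character => readable tag.
# TAB (\x09) and LF (\x0A) are left out on purpose so they pass through.
my %map = (
    "\x00" => "<NUL>", "\x01" => "<SOH>", "\x02" => "<STX>",
    "\x03" => "<ETX>", "\x04" => "<EOT>", "\x0C" => "<FF>",
    "\x0D" => "<CR>",
);

# Build a single character class from the table keys; quotemeta escapes
# each key so it is taken literally inside the brackets.
my $class = join "", map { quotemeta } keys %map;
my $re    = qr/([$class])/;

# escape_ctrl is a hypothetical helper name chosen for this sketch.
sub escape_ctrl {
    my ($buf) = @_;
    $buf =~ s/$re/$map{$1}/g;    # $1 holds the matched control character
    return $buf;
}

# Sample buffer (made up for the example)
my $in = "\x01A\x00B\x0CC\tD\n";
print escape_ctrl($in);
```

The buffer is scanned once with one compiled regex, and supporting another character only means adding one entry to %map.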