Bullet point entity is shown as a box (unknown entity) in data that has to be captured by a Perl program
I have an XML file in which bullet points appear as box-type entities, and I cannot capture them with my Perl program. Can anyone help me solve this?
Partial input data:
<p> Adding Basic Requirements: AU sec. 334 suggests procedures for the auditor's consideration, noting that not all of them may be required in every audit.</p>
Expected output:
<p>Adding Basic Requirements: AU sec. 334 suggests procedures for the auditor's consideration, noting that not all of them may be required in every audit.</p>
Perl program:
use strict;
use warnings;
use utf8;

my $filename  = $ARGV[0];
my $ext       = $ARGV[1];
my $inputfile = "$filename.$ext";

# Slurp the whole input file into one string.
my $document = do {
    local $/ = undef;
    open my $fh, '<', $inputfile or die "Couldn't open the file $inputfile: $!";
    <$fh>;
};

open my $out, '>', "$filename.sgm" or die "Couldn't write to the file $filename.sgm: $!";
# The pattern was the literal box character in the original post (written here as an escape).
$document =~ s/\x{F0B7}/<i>/isg;
print $out $document;
Output:
The program fails to capture the box-type entity, so the substitution does nothing and the output is unchanged.
My browser displays a box containing F0B7, which means the character is U+F0B7, a private-use character.
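If your browser or editor does not show the code point, a short script can report it. The following is only a minimal sketch and not part of the original answer: the helper name `non_ascii_codepoints` is made up for illustration, and the script assumes the input file is encoded in UTF-8.

```perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # assumption: the input file is UTF-8

# Hypothetical helper: list the distinct non-ASCII characters of an already
# decoded string as U+XXXX labels, in order of first appearance.
sub non_ascii_codepoints {
    my ($text) = @_;
    my %seen;
    return map  { sprintf('U+%04X', ord($_)) }
           grep { !$seen{$_}++ }
           $text =~ /([^\x00-\x7F])/g;
}

# When a file name is given on the command line, print one label per line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!";
    my $document = do { local $/; <$in> };
    print "$_\n" for non_ascii_codepoints($document);
}
```

Run against the file from the question, this should list the bullet character among its results, provided the file really is UTF-8.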
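If your Perl source file is encoded in UTF-8 and contains use utf8;, you can simply use the character itself in the regular expression:
s/\s*//g
But referring to it by its code point is more readable: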
s/\x{F0B7}\s*//g
s/\N{U+F0B7}\s*//g
In all cases, the input file needs to be decoded properly:
use open ':std', ':encoding(UTF-8)';
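Putting the pieces together, the script from the question could look roughly like the sketch below. It assumes the input file is UTF-8, and the sub names `strip_bullets` and `convert_file` are made up for illustration. Because `use open` is lexically scoped, it has to appear before the `open` calls it should affect.

```perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # default layers for open() below and for STDIN/STDOUT/STDERR

# Remove every U+F0B7 bullet together with the whitespace that follows it.
sub strip_bullets {
    my ($text) = @_;
    $text =~ s/\N{U+F0B7}\s*//g;
    return $text;
}

# Read $in_path as UTF-8, strip the bullets, and write the result to $out_path.
sub convert_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path or die "Couldn't open the file $in_path: $!";
    my $document = do { local $/; <$in> };   # slurp the whole file
    close $in;

    open my $out, '>', $out_path or die "Couldn't write to the file $out_path: $!";
    print {$out} strip_bullets($document);
    close $out or die "Couldn't close the file $out_path: $!";
}

# Same command line as in the question: base name first, then extension.
convert_file("$ARGV[0].$ARGV[1]", "$ARGV[0].sgm") if @ARGV == 2;
```

The substitution works here because the file contents are decoded into characters before the regex runs. Without the `use open` line (or an explicit `:encoding(UTF-8)` layer on `open`), the string holds raw bytes and the pattern never matches, which is the "no change in output" symptom described in the question.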