Bullet point entity appears as a box (unknown entity) in data that a Perl program has to capture

I have an XML file in which the bullet points show up as a box-type entity. I am unable to capture it with my Perl program. Can anyone help me with this?

Partial input data:

<p> Adding Basic Requirements: AU sec. 334 suggests procedures for the auditor's consideration, noting that not all of them may be required in every audit.</p>

Expected output:

<p>Adding Basic Requirements: AU sec. 334 suggests procedures for the auditor's consideration, noting that not all of them may be required in every audit.</p>

Perl program:

use strict;
use warnings;
use utf8;
my $filename = $ARGV[0];
my $ext = $ARGV[1];
my $inputfile = $filename . "\." . $ext;
my $document = do {
         local $/ = undef;
         open my $fh,'<',$inputfile or die "Couldn't open the file $inputfile:$!";
       <$fh>;
      };

open my $out,">$filename.sgm" or die "Couldn\'t write to the file $filename.sgm:$!"; 

$document =~ s/?/<i>/isg;

print $out $document;

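Output:

The program fails to capture the box-type entity, so the substitution has no effect and the output is unchanged.

My browser displays a box with F0B7 inside it, which means the character is U+F0B7, a private-use character.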

If your Perl source file is encoded in UTF-8 and contains use utf8;, you can simply use the character literally in the regular expression:

s/\s*//g

But it is more readable to refer to the character by its code point instead, using either of the following:

s/\x{F0B7}\s*//g

s/\N{U+F0B7}\s*//g
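As a quick sanity check, the following sketch shows that the two escapes denote the same character and that either one removes the bullet from an already-decoded string. The variable names and the shortened sample paragraph are illustrative assumptions, not part of the original program.

```perl
use strict;
use warnings;

# Both escapes produce the single character U+F0B7.
my $by_hex  = "\x{F0B7}";
my $by_name = "\N{U+F0B7}";
my $same    = $by_hex eq $by_name ? 1 : 0;

# Hypothetical sample: a decoded string containing the private-use bullet.
my $sample = "<p>\x{F0B7} Adding Basic Requirements</p>";

# Remove the bullet and any whitespace that follows it.
(my $cleaned = $sample) =~ s/\x{F0B7}\s*//g;

# Print only ASCII so no output encoding layer is needed here.
printf "same=%d cleaned=%s\n", $same, $cleaned;
```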

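In all cases, the input file needs to be decoded properly. Without decoding, Perl sees the three UTF-8 bytes EF 82 B7 instead of the single character U+F0B7, and the pattern never matches. The following sets UTF-8 as the default encoding for files opened in its scope and for the standard streams: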

use open ':std', ':encoding(UTF-8)';
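
Putting the pieces together, a minimal sketch of the fix might look like the following. It assumes the input is UTF-8 encoded. The helper name strip_bullets is hypothetical, and an in-memory filehandle stands in for the real input file so the example is self-contained; here the decoding layer is given explicitly in the open call rather than through the use open pragma.

```perl
use strict;
use warnings;

# Hypothetical helper: remove U+F0B7 (the private-use bullet)
# together with any whitespace that follows it.
sub strip_bullets {
    my ($text) = @_;
    $text =~ s/\N{U+F0B7}\s*//g;
    return $text;
}

# Raw bytes as they would sit on disk: U+F0B7 is EF 82 B7 in UTF-8.
my $bytes = "<p>\xEF\x82\xB7 Adding Basic Requirements</p>\n";

# Read through a decoding layer, exactly as you would for a real file
# (replace \$bytes with the file name in an actual program).
open my $in, '<:encoding(UTF-8)', \$bytes
    or die "Couldn't open in-memory input: $!";
my $document = do { local $/; <$in> };
close $in;

my $result = strip_bullets($document);
print $result;
```

In a real program, the output handle should be opened with an encoding layer as well (for example '>:encoding(UTF-8)'), so that any remaining non-ASCII characters are written back correctly.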