Keep encoded tag in XML::Twig

I want to modify huge XML files using XML::Twig.

When I use handler callbacks, XML::Twig seems to change characters that are encoded as entities, such as the greater-than sign (&gt; becomes >).

Sample script:

use XML::Twig;

my $input = q~
<root>
    <p>&lt;encoded tag&gt;</p>
</root>
~;

my $t = XML::Twig->new(
    keep_spaces              => 1,
    twig_roots               => { 'p' => \&convert, },   # process p tags
    twig_print_outside_roots => 1,                       # print the rest
);

$t->parse($input);


sub convert {
    my ($t, $p)= @_;

    $p->set_att('x' => 'y');

    $p->print;
}

This turns the document into the following:

<root>
    <p x="y">&lt;encoded tag></p>
</root>

I expected to get this:

<root>
    <p x="y">&lt;encoded tag&gt;</p>
</root>

How can I keep the encoded content of the tag with XML::Twig?

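You need to set the keep_encoding option in the constructor, as shown below, or change it after the object has been constructed by calling $twig->set_keep_encoding($option).

Note that the module documentation has this to say: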

This is a (slightly?) evil option: if the XML document is not UTF-8 encoded and you want to keep it that way, then setting keep_encoding will use the "Expat" original_string method for character, thus keeping the original encoding, as well as the original entities in the strings.

But here it is, doing what you asked. Use at your own risk.

use strict;
use warnings 'all';

use XML::Twig;

my $input = <<END_XML;
<root>
    <p>&lt;encoded tag&gt;</p>
</root>
END_XML

my $t = XML::Twig->new(
    keep_spaces              => 1,
    keep_encoding            => 1,
    twig_roots               => { p => \&convert },   # process p elements
    twig_print_outside_roots => 1,                    # print the rest
);

$t->parse($input);


sub convert {
    my ($t, $p) = @_;
    $p->print;
}

output

<root>
    <p>&lt;encoded tag&gt;</p>
</root>
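
The program above leaves out the set_att call from the question. As a rough sketch only, the two can be combined as below. The helper name tag_paragraphs and the in-memory filehandle are my own assumptions for illustration, not part of XML::Twig or the original scripts. It relies on the documented ability to pass a filehandle to both twig_print_outside_roots and the element's print method.

```perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper (not part of XML::Twig): add x="y" to every <p>
# element and return the rewritten document as a string.
sub tag_paragraphs {
    my ($xml) = @_;

    # Collect the output in memory instead of writing to STDOUT.
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!";

    my $twig = XML::Twig->new(
        keep_spaces   => 1,
        keep_encoding => 1,    # keep entities as written in the source
        twig_roots    => {
            p => sub {
                my ($t, $p) = @_;
                $p->set_att(x => 'y');    # the change from the question
                $p->print($fh);           # print the modified element
            },
        },
        twig_print_outside_roots => $fh,  # copy everything else unchanged
    );

    $twig->parse($xml);
    close $fh;

    return $out;
}

print tag_paragraphs("<root>\n    <p>&lt;encoded tag&gt;</p>\n</root>\n");
```

With keep_encoding set, the &gt; inside the p element should come through untouched while the new attribute is still added. Keep the warning from the documentation in mind, though: this option bypasses the usual decoding of the source text.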