How do we separate values fetched with findvalue of XML::LibXML by a delimiter?

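I have an XML file that I need to parse. I am able to fetch the values, but I cannot separate them with a delimiter for further processing. Please advise. My code is below: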

use XML::LibXML;

my $filename = 'Test.xml';

my $parser = XML::LibXML->new();
my $dom = $parser->parse_file($filename);
my $root = $dom->documentElement();
my $xpc = XML::LibXML::XPathContext->new($root);

foreach my $id ($xpc->findnodes('/dataset/chapter'))
{
    print $xpc->findvalue('mono/route-list', $id);
    print join ",", $xpc->findvalue('mono/route-list', $id);
}

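I get the same result for both "print" statements, although the expected result is:

ophthalmicoraltopicalnasalinjectionoraloraloraloral

ophthalmic,oral,topical,nasal,injection,oral,oral,oral,oral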

The XML file is structured as follows:

<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<nam>00173074200</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>ophthalmic</name>
    </list-set-field>
</route-list>   
</mono>
<mono id="4128683" mid="536890">
<nam>51079020406</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128743" mid="536930">
<nam>65862007360</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>topical</name>
    </list-set-field></route-list>
</mono>
<mono id="3419599" mid="469070">
<nam>49702021718</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>nasal</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990346" mid="440470">
<nam>49702022118</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>injection</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990347" mid="440470">
<nam>49702022144</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990357" mid="440491">
<nam>49702022248</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="3808911" mid="513570">
<nam>00378410591</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128724" mid="536910">
<nam>60505358306</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
</chapter>
</dataset>

If you try this code (note the last line in the for loop):

use strict;
use warnings;
use 5.016;
use XML::LibXML;

my $filename = 'Test.xml';

my $dom = XML::LibXML->load_xml(
    location => $filename,
);

my $xpc = XML::LibXML::XPathContext->new($dom);

CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    my $string = $xpc->findvalue('mono/route-list', $chapter);
    print $string;

    last CHAPTER;  #<*****NOTE THIS
}

You will get this output:

          ophthalmic



      oral



          topical



       nasal



       injection



       oral



       oral



       oral



       oral

The docs say:

findvalue()

...returns the literal value of the results.

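Note that "results" is plural: there is more than one result here. Each result is all of the text between the opening and closing tags of one matching element.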

The XML has a hidden character, a newline, at the end of each line:

  <route-list>\n
    <list-set-field dbId="25413">\n
        <name>ophthalmic</name>\n
    </list-set-field>\n
  </route-list>\n  

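...as well as several spaces/tabs at the start of each line. The spaces/tabs and newlines count as text, and they sit between the <route-list> tags. As a result, the text of each result also contains all of those spaces, tabs, and newlines.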
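To make the whitespace point concrete, here is a minimal sketch in core Perl. The string $raw is a hard-coded, hypothetical stand-in for the kind of text that comes back for two <route-list> elements (it is not produced by XML::LibXML here), and @routes is just an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text of two <route-list> elements:
# the values are surrounded by the indentation and newlines of the XML.
my $raw = "\n    \n        ophthalmic\n    \n\n    \n        oral\n    \n";

# split ' ' is Perl's special whitespace mode: it skips leading whitespace
# and splits on any run of spaces, tabs, or newlines.
my @routes = split ' ', $raw;

print join(',', @routes), "\n";   # prints: ophthalmic,oral
```

This only works as long as no value contains whitespace itself; a value such as "oral solution" would be split in two.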

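findvalue() returns the text of all the results as a single string. You could split that string with a regex to get the individual values; but rather than creating more work for yourself, do this instead: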

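CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # Relative path, so the search starts at $chapter; a leading //
    # would search the whole document and ignore the context node.
    for my $name ($xpc->findnodes('mono/route-list//name', $chapter)) {
        say $name->textContent;
        last CHAPTER;  # stop after the first name; remove to print them all
    }
}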

--output:--
ophthalmic

...or even this:

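CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # Select the text nodes directly; the path is relative to $chapter.
    for my $name_text ($xpc->findnodes('mono/route-list//name/text()', $chapter)) {
        say $name_text;
        last CHAPTER;  # stop after the first name; remove to print them all
    }
}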
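
To get the comma-separated line the question asks for, one option is to collect the text of every <name> node into a list and join it. The sketch below assumes a trimmed-down copy of the question's XML, inlined as a string so that it runs without Test.xml; @lines and @routes are illustrative names only:

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed-down copy of the question's XML (two <mono> entries only),
# inlined so the sketch does not depend on Test.xml.
my $xml = <<'XML';
<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<route-list>
    <list-set-field dbId="25413">
        <name>ophthalmic</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128683" mid="536890">
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
</chapter>
</dataset>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);

my @lines;
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # One list element per <name> node under this chapter.
    my @routes = map { $_->textContent }
                 $xpc->findnodes('mono/route-list//name', $chapter);
    push @lines, join(',', @routes);
}

print "$_\n" for @lines;   # prints: ophthalmic,oral
```

Because each <name> element holds only the value itself, no whitespace cleanup is needed before joining.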