Perl XML::LibXML Get data outside of a tag

This is a follow-up to my previous question (Perl XML::LibXML Getting info from specific nodes).

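Given the following XML data, I can't figure out how to get the data that appears after the <tab/> tag (which has no closing tag) without also picking up the text of the child nodes in that section. See below for more details: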

XML sample:

<title number="3">
<catchline>Uniform Agricultural Cooperative Association Act</catchline>
<chapter number="3-1">
<catchline>
General Provisions Relating to Agricultural Cooperative Associations
</catchline>
<section number="3-1-1">
<histories>
<history>
Amended by Chapter
<modchap sess="2010GS">378</modchap>
, 2010 General Session
</history>
<modyear>2010</modyear>
</histories>
<catchline>Declaration of policy.</catchline>
<tab/>
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed. THIS IS THE DATA THAT I WANT TO GET
</section>
<section number="3-1-1.1">
<histories>
<history>
Amended by Chapter
<modchap sess="1996GS">79</modchap>
, 1996 General Session
</history>
<modyear>1996</modyear>
</histories>
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">
Title 16, Chapter 10a, Utah Revised Business Corporation Act
</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
,
<xref depth="3" refnumber="3-1-13.7" start="0">3-1-13.7</xref>
, and
<xref depth="3" refnumber="3-1-16.1" start="0">3-1-16.1</xref>
.
</section>
</chapter>
</title>

Here is my current Perl script:

#!/usr/bin/perl -w


use XML::LibXML;


my $dom = XML::LibXML->load_xml(location => "file.xml");
my $titleName = $dom->findvalue('/title/catchline');
print "Title $titleName\n";

my @chapters = $dom->findnodes('/title/chapter');

for $chapter (@chapters) {
        my $chapterNo = $chapter->getAttribute('number');
        my $chapterName = $chapter->findvalue('catchline');
        print " Chapter #$chapterNo - $chapterName\n";

        my @sections = $chapter->findnodes('section');

        for $section (@sections) {
                my $sectionNo = $section->getAttribute('number');
                my $sectionName = $section->findvalue('catchline');
                my $sectionData = $section->textContent;
                print "  Section #$sectionNo - $sectionName\nSECDATA: $sectionData\n\n";

        }
}

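This works, but what happens is probably exactly what I asked for: the $sectionData variable ends up holding all of the text inside <section>, and all of it is printed.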

What I want is only the data that follows the <tab/> tag, without anything that sits inside other tags, i.e. child tags such as <histories>, <history>, <xref>, etc.

For example, the string:

, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections

is not contained in any specific tag. How can I get that data?

The current output is:

Title Uniform Agricultural Cooperative Association Act
 Chapter #3-1 - 
General Provisions Relating to Agricultural Cooperative Associations

  Section #3-1-1 - Declaration of policy.
SECDATA: 


Amended by Chapter
378
, 2010 General Session

2010

Declaration of policy.

It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.


  Section #3-1-1.1 - General corporation laws do not apply.
SECDATA: 


Amended by Chapter
79
, 1996 General Session

1996

General corporation laws do not apply.


Title 16, Chapter 10a, Utah Revised Business Corporation Act

, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
3-1-13.4
,
3-1-13.7
, and
3-1-16.1
.

But what I'm looking for is more like:

Title Uniform Agricultural Cooperative Association Act
 Chapter #3-1 - 
General Provisions Relating to Agricultural Cooperative Associations

  Section #3-1-1 - Declaration of policy.
SECDATA: 
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.


  Section #3-1-1.1 - General corporation laws do not apply.
SECDATA: 
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections

If you want all of the nodes that follow the tab element (both element nodes and text nodes), you can use the following:

my @post_tab_nodes = $section_node->findnodes('tab/following-sibling::node()');

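Rendering the resulting nodes as text is left as an exercise for the reader. You can use $node->nodeType to distinguish element nodes from text nodes. It returns XML_ELEMENT_NODE or XML_TEXT_NODE (both exported by XML::LibXML) for these node types, respectively.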
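
As a rough illustration of that filtering step, here is a minimal sketch that keeps only the text nodes following <tab/> and skips element siblings such as <xref>. The helper name text_after_tab, the shortened inline XML, and the decision to trim whitespace and drop empty fragments are assumptions made for this example, not part of the original question or answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # also exports XML_TEXT_NODE, XML_ELEMENT_NODE, ...

# Hypothetical helper: return the text fragments that follow <tab/> and are
# direct children of the section, ignoring text nested inside elements.
sub text_after_tab {
    my ($section) = @_;
    my @parts;
    for my $node ($section->findnodes('tab/following-sibling::node()')) {
        # Skip <xref> and any other non-text sibling
        next unless $node->nodeType == XML_TEXT_NODE;
        my $text = $node->nodeValue;
        $text =~ s/^\s+|\s+\z//g;            # trim surrounding whitespace
        push @parts, $text if length $text;  # drop whitespace-only nodes
    }
    return @parts;
}

# Shortened version of the second <section> from the question (assumed data)
my $xml = <<'XML';
<section number="3-1-1.1">
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">Title 16</xref>
, does not apply, except as provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
.
</section>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my ($section) = $dom->findnodes('/section');

print "$_\n" for text_after_tab($section);
```

For this shortened sample the helper should return two fragments: the sentence beginning with ", does not apply" and the trailing ".". If only the first fragment is wanted, take the first element of the returned list. The XPath tab/following-sibling::text() would select the same text nodes directly, without the nodeType check.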