atom feed: script to combine multiple <author> items into one?
I want to write a command-line script that merges multiple <author> tags in an Atom feed into one. For example, an entry like this:
<entry>
<id>someid</id>
<published>somedate</published>
<title>Title</title>
<summary>Summary</summary>
<author>
<name>Author One</name>
</author>
<author>
<name>Author Two</name>
</author>
<author>
<name>Author Three</name>
</author>
</entry>
should become:
<entry>
<id>someid</id>
<published>somedate</published>
<title>Title</title>
<summary>Summary</summary>
<author>
<name>Author One, Author Two, Author Three</name>
</author>
</entry>
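I suppose I could do this myself with Perl and regular expressions, but since parsing XML with regular expressions is a bad idea, I would appreciate a more elegant solution that uses a proper XML parser.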
In Perl, I recommend XML::LibXML.
Here I use an XPath query to find the name nodes, then push each name onto an array while removing its author node. Finally, I create a new author node and append it.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
# example loading the xml from a file
my $dom = XML::LibXML->load_xml(location => 'atom.xml', no_blanks => 1);
my $root = $dom->documentElement();
# the Xpath query
my $query = q{
/entry/author/name
};
my @authornames;
foreach my $namenode ($dom->findnodes($query)) {
    # save the name
    push @authornames, $namenode->to_literal();
    # remove the author node
    $namenode->getParentNode->getParentNode->removeChild($namenode->getParentNode);
    #or:
    # $root->removeChild($namenode->getParentNode);
}
# build a new author node
my $author = XML::LibXML::Element->new('author');
$author->appendTextChild('name', join(", ",@authornames));
# and add it
$root->appendChild($author);
# print the result
print $dom->serialize(1);
#or, if you don't want the <?xml...> header:
# print $root->serialize(1) . "\n";
Output:
<?xml version="1.0"?>
<entry>
<id>someid</id>
<published>somedate</published>
<title>Title</title>
<summary>Summary</summary>
<author>
<name>Author One, Author Two, Author Three</name>
</author>
</entry>
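Ted has the right idea, but some things are done in a more complicated way than necessary, and the code is unaware of properties of the Atom format; the version below, for example, registers the Atom namespace and starts from the feed element.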
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );

# Atom elements live in a namespace, so XPath needs a registered prefix.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs(a => 'http://www.w3.org/2005/Atom');

# See XML::LibXML::Parser for more ways to create the document object.
my $doc = XML::LibXML->load_xml( location => 'atom.xml' );

for my $entry_node ($xpc->findnodes('/a:feed/a:entry', $doc)) {
    # Collect each author's name, detaching the author element as we go.
    my @author_names;
    for my $author_node ($xpc->findnodes('a:author', $entry_node)) {
        push @author_names, $xpc->findvalue('a:name', $author_node);
        $author_node->unbindNode();
    }

    # Append a single author element holding the joined names.
    my $author_node = XML::LibXML::Element->new('author');
    $author_node->appendTextChild('name', join(", ", @author_names));
    $entry_node->appendChild($author_node);
}

$doc->toFile('atom.new.xml');
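For anyone who wants to try the namespace-aware approach without an atom.xml file on disk, here is a minimal self-contained sketch. It is a variation, not a drop-in copy of the code above: the helper name merge_authors and the sample feed are made up for illustration, it skips entries that have fewer than two authors (so no empty author element gets added), and it creates the new elements with createElementNS so that they are explicitly in the Atom namespace. Treat those choices as assumptions to adjust for your own feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $ATOM_NS = 'http://www.w3.org/2005/Atom';

# Hypothetical helper: merges the <author> elements of each entry in place.
sub merge_authors {
    my ($doc) = @_;

    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(a => $ATOM_NS);

    for my $entry ($xpc->findnodes('/a:feed/a:entry')) {
        my @authors = $xpc->findnodes('a:author', $entry);
        next if @authors < 2;    # zero or one author: nothing to merge

        # Read all names first, then detach the old author elements.
        my @names = map { $xpc->findvalue('a:name', $_) } @authors;
        $_->unbindNode() for @authors;

        # Build the replacement explicitly in the Atom namespace.
        my $author = $doc->createElementNS($ATOM_NS, 'author');
        my $name   = $doc->createElementNS($ATOM_NS, 'name');
        $name->appendText(join ', ', @names);
        $author->appendChild($name);
        $entry->appendChild($author);
    }
    return $doc;
}

# Made-up sample feed, just enough to exercise the helper.
my $xml = <<'XML';
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>someid</id>
    <title>Title</title>
    <author><name>Author One</name></author>
    <author><name>Author Two</name></author>
    <author><name>Author Three</name></author>
  </entry>
</feed>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
print merge_authors($doc)->toString(1);
```

To use it on a real feed, swap the string for load_xml(location => 'atom.xml') as in the answer above. The merged author element is appended at the end of the entry, the same place the earlier code puts it.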