atom feed: script to combine multiple <author> items into one?

I want to write a command-line script that merges the multiple <author> tags of each entry in an Atom feed into a single one. For example, this entry:

<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
      <name>Author One</name>
    </author>
    <author>
      <name>Author Two</name>
    </author>
    <author>
      <name>Author Three</name>
    </author>
  </entry>

should become:

<entry>
    <id>someid</id>
    <published>somedate</published>
    <title>Title</title>
    <summary>Summary</summary>
    <author>
      <name>Author One, Author Two, Author Three</name>
    </author>
  </entry>

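I guess I could do this on my own with Perl and regular expressions, but since parsing XML with regular expressions is not a good idea, I would be grateful for a more elegant solution that uses a proper XML parser.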

In Perl, I would suggest using XML::LibXML.

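Here, I use an XPath query to find the name nodes, push each name onto an array, and remove the corresponding author node as I go. Finally, I create a new author node holding the joined names and append it to the entry.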

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

# example loading the xml from a file
my $dom = XML::LibXML->load_xml(location => 'atom.xml', no_blanks => 1);
my $root = $dom->documentElement();

# the Xpath query
my $query = q{
    /entry/author/name
};

my @authornames;

foreach my $namenode ($dom->findnodes($query)) {
    # save the name
    push @authornames, $namenode->to_literal();

    # remove the author node
    $namenode->getParentNode->getParentNode->removeChild($namenode->getParentNode);

    #or:
    # $root->removeChild($namenode->getParentNode);
}

# build a new author node
my $author = XML::LibXML::Element->new('author');
$author->appendTextChild('name', join(", ",@authornames));

# and add it
$root->appendChild($author);

# print the result
print $dom->serialize(1);

#or, if you don't want the <?xml...> header:
# print $root->serialize(1) . "\n";

Output:

<?xml version="1.0"?>
<entry>
  <id>someid</id>
  <published>somedate</published>
  <title>Title</title>
  <summary>Summary</summary>
  <author>
    <name>Author One, Author Two, Author Three</name>
  </author>
</entry>

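Ted has the right idea, but some things are done in a more complicated way than necessary, and that answer does not account for properties of the Atom format that the code below handles, such as the Atom namespace and the <feed> root element.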

use strict;
use warnings;

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

# Atom elements live in a namespace, so XPath needs a prefix bound to it.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs(a => 'http://www.w3.org/2005/Atom');

# See XML::LibXML::Parser for more ways to create the document object.
my $doc = XML::LibXML->load_xml( location => 'atom.xml' );

for my $entry_node ($xpc->findnodes('/a:feed/a:entry', $doc)) {
   # Collect each author's name and detach the author element.
   my @author_names;
   for my $author_node ($xpc->findnodes('a:author', $entry_node)) {
      push @author_names, $xpc->findvalue('a:name', $author_node);
      $author_node->unbindNode();
   }

   # Append a single author element holding the combined names.
   my $author_node = XML::LibXML::Element->new('author');
   $author_node->appendTextChild('name', join(", ", @author_names));
   $entry_node->appendChild($author_node);
}

$doc->toFile('atom.new.xml');
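
To see why the namespace matters, here is a minimal self-contained sketch along the same lines. It is not taken from either answer. The inline sample feed and the variable names are made up for illustration, and two choices are my own assumptions: the merged element is created with createElementNS so that it sits in the Atom namespace, and it is inserted where the first <author> used to be rather than appended at the end of the entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample feed, inlined so the sketch runs without atom.xml.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>someid</id>
    <title>Title</title>
    <author><name>Author One</name></author>
    <author><name>Author Two</name></author>
    <author><name>Author Three</name></author>
  </entry>
</feed>
XML

my $atom_ns = 'http://www.w3.org/2005/Atom';
my $doc     = XML::LibXML->load_xml(string => $xml, no_blanks => 1);

# An unprefixed XPath step only matches elements in no namespace,
# so this query finds nothing in a namespaced feed.
my @unprefixed = $doc->findnodes('/feed/entry/author');

# Binding a prefix to the Atom namespace makes the same path match.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs(a => $atom_ns);
my @prefixed = $xpc->findnodes('/a:feed/a:entry/a:author', $doc);

# prints "unprefixed: 0, prefixed: 3"
printf "unprefixed: %d, prefixed: %d\n", scalar @unprefixed, scalar @prefixed;

for my $entry ($xpc->findnodes('/a:feed/a:entry', $doc)) {
    my @authors = $xpc->findnodes('a:author', $entry);
    next if @authors < 2;    # zero or one author: nothing to merge

    my $names = join ', ', map { $xpc->findvalue('a:name', $_) } @authors;

    # Build the merged element in the Atom namespace (assumption of this sketch).
    my $merged = $doc->createElementNS($atom_ns, 'author');
    my $name   = $doc->createElementNS($atom_ns, 'name');
    $name->appendTextNode($names);
    $merged->appendChild($name);

    # Put the merged element where the first author was, then drop the originals.
    $entry->insertBefore($merged, $authors[0]);
    $_->unbindNode() for @authors;
}

print $doc->toString(1);
```

Entries with fewer than two authors are left untouched here, so no empty <author> element is added to them. To use this on a real file, swapping the string => $xml argument for location => 'atom.xml' should be all that is needed.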