Extracting multiple nodes from an XML document with xmllint

Question

I am trying to use xmllint to extract several nodes from under multiple parent nodes matched by //item, as shown below:

<item>
        <title>A title</title>
        <link>http://www.example.com</link>
        <pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>
        <dc:creator><a name></dc:creator>
        <location><a name></dc:creator>
</item>

If I only wanted to extract a single node (for example, the title), I would normally do:

xmllint --shell myXml.xml

followed by cat //item/title, which retrieves only the title elements and their values. Can I use xmllint to get a subset of the nodes, for example:

        <title>A title</title>
        <link>http://www.example.com</link>
        <pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>

Thanks,

Answer 1

You can use "or" in a predicate on the element name:

cat //item/*[name()="title" or name()="link" or name()="pubDate"]

I had to modify your XML to make it well-formed.

Alternatively, use a higher-level tool such as xsh (which I happen to maintain):

for //item ls ( title | link | pubDate ) ;
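
On the well-formedness point above: the answer does not show the modified XML, so the following is only a sketch of what such a repair might look like. It uses Python's standard xml.etree.ElementTree rather than xmllint. The REPAIRED text, the rss/channel wrapper, the Dublin Core namespace declaration, and the helper name is_well_formed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet as posted in the question: <a name> is never closed,
# <location> is closed by </dc:creator>, and the dc prefix is undeclared.
AS_POSTED = """<item>
        <title>A title</title>
        <link>http://www.example.com</link>
        <pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>
        <dc:creator><a name></dc:creator>
        <location><a name></dc:creator>
</item>"""

# Hypothetical repair (not the answerer's actual file): placeholder text
# instead of the <a name> tag, matching close tags, and an assumed
# declaration of the dc prefix on an assumed rss root element.
REPAIRED = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><item>
        <title>A title</title>
        <link>http://www.example.com</link>
        <pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>
        <dc:creator>a name</dc:creator>
        <location>a name</location>
</item></channel></rss>"""


def is_well_formed(text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("as posted:", is_well_formed(AS_POSTED))
    print("repaired: ", is_well_formed(REPAIRED))
```

Any repair that closes the tags consistently and declares the dc prefix should work equally well with the cat command above.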

Answer 2

Here is an alternative XPath that uses the self axis and the union (|) operator to get multiple elements with different names:

cat //item/*[self::title|self::link|self::pubDate]
 -------
<title>A title</title>
 -------
<link>http://www.example.com</link>
 -------
<pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>
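
If you need the same selection from a script rather than from the xmllint shell, here is a rough equivalent using only Python's standard library. ElementTree's XPath subset does not support the self axis or name(), so this sketch filters child elements by tag name instead of evaluating the XPath above. The sample document (including the rss/channel wrapper and the dc namespace declaration), the WANTED set, and the function name select_children are assumptions, not part of the question.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the question's sample; the wrapper
# elements and the namespace declaration are illustrative guesses.
DOC = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>A title</title>
      <link>http://www.example.com</link>
      <pubDate>Mon, 08 Aug 2016 09:04:11 +0000</pubDate>
      <dc:creator>a name</dc:creator>
      <location>a name</location>
    </item>
  </channel>
</rss>"""

# Names of the child elements to keep (hypothetical constant).
WANTED = {"title", "link", "pubDate"}


def select_children(xml_text, wanted=WANTED):
    """Collect children of every <item> whose tag name is in `wanted`."""
    root = ET.fromstring(xml_text)
    selected = []
    # root.iter("item") visits every <item> in the tree, like //item.
    for item in root.iter("item"):
        for child in item:
            # Namespaced tags look like "{uri}creator", so they never match.
            if child.tag in wanted:
                selected.append(child)
    return selected


if __name__ == "__main__":
    for element in select_children(DOC):
        # strip() drops the whitespace that follows each element in the source.
        print(ET.tostring(element, encoding="unicode").strip())
```

Elements are returned in document order, item by item, which matches the order of the xmllint output shown above.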
