XPath expression to get node based on attribute value

Question

I have the following input xml file:

<rootnode>
 <section id="1" status="fail">
  <outer status="fail">
   <inner status="fail"/>
   <inner status="pass"/>
  </outer>
  <outer status="pass">
   <inner status="pass"/>
  </outer>
  <outer status="pass"/>
  <outer status="fail"/>
 </section>
 <section id="2" status="fail">
  <outer status="fail">
   <inner status="pass"/>
   <inner status="fail"/>
   <inner status="inc"/>
  </outer>
 </section>
</rootnode>

I want to filter out every node whose status is not fail, so that the result looks like this:

<rootnode>
 <section id="1" status="fail">
  <outer status="fail">
   <inner status="fail"/>
  </outer>
  <outer status="fail"/>
 </section>
 <section id="2" status="fail">
  <outer status="fail">
   <inner status="fail"/>
  </outer>
 </section>
</rootnode>

<rootnode> does not necessarily have to be included in the result. I tried using xmllint with xpath expressions. I can extract specific nodes with

xmllint --xpath "//inner" input.xml
xmllint --xpath "//@status" input.xml

But these either return the nodes regardless of the value of status, or return only the attributes without the surrounding nodes.

Is there a way to do this with an xpath expression? If not, a simple solution that combines other bash tools would be fine too.

Answer 1

As @svasa said in the comments, you should use XSLT. You can easily process XSLT from bash with xsltproc, xmlstarlet (using the tr command), Saxon (java on the command line), etc.
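For context on why a single XPath expression falls short here: a predicate such as [@status='fail'] can select nodes by attribute value, but a selected node always comes back with its complete subtree, so the non-failing children are still there. The snippet below is only an illustrative sketch. It uses Python's standard xml.etree.ElementTree (not part of the original question or answer) on a trimmed copy of the input.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's input (section id="2" only).
XML = """<rootnode>
 <section id="2" status="fail">
  <outer status="fail">
   <inner status="pass"/>
   <inner status="fail"/>
   <inner status="inc"/>
  </outer>
 </section>
</rootnode>"""

root = ET.fromstring(XML)

# The predicate [@status='fail'] restricts the match by attribute value.
failed_inner = root.findall(".//inner[@status='fail']")
print(len(failed_inner))  # 1

# Selecting the failing <outer> still returns it with ALL of its children,
# including the ones whose status is not "fail".
failed_outer = root.find(".//outer[@status='fail']")
child_statuses = [child.get("status") for child in failed_outer]
print(child_statuses)  # ['pass', 'fail', 'inc']
```

Selection alone cannot drop the unwanted children from the nodes it returns, which is why a transformation step is needed.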

Here is an example using xsltproc:

$ xsltproc so.xsl so.xml
<?xml version="1.0"?>
<rootnode>
  <section id="1" status="fail">
    <outer status="fail">
      <inner status="fail"/>
    </outer>
    <outer status="fail"/>
  </section>
  <section id="2" status="fail">
    <outer status="fail">
      <inner status="fail"/>
    </outer>
  </section>
</rootnode>

XML Input (so.xml)

<rootnode>
    <section id="1" status="fail">
        <outer status="fail">
            <inner status="fail"/>
            <inner status="pass"/>
        </outer>
        <outer status="pass">
            <inner status="pass"/>
        </outer>
        <outer status="pass"/>
        <outer status="fail"/>
    </section>
    <section id="2" status="fail">
        <outer status="fail">
            <inner status="pass"/>
            <inner status="fail"/>
            <inner status="inc"/>
        </outer>
    </section>
</rootnode>

XSLT 1.0 (so.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[@status[not(normalize-space()='fail')]]"/>

</xsl:stylesheet>
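The stylesheet works by copying everything through the identity template, while the empty template drops any element whose status attribute is not "fail", together with its whole subtree. If XSLT is not an option, roughly the same pruning can be sketched in Python with the standard library. This is an alternative technique, not what the stylesheet does internally, and the function name prune_non_fail is made up for illustration. Unlike the stylesheet, it never removes the root element itself.

```python
import xml.etree.ElementTree as ET

XML = """<rootnode>
 <section id="1" status="fail">
  <outer status="fail">
   <inner status="fail"/>
   <inner status="pass"/>
  </outer>
  <outer status="pass">
   <inner status="pass"/>
  </outer>
  <outer status="pass"/>
  <outer status="fail"/>
 </section>
 <section id="2" status="fail">
  <outer status="fail">
   <inner status="pass"/>
   <inner status="fail"/>
   <inner status="inc"/>
  </outer>
 </section>
</rootnode>"""


def prune_non_fail(element):
    """Remove, in place, every descendant whose status attribute is not 'fail'."""
    for child in list(element):  # iterate over a copy because we mutate element
        status = child.get("status")
        # Collapse whitespace, similar in spirit to XPath's normalize-space().
        if status is not None and " ".join(status.split()) != "fail":
            element.remove(child)  # drops the child and its entire subtree
        else:
            prune_non_fail(child)


root = ET.fromstring(XML)
prune_non_fail(root)

# Document-order summary of what is left after pruning.
remaining = [(el.tag, el.get("status")) for el in root.iter()]
print(remaining)
print(ET.tostring(root, encoding="unicode"))
```

The serialized result keeps whatever whitespace was left between the surviving elements, so the indentation will not be as tidy as the xsltproc output.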

I have a small follow-up question, if you don't mind. When the input.xml file does not contain any status=fail nodes, the output is just two lines: <?xml version="1.0"?> and <rootnode/>. Is it possible to suppress the output entirely in this case? It is not really a problem, I know how to work around it in bash. I am just interested in whether there is a clean solution via xslt.

You could omit the XML declaration (omit-xml-declaration="yes" in xsl:output) and check whether there are any elements with status="fail". I'd use a key (xsl:key) for that...

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" omit-xml-declaration="yes">
    <!--If you need to output the declaration when there
    are elements with status="fail", it might be best to post process files that
    only contain the xml declaration.-->
  </xsl:output>
  <xsl:strip-space elements="*"/>

  <!--Key of all elements with status="fail".-->  
  <xsl:key name="fails" match="*[@status='fail']" use="@status"/>

  <xsl:template match="/*[not(key('fails','fail'))]">
    <!--If there aren't any elements with status="fail", don't process
    anything else.-->
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[@status[not(normalize-space()='fail')]]"/>

</xsl:stylesheet>
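The same "emit nothing when there are no failures" idea can also be sketched outside XSLT. The Python below is a hedged illustration using only the standard library. The names has_failures and render_failures are hypothetical, and the any() scan plays the role that the key lookup plays in the stylesheet.

```python
import xml.etree.ElementTree as ET


def has_failures(root):
    """True if any element, the root included, has status="fail"."""
    return any(el.get("status") == "fail" for el in root.iter())


def prune_non_fail(element):
    """Remove, in place, every descendant whose status attribute is not 'fail'."""
    for child in list(element):  # copy, since we remove while looping
        status = child.get("status")
        if status is not None and " ".join(status.split()) != "fail":
            element.remove(child)
        else:
            prune_non_fail(child)


def render_failures(xml_text):
    """Return the pruned document, or an empty string if nothing failed."""
    root = ET.fromstring(xml_text)
    if not has_failures(root):
        return ""  # suppress all output, including the root element
    prune_non_fail(root)
    return ET.tostring(root, encoding="unicode")


all_pass = '<rootnode><section status="pass"><outer status="pass"/></section></rootnode>'
mixed = '<rootnode><section status="fail"><outer status="pass"/><outer status="fail"/></section></rootnode>'

print(repr(render_failures(all_pass)))
print(render_failures(mixed))
```

Returning an empty string keeps the decision in one place, so a calling shell script only has to test whether the output is non-empty.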
