Extract surrounding XML tags by child content

Question

I have an XML file that basically looks like this:

<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>
  <Product Id="2">
      <Attribute Name="Identifier">NumberTwo</Attribute>
  </Product>
</products>

What I want to do is extract a complete product: search for the Product node that contains

<Attribute Name="Identifier">SEARCH_TEXT</Attribute>

For example, for NumberOne I would get the surrounding Product (Id="1") tag and its contents.

Example: for the search text "NumberOne", the desired result is:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>

For the search text "NumberTwo", it would be:

<Product Id="2">
      <Attribute Name="Identifier">NumberTwo</Attribute>
  </Product>

What I tried is this regular expression (Python):

<Product ((?!</Product>)[\S|\s])*<Attribute Name=\"Identifier\">NumberOne</Attribute>((?!</Product>)[\S|\s])*</Product>

But because of the nested Product, this does not work. Does anyone have a hint for solving this?
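To make the failure concrete, here is a small sketch that runs the pattern above against the sample document. The helper name `regex_search` and the `NO_CLOSE` constant are made up for this illustration. The negative lookahead refuses to step over any `</Product>`, so a match starting at an outer Product can never get past the closing tag of a nested one.

```python
import re

XML = """<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>
  <Product Id="2">
      <Attribute Name="Identifier">NumberTwo</Attribute>
  </Product>
</products>"""

# "Any run of characters that never starts a closing Product tag",
# copied from the pattern in the question.
NO_CLOSE = r"((?!</Product>)[\S|\s])*"


def regex_search(identifier):
    """Return the text matched by the question's pattern, or None."""
    pattern = (
        r"<Product " + NO_CLOSE
        + r'<Attribute Name="Identifier">' + re.escape(identifier) + r"</Attribute>"
        + NO_CLOSE + r"</Product>"
    )
    match = re.search(pattern, XML)
    return match.group(0) if match else None


# Product 1 contains a nested </Product> before its Identifier: no match.
print(regex_search("NumberOne"))
# Product 2 has no nested Product, so the pattern happens to work here.
print(regex_search("NumberTwo"))
```

The pattern only succeeds for products without nested Product elements, which is why a parser-based approach is the safer route.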

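I have read that regular expressions are not the smartest way to solve this kind of XML search problem. In reality, the top-level products are much more complex, and I need to merge two XML files that look like my example. So I was hoping that with a regex I could solve this at the "string" level rather than at the XML parser level, where I would probably have to build those complex objects before generating the final XML output. I just want to find the top-level products by that identifier value and grab them whole, whatever they contain.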

Thanks a lot.

Update: based on Jack Fleeting's solution, this is what I ended up using (XPath):

//products//Product[Attribute[@Name="Identifier" and text()="NumberOne"]]
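For anyone who wants to try this without a full XPath 1.0 engine: as far as I know, the XPath subset in Python's standard-library `xml.etree.ElementTree` does not accept a nested predicate like the one above, so the sketch below applies the same condition with a plain loop. The function name `find_products` is hypothetical. It assumes the Identifier attribute is a direct child of the Product, as in the sample.

```python
import xml.etree.ElementTree as ET

XML = """<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>
  <Product Id="2">
      <Attribute Name="Identifier">NumberTwo</Attribute>
  </Product>
</products>"""


def find_products(root, identifier):
    """Return every Product with a direct child
    <Attribute Name="Identifier"> whose text equals `identifier`."""
    matches = []
    for product in root.iter("Product"):           # all Products, nested ones too
        for attr in product.findall("Attribute"):  # direct children only
            text = (attr.text or "").strip()
            if attr.get("Name") == "Identifier" and text == identifier:
                matches.append(product)
                break
    return matches


root = ET.fromstring(XML)
for product in find_products(root, "NumberOne"):
    # Serialises the whole element, nested Product included.
    print(ET.tostring(product, encoding="unicode"))
```

Because only direct children are checked, an outer Product is not returned merely because some nested Product carries the identifier.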

Answer 1

Trying to parse XML with regular expressions is indeed not a good idea. Assuming I understand you correctly, XPath should get you there. For example,

//Product[.//*[.="NumberOne"]]

should output:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attribute Name="Identifier">NumberOne</Attribute>
  </Product>

and so on.

xpath

xml-parsing