Extract a node with its entire content from a namespaced XML

Given the following namespaced XML file:

<ptk:PrintTalk xmlns:ptk="http://linkToNameSpace" xmlns:xjdf="http://linkToNamespace">
 <ptk:Request>
  <ptk:PurchaseOrder Currency="EUR">
   <xjdf:XJDF name="someName" version="2.0">
     <xjdf:ProductList>
      <xjdf:Product>
       ...
      </xjdf:Product>
      <xjdf:OtherProduct>
       ...
      </xjdf:OtherProduct> 
      and many other products
     </xjdf:ProductList>
     <xjdf:ParameterSet>
      <xjdf:Parameter>
       ...
      </xjdf:Parameter> and so on until
   </xjdf:XJDF>
  </ptk:PurchaseOrder>
 </ptk:Request>
</ptk:PrintTalk>

How can I extract the following using XPath:

<xjdf:XJDF name="someName" version="2.0">
 <xjdf:ProductList>
  <xjdf:Product>
   ...
  </xjdf:Product>
  <xjdf:OtherProduct>
   ...
  </xjdf:OtherProduct> 
   and many other products
  </xjdf:ProductList>
   <xjdf:ParameterSet>
    <xjdf:Parameter>
     ...
    </xjdf:Parameter> and so on until
</xjdf:XJDF>

I have already tried expressions like these:

/ptk:PrintTalk/ptk:Request/ptk:PurchaseOrder/* 

//xjdf:XJDF

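But these expressions do not give me the desired result. I am using the XPath expression evaluator built into IntelliJ IDEA, and the programming language is Java. There is no dedicated XPath library, just javax.xml.*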

Update

Using

//ptk:PurchaseOrder//*

I get every node as a separate, single node without any child nodes inside. For example,

<xjdf:ProductList>
 <xjdf:Product>
  ...
 </xjdf:Product>
</xjdf:ProductList> (here the product tag is a child of product list tag)

results in

<xjdf:ProductList>
<xjdf:Product>

The Java code I use for this operation:

@Override
public XJDF readFrom(
    final Class<XJDF> type, final Type genericType, final Annotation[] annotations, final MediaType mediaType,
    final MultivaluedMap<String, String> multivaluedMap, final InputStream inputStream
) throws IOException {
    try {
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document documentPtk = documentBuilder.parse(new InputSource(inputStream));
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();
        XPathExpression xPathExpression = xPath.compile("//ptk:PurchaseOrder//*");
        Document documentXjdf = (Document) xPathExpression.evaluate(documentPtk, XPathConstants.NODE);
    } catch (Exception e) {
        throw new WebApplicationException("PrintTalk document could not be deserialized.", e);
    }
}

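There are three points to make here:

  • DocumentBuilderFactory is not namespace-aware by default; you have to switch namespace awareness on explicitly before creating the DocumentBuilder
  • XPath does not use the namespace prefix mappings from the XML document; it uses its own NamespaceContext instead
  • The Node returned by this query will not be a Document, but an Element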
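
The first point can be checked in isolation. The following is only a minimal sketch, assuming the parser bundled with the JDK; the class name NamespaceAwarenessDemo and the method rootNamespaceUri are made up for illustration. It parses the same one-element document twice and returns the namespace URI that the DOM reports for the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question or answer.
public class NamespaceAwarenessDemo {

    /** Parses the XML and returns the namespace URI the DOM reports for the root element. */
    public static String rootNamespaceUri(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The factory default is false, so this has to be set before newDocumentBuilder()
            factory.setNamespaceAware(namespaceAware);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ptk:PrintTalk xmlns:ptk=\"http://linkToNameSpace\"/>";
        System.out.println("not namespace-aware: " + rootNamespaceUri(xml, false));
        System.out.println("namespace-aware:     " + rootNamespaceUri(xml, true));
    }
}
```

Without namespace awareness the element should carry no namespace URI at all, which is why an XPath step such as ptk:PurchaseOrder has nothing to match against.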

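Annoyingly, the Java core class library has no default implementation of NamespaceContext, so you have to either use a third-party library (I usually use SimpleNamespaceContext from Spring) or write your own implementation of the interface.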

Here is an example using SimpleNamespaceContext:

// Requires org.springframework.util.xml.SimpleNamespaceContext from Spring
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
// Must be enabled before the DocumentBuilder is created
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document documentPtk = documentBuilder.parse(new InputSource(inputStream));
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();

// The prefix used in the expression ("p") is independent of the one in the document ("ptk");
// only the namespace URI has to match
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext();
nsCtx.bindNamespaceUri("p", "http://linkToNameSpace");
xPath.setNamespaceContext(nsCtx);

XPathExpression xPathExpression = xPath.compile("/p:PrintTalk/p:Request/p:PurchaseOrder/*");
// The result is the first child element of PurchaseOrder, including all of its descendants
Element documentXjdf = (Element) xPathExpression.evaluate(documentPtk, XPathConstants.NODE);
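
If pulling in Spring is not an option, the "write your own implementation" route could look roughly like the sketch below, which uses only classes from the JDK. The names XjdfExtractor, MapNamespaceContext and extractXjdf, as well as the prefixes "p" and "x", are invented for illustration; the namespace URIs are the placeholders from the question. The sketch also serializes the resulting Element with an identity transform, which is one way to see that the node comes back with its whole subtree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class XjdfExtractor {

    /** Minimal map-backed NamespaceContext; it only knows the prefixes put into the map. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri;

        MapNamespaceContext(Map<String, String> prefixToUri) {
            this.prefixToUri = new HashMap<>(prefixToUri);
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceUri) {
            for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                if (entry.getValue().equals(namespaceUri)) {
                    return entry.getKey();
                }
            }
            return null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceUri) {
            String prefix = getPrefix(namespaceUri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
        }
    }

    /** Returns the serialized XJDF element with all of its children, or null if none is found. */
    public static String extractXjdf(String printTalkXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(printTalkXml)));

            // The prefixes "p" and "x" are arbitrary; only the URIs must match the document.
            Map<String, String> namespaces = new HashMap<>();
            namespaces.put("p", "http://linkToNameSpace");
            namespaces.put("x", "http://linkToNamespace");
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new MapNamespaceContext(namespaces));

            Element xjdf = (Element) xPath.evaluate(
                    "/p:PrintTalk/p:Request/p:PurchaseOrder/x:XJDF", document, XPathConstants.NODE);
            if (xjdf == null) {
                return null;
            }

            // An identity transform writes the element together with its whole subtree.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(xjdf), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract the XJDF element", e);
        }
    }
}
```

Calling XjdfExtractor.extractXjdf(...) on a complete PrintTalk document should give back the xjdf:XJDF element and everything nested inside it as one string. If a standalone Document is really needed rather than an Element, the element could instead be copied into a fresh document with Document.importNode.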