XPathFactoryImpl not able to identify the root node if the XML doc contains a namespace

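I'm new to XML and the Saxon API. I'm using the Saxon 10.3 HE jar to extract data from an XML file. I want to extract the country from the currently active country_information node, selecting it with date functions. Sample input XML: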

<person xmlns="urn:my.poctest.com">
    <country_information>
        <country>FRA</country>
        <end_date>9999-12-31</end_date>
        <start_date>2009-12-01</start_date>
    </country_information>
    <country_information>
        <country>FRA</country>
        <end_date>9999-12-31</end_date>
        <start_date>2009-12-01</start_date>
    </country_information>
</person>

Code:

import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import java.util.Map;

import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import net.sf.saxon.xpath.XPathFactoryImpl;

public class SaxonPoc {

    public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException,
            XPathExpressionException, XPathFactoryConfigurationException {
        String xml = " <person xmlns=\"urn:my.poctest.com\">\r\n"
                + "       <country_information>\r\n"
                + "          <country>FRA</country>\r\n"
                + "          <end_date>9999-12-31</end_date>\r\n"
                + "          <start_date>2020-02-24</start_date>\r\n"
                + "       </country_information>\r\n" 
                + "       <country_information>\r\n"
                + "          <country>USA</country>\r\n"
                + "          <end_date>2020-02-23</end_date>\r\n"
                + "          <start_date>2009-12-01</start_date>\r\n"
                + "       </country_information>             \r\n" 
                + "       </person>";
        Document doc = SaxonPoc.getDocument(xml, false);
        NodeList matches = (NodeList) SaxonPoc.getXpathExpression("//person", null).evaluate(doc,
                XPathConstants.NODESET);
        if (matches != null) {
            Element node = (Element) matches.item(0);
            XPath xPath1 = SaxonPoc.getXpath(null);
            String xPathStatement = "/person/country_information[xs:date(start_date) le current-date() and  xs:date(end_date) ge current-date()]/country";
            NodeList childNodes = (NodeList) xPath1.evaluate(xPathStatement, node, XPathConstants.NODESET);
            if (childNodes.getLength() > 0) {
                String nodeName = childNodes.item(0).getNodeName();
                System.out.println("Node :" + nodeName);
                String value = childNodes.item(0).getTextContent();
                System.out.println("Country Name :" + value);
            }

        }
        System.out.println("Finished");

    }

    public static Document getDocument(String xml, boolean isNamespaceAware)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(isNamespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputSource is = new InputSource(new StringReader(xml));
        return builder.parse(is);
    }

    public static XPath getXpath(Map<String, String> namespaceMappings) throws XPathFactoryConfigurationException {
        XPathFactory xpathFactory = new XPathFactoryImpl();
        XPath xpath = xpathFactory.newXPath();
        NamespaceContext nsc = new NamespaceContext() {

            @Override
            public String getNamespaceURI(String prefix) {
                return (null != namespaceMappings) ? namespaceMappings.get(prefix) : null;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null;
            }

            @Override
            public Iterator getPrefixes(String namespaceURI) {
                return null;
            }

        };
        xpath.setNamespaceContext(nsc);

        return xpath;
    }

    public static XPathExpression getXpathExpression(String xpathExpr, Map<String, String> namespaceMappings)
            throws XPathExpressionException, XPathFactoryConfigurationException {
        XPath xpath = getXpath(namespaceMappings);
        return xpath.compile(xpathExpr);
    }

}

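I'm getting a NullPointerException because the root node person cannot be found in the XML document. If I remove xmlns="urn:my.poctest.com", the root path is found, but at a later stage it fails with javax.xml.xpath.XPathExpressionException: net.sf.saxon.trans.XPathException: Namespace prefix 'xs' has not been declared. If I remove both the namespace from the XML document and the NamespaceContext implementation from the code, it works fine. But I don't actually want to remove either of them.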

Can someone point out what I'm doing wrong here? Thanks in advance!
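For reference, the lookup can be made to work while keeping both the namespace and a NamespaceContext. The sketch below assumes the JDK's built-in javax.xml.xpath engine rather than Saxon, so it runs without extra jars. It parses with setNamespaceAware(true) (the question's getDocument(xml, false) does not), binds a prefix (the hypothetical `p`) to urn:my.poctest.com and uses it on every element step, and binds `xs` to the XML Schema namespace, which is the binding Saxon reports as undeclared. Because the JDK only implements XPath 1.0, which has no xs:date or current-date(), the date check is done in Java with LocalDate; with Saxon's XPathFactoryImpl, the original xs:date(...) predicate should be usable once element steps carry the prefix and `xs` is bound (not verified here). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceFixSketch {

    // Prefixes exist only inside XPath expressions; "p" is an arbitrary (hypothetical) choice,
    // but its URI must be exactly the document's default namespace.
    private static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("p", "urn:my.poctest.com");
        // Saxon needs this binding for xs:date(...) once a custom NamespaceContext is set;
        // the JDK's XPath 1.0 engine never asks for it.
        PREFIXES.put("xs", XMLConstants.W3C_XML_SCHEMA_NS_URI);
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // otherwise the DOM carries no namespace URIs
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                // Unbound prefixes resolve to "no namespace" instead of null.
                return PREFIXES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null; // reverse lookup is not needed for evaluation
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return Collections.emptyIterator();
            }
        });
        return xpath;
    }

    /** Returns the country whose [start_date, end_date] range contains {@code day}, or null. */
    static String activeCountry(String xml, LocalDate day) {
        try {
            XPath xpath = newXPath();
            NodeList infos = (NodeList) xpath.evaluate(
                    "/p:person/p:country_information", parse(xml), XPathConstants.NODESET);
            for (int i = 0; i < infos.getLength(); i++) {
                Node info = infos.item(i);
                // XPath 1.0 has no xs:date, so the dates are compared in Java instead.
                LocalDate start = LocalDate.parse(xpath.evaluate("p:start_date", info).trim());
                LocalDate end = LocalDate.parse(xpath.evaluate("p:end_date", info).trim());
                if (!start.isAfter(day) && !end.isBefore(day)) {
                    return xpath.evaluate("p:country", info).trim();
                }
            }
            return null;
        } catch (Exception e) {
            // Parsing or XPath failures are reported as unchecked errors in this sketch.
            throw new IllegalArgumentException("could not evaluate document", e);
        }
    }
}
```

With the XML string from the question's code, passing LocalDate.now() as the day selects FRA for any date from 2020-02-24 onward.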

You might like to know that the latest version of Saxon includes an option to do
((net.sf.saxon.xpath.XPathEvaluator) xpath).getStaticContext()
    .setUnprefixedElementMatchingPolicy(
        UnprefixedElementMatchingPolicy.ANY_NAMESPACE);

This causes unprefixed element names in an XPath expression to match on the local name alone, regardless of namespace.

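This was introduced mainly for HTML, where there is total confusion about whether the elements in an HTML DOM are in a namespace or not; but it is more widely useful in situations where you really don't care about namespaces and just wish they weren't there to make your life difficult.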
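The ANY_NAMESPACE policy is Saxon-specific. A rough, portable alternative (a swapped-in technique, not the Saxon option itself) is to test local-name() in every step, which any XPath 1.0 engine supports. The sketch below assumes the JDK's built-in javax.xml.xpath rather than Saxon; the class and method names are hypothetical. It is more verbose than an unprefixed path, but it selects the same elements whether or not xmlns="urn:my.poctest.com" is present, and it needs no NamespaceContext at all.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocalNameSketch {

    // Each step tests only the local name, so the path ignores the element's namespace.
    static final String COUNTRY_PATH =
            "/*[local-name()='person']/*[local-name()='country_information']/*[local-name()='country']";

    /** Returns the string value of the first matching country element ("" if none match). */
    static String firstCountry(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            // No NamespaceContext is needed because the expression uses no prefixes.
            return XPathFactory.newInstance().newXPath().evaluate(COUNTRY_PATH, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate document", e);
        }
    }
}
```

The trade-off is the same one the Saxon option makes: elements from different namespaces that share a local name are no longer distinguished.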