Read only a few XML elements efficiently

Question

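I only want to read the values of a few XML tags. I wrote the code below. The real XML is large and somewhat complex, so I have simplified it for this example. Is there a more efficient way to do this? I am using Java 8.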

DocumentBuilderFactory dbfaFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = dbfaFactory.newDocumentBuilder();
Document doc = documentBuilder.parse("xml_val.xml");

System.out.println(doc.getElementsByTagName("date_added").item(0).getTextContent());



<item_list id="item_list01">
    <numitems_intial>5</numitems_intial>
    <item>
        <date_added>1/1/2014</date_added>
        <added_by person="person01" />
    </item>
    <item>
        <date_added>1/6/2014</date_added>
        <added_by person="person05" />
    </item>
    <numitems_current>7</numitems_current>
    <manager person="person48" />
</item_list>

Answer 1

Use XPath and pass a specific expression to get the elements you need.

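import java.io.FileInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class MainJaxbXpath {

    public static void main(String[] args) {
        // try-with-resources closes the stream even if parsing fails.
        try (FileInputStream fileIS = new FileInputStream("/home/luis/tmp/test.xml")) {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            Document xmlDocument = builder.parse(fileIS);

            XPath xPath = XPathFactory.newInstance().newXPath();
            // Parenthesized so that [1] picks the first match overall; without the
            // parentheses, date_added[1] matches the first date_added of every parent.
            String expression = "(//item_list[@id=\"item_list01\"]//date_added)[1]";
            String dateAdded = (String) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING);
            System.out.println(dateAdded);
        } catch (SAXException | IOException | ParserConfigurationException | XPathExpressionException e3) {
            e3.printStackTrace();
        }
    }
}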

Result:

1/1/2014

To look up several elements in a single operation:

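        // Continues from the example above; also needs org.w3c.dom.Node and org.w3c.dom.NodeList.
        // Each path is parenthesized so the positional predicate applies to the whole match list.
        String expression01 = "(//item_list[@id=\"item_list01\"]//date_added)[1]";
        String expression02 = "(//item_list[@id=\"item_list02\"]//date_added)[2]";
        // The XPath union operator "|" combines both selections into one node set.
        String expression = String.format("%s | %s", expression01, expression02);
        NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node currentNode = nodeList.item(i);
            if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println(currentNode.getTextContent());
            }
        }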
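As a rough, self-contained sketch of the same XPath approach, the following inlines the simplified XML from the question instead of reading a file. The class name `XPathSampleDemo` and the helper `selectTexts` are made up for illustration and are not part of the answer above. It also shows how a positional predicate behaves with and without parentheses, and how a union can pick up an element and an attribute in one evaluation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSampleDemo {

    // The simplified XML from the question, inlined so the sketch needs no file.
    static final String SAMPLE_XML =
        "<item_list id=\"item_list01\">"
        + "<numitems_intial>5</numitems_intial>"
        + "<item><date_added>1/1/2014</date_added><added_by person=\"person01\"/></item>"
        + "<item><date_added>1/6/2014</date_added><added_by person=\"person05\"/></item>"
        + "<numitems_current>7</numitems_current>"
        + "<manager person=\"person48\"/>"
        + "</item_list>";

    // Hypothetical helper: evaluates an XPath expression as a node set and
    // returns the text content of every matched node.
    static List<String> selectTexts(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Matches the first date_added child of each parent: one per <item>.
        System.out.println(selectTexts(SAMPLE_XML, "//date_added[1]"));
        // Matches only the first date_added in the whole document.
        System.out.println(selectTexts(SAMPLE_XML, "(//date_added)[1]"));
        // A union can mix element and attribute nodes in one evaluation.
        System.out.println(selectTexts(SAMPLE_XML,
                "/item_list/numitems_current | /item_list/manager/@person"));
    }
}
```

Note that this still builds the whole DOM in memory; XPath only makes the selection more concise.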

Answer 2

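A few suggestions.

First, don't use DOM. Java has a wide range of DOM-like XML tree representations; DOM was the first and is the worst. Later third-party models such as JDOM2 and XOM are much better designed.

Second, consider doing the whole job in an XML-oriented language (such as XSLT or XQuery) rather than in Java. In XQuery, using Saxon's XQuery API, this would be: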

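import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmItem;

Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exec = comp.compile("//date_added");
XQueryEvaluator eval = exec.load();
eval.setSource(new StreamSource(new File("/home/luis/tmp/test.xml")));
for (XdmItem item : eval.evaluate()) {
  System.out.println(item.getStringValue());
}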

But since the query is so simple, Saxon also has a direct map/reduce-style API for navigating the tree. That would be:

// Needs: import static net.sf.saxon.s9api.streams.Steps.descendant;
Processor proc = new Processor(false);
XdmNode doc = proc.newDocumentBuilder().build(
  new StreamSource(new File("/home/luis/tmp/test.xml")));
for (XdmItem item : doc.select(descendant("date_added")).asList()) {
  System.out.println(item.getStringValue());
}

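One suggestion unrelated to efficiency: please use international standard dates. 1/6/2014 could be 1 June or 6 January. Writing it as 2014-06-01 (or 2014-01-06, if that is what you meant) not only avoids the kind of dangerous error that ambiguous formats invite, it also means you can use standard date and time libraries, such as the XPath 2.0+ function library.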
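To illustrate the ambiguity on the Java side, here is a small sketch using `java.time` (available in Java 8). The class name `DateAmbiguityDemo` and the helper `parseWith` are hypothetical; the point is only that the same string yields two different dates depending on the assumed pattern, while the ISO form parses without any pattern at all.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateAmbiguityDemo {

    // Hypothetical helper: parses a date string under an explicitly chosen pattern.
    static LocalDate parseWith(String pattern, String text) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String ambiguous = "1/6/2014";

        // The caller has to guess which convention the producer used.
        LocalDate dayFirst = parseWith("d/M/yyyy", ambiguous);   // read as 1 June 2014
        LocalDate monthFirst = parseWith("M/d/yyyy", ambiguous); // read as 6 January 2014
        System.out.println(dayFirst + " vs " + monthFirst);

        // The ISO 8601 form is unambiguous and is LocalDate's default format.
        LocalDate iso = LocalDate.parse("2014-06-01");
        System.out.println(iso.equals(dayFirst));
    }
}
```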

java

xml

xml-parsing