How to parse nested xml tags with the same tag name

I have an unspecified number of nested categories, which contain items:

<categories>
    <category>abc
        <category>cde
            <item>someid</item>
            <item>someid</item>
            <item>someid</item>
            <item>someid</item>
        </category>
    </category>
<category>xyz
   <category>zwd
       <category>hgw
           <item>someid</item>
...

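The result should be a list of the items in the most deeply nested category (cde or hgw). The tricky part is that there can be more than two levels of category nesting, and I want to keep every parent category for each subcategory.

I have already done some XML parsing with Jackson's XmlMapper and ObjectMapper, but this use case seems out of reach for them. So I tried it with the javax XML parser but gave up, because the code looked awful and was hard to read.

Any idea how to solve this in a more elegant way?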

If the task is just to quickly extract a few values from XML, I would use jsoup. Jsoup is actually an HTML parser, but it can also parse XML. I'm not sure whether jsoup can also validate against an XML schema, handle namespaces, and so on, which other parsers can do. But for reading a few values, jsoup is usually enough for me. For details, take a look at the jsoup cookbook or the selector syntax.

Maven:

<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

With jsoup, your code could look something like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class Example {


    public static void main(String[] args) {
        String xml = "<categories>\n"
                + "    <category>abc\n"
                + "        <category>cde\n"
                + "            <item>someid_1</item>\n"
                + "            <item>someid_2</item>\n"
                + "            <item>someid_3</item>\n"
                + "            <item>someid_4</item>\n"
                + "        </category>\n"
                + "    </category>\n"
                + "    <category>xyz\n"
                + "       <category>zwd\n"
                + "          <category>hgw\n"
                + "             <item>someid_5</item>\n"
                + "          </category>\n"
                + "       </category>\n"
                + "    </category>\n"
                + " </categories>";

        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        // If you are only interested in the items (each printed with its direct parent category)
        Elements items = doc.select("category > item");
        items.forEach(i -> {
            System.out.println("Parent text: " +i.parent().ownText());
            System.out.println("Item text: "+ i.text());
            System.out.println();
        });


        // If you are interested in the categories that have at least one direct item child
        Elements categories = doc.select("category:has(> item)");
        categories.forEach(c -> {
            System.out.println(c.ownText());
            Elements children = c.children();
            children.forEach(ch -> {
                System.out.println(ch.text());
            });
            System.out.println();
        });
    }

}

Output:

Parent text: cde
Item text: someid_1

Parent text: cde
Item text: someid_2

Parent text: cde
Item text: someid_3

Parent text: cde
Item text: someid_4

Parent text: hgw
Item text: someid_5

cde
someid_1
someid_2
someid_3
someid_4

hgw
someid_5
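
The question also asks to keep every parent category for a subcategory, while the output above shows only the immediate parent of each item. As a rough, dependency-free sketch of that part, the following uses the JDK's built-in DOM parser (not jsoup) and walks the tree recursively, carrying the chain of enclosing category names along. The class name `CategoryPaths`, the helper names `parse`, `collect` and `ownText`, and the " / " separator are made up for illustration; it also assumes that a category's name is the text directly inside its tag, as in the sample XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CategoryPaths {

    /** Maps each category path (e.g. "abc / cde") to the items directly inside it. */
    public static Map<String, List<String>> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, List<String>> result = new LinkedHashMap<>();
            collect(root, "", result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Visits the child elements of 'parent', carrying the path of enclosing category names.
    private static void collect(Element parent, String path, Map<String, List<String>> result) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) {
                continue; // skip text and comment nodes
            }
            Element child = (Element) n;
            if (child.getTagName().equals("category")) {
                String name = ownText(child);
                collect(child, path.isEmpty() ? name : path + " / " + name, result);
            } else if (child.getTagName().equals("item")) {
                result.computeIfAbsent(path, k -> new ArrayList<>())
                        .add(child.getTextContent().trim());
            }
        }
    }

    // Text that belongs to the element itself, excluding the text of nested elements.
    private static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String xml = "<categories><category>abc<category>cde"
                + "<item>someid_1</item><item>someid_2</item></category></category>"
                + "<category>xyz<category>zwd<category>hgw"
                + "<item>someid_5</item></category></category></category></categories>";
        parse(xml).forEach((path, items) -> System.out.println(path + " -> " + items));
    }
}
```

For this shortened input it prints `abc / cde -> [someid_1, someid_2]` and `xyz / zwd / hgw -> [someid_5]`. Only categories that directly contain items end up in the map, so intermediate categories such as abc or zwd appear only as part of a path.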