Parsing XML into Java object

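I'm trying to determine the best way to parse the XML response I get from a web service call into Java objects. Using JAXB seems like the easiest approach, but every example I've found requires a template Java class that maps Java types to XML. My XML is as follows: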

  <?xml version="1.0" encoding="utf-8" ?>
  <entry_list version="1.0">
      <entry id="main[1]"> <hw highlight="yes" hindex="1">main</hw> <sound><wav>main0001.wav</wav></sound> <pr>ˈmeɪn</pr> <fl>adjective</fl> <lb>always used before a noun</lb> <def><dt>:most important :<sx>chief</sx> <sx>principal</sx> <vi>the <it>main</it> idea/point</vi> <vi>the <it>main</it> goal/purpose</vi> <vi>Speed is the <it>main</it> advantage of this approach.</vi> <vi>The company's <it>main</it> office is located in New York.</vi> <vi>the novel's <it>main</it> character</vi> <vi>driving down the <it>main</it> road/highway</vi> <vi>the <it>main</it> gate/entrance</vi> <vi>This dish can be served as a <phrase>main course</phrase> or appetizer.</vi> <vi>And now for the <phrase>main event</phrase> of the evening!</vi></dt></def> <uro><ure>main*ly</ure> <fl>adverb</fl> <utxt><vi>The reviews have been <it>mainly</it> [=<it>mostly</it>] positive.</vi> <vi>a plant found <it>mainly</it> [=<it>chiefly</it>] in coastal regions</vi> <vi>I don't like the plan, <it>mainly</it> because I think it's too expensive.</vi> <vi>The problems have been <it>mainly</it> minor ones. [=most of the problems have been minor ones]</vi> <vi>They depend <it>mainly</it> on/upon fish for food.</vi></utxt></uro></entry>
      <entry id="main[2]"> <hw hindex="2">main</hw> <altpr>ˈmeɪn</altpr> <fl>noun</fl> <in><il>plural</il> <if>mains</if></in> <def><sn>1</sn> <sgram>count</sgram> <dt>:the largest pipe in a system of connected pipes <vi>a gas <it>main</it></vi> <vi>a water <it>main</it></vi></dt> <sn>2</sn> <bnote>the mains</bnote> <ssl>Brit</ssl> <sn>a</sn> <dt>:the system of pipes or wires for electricity, gas, or water <vi>My radio runs either off batteries or off <it>the mains</it>.</vi> <un>often used as <it>mains</it> before another noun <vi>We haven't had any <it>mains</it> water/electricity since the storm.</vi></un></dt> <sn>b</sn> <dt>:the place where electricity, gas, or water enters a building or room <vi>Turn off the water at <it>the mains</it>.</vi></dt></def> <dro><dre>in the main</dre> <def><dt>:in general <un>used to say that a statement is true in most cases or at most times <vi>The workers are <it>in the main</it> very capable. [=most of the workers are very capable]</vi> <vi>The weather has <it>in the main</it> been quite good. [=has been quite good most of the time]</vi></un></dt></def></dro></entry>
      <entry id="main clause"> <hw>main clause</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ clauses</if></in> <def><gram>count</gram> <sl>grammar</sl> <dt>:a clause that could be used by itself as a simple sentence but that is part of a larger sentence <ca>called also <cat>independent clause</cat></ca> <dx>compare <dxt>coordinate clause</dxt> <dxt>subordinate clause</dxt></dx></dt></def></entry>
      <entry id="main drag"> <hw>main drag</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ drags</if></in> <def><gram>count</gram> <sl>US</sl> <sl>informal</sl> <dt>:the main street in a town or city <vi>A carload of teenagers were cruising down the <it>main drag</it>.</vi></dt></def></entry>
      <entry id="main line"> <hw>main line</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ lines</if></in> <def><gram>count</gram> <dt>:an important highway or railroad line</dt></def></entry>
      <entry id="main man"> <hw>main man</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ men</if></in> <def><gram>count</gram> <sl>US</sl> <sl>informal</sl> <sn>1</sn> <dt>:someone's best male friend <vi>He's still her <it>main man</it>.</vi></dt> <sn>2</sn> <dt>:the most important or admired man in a group <vi>The team has many good players, but he is clearly the <it>main man</it>.</vi></dt></def></entry>
      <entry id="main squeeze"> <hw>main squeeze</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ squeezes</if></in> <def><gram>count</gram> <sl>chiefly US slang</sl> <dt>:someone's main girlfriend, boyfriend, or lover <vi>She's my <it>main squeeze</it>.</vi></dt></def></entry>
      <entry id="main street"> <hw>main street</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ streets</if></in> <def><sn>1</sn> <sgram>count</sgram> <dt>:the most important street of a U.S. town where there are many stores, banks, etc. <un>often used as a name <vi>The restaurant is at 257 <it>Main Street</it>.</vi></un></dt> <sn>2</sn> <bnote>Main Street</bnote> <sgram>noncount</sgram> <ssl>US</ssl> <dt><un>used to refer to middle-class people in the U.S. who have traditional beliefs and values <vi>What does <it>Main Street</it> think of this policy?</vi></un></dt></def></entry>
      <entry id="water main"> <hw>water main</hw> <fl>noun</fl> <in><il>plural</il> <if>⁓ mains</if></in> <def><gram>count</gram> <dt>:a large underground pipe that carries water <vi>The <it>water main</it> burst/broke and flooded the street.</vi></dt></def></entry>
  </entry_list>

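My question is: do I have to define the Java object that the XML gets converted into? If so, my concern is what happens when data is added to or removed from the XML response as it currently stands. I also tried loading the XML into a DOM and walking it that way, but again I wonder what happens if elements are added or removed.
I only want certain child nodes when their parent has a certain value, so any pointers on the simplest approach are welcome.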

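That class is usually called a POJO, and yes, having one is a good idea (maybe even necessary). It defines how your data should be represented as an object. If some data is missing from the XML, the corresponding fields of the Java object will simply be null. So you should define the Java object to cover the largest possible set of properties.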

There may be libraries that put extra properties into a hash map (I know Jackson can do this for JSON at least; I'm not sure about XML).

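The only way to guarantee that you capture every element is to parse it manually yourself, for example with a depth-first traversal of the nodes.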
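A minimal sketch of that manual approach, using only the JDK's built-in DOM parser (javax.xml.parsers): it visits every element depth-first, so nothing is skipped, and records each element's slash-separated path. The class name DomWalk, its methods, and the inline sample are illustrative, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomWalk {

    // Visits an element and all of its descendant elements depth-first (pre-order),
    // appending the slash-separated path of each element to out.
    static void walk(Element element, String parentPath, List<String> out) {
        String path = parentPath + "/" + element.getTagName();
        out.add(path);
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text, comment and other non-element nodes
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, out);
            }
        }
    }

    // Parses the XML string and returns the paths of all elements in document order.
    static List<String> paths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry_list><entry id=\"main line\"><hw>main line</hw>"
                + "<def><dt>:an important highway or railroad line</dt></def></entry></entry_list>";
        for (String p : paths(xml)) {
            System.out.println(p);
        }
    }
}
```

Because the walk makes no assumptions about which elements exist, added or removed elements simply show up or disappear in the output instead of breaking the code.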

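You can use a SAX parser. The beauty of this approach is that, besides being fast and using little memory, it lets you ignore everything you don't want or need, so you don't care if those parts change. You simply pick out the tags you want as they stream past.

For example, if you are only interested in the "main clause" entry, your handler would look something like this: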

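import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

    // True while the parser is inside <entry id="main clause">
    private boolean inTargetEntry = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        // qName is always populated; localName is empty unless the parser is namespace-aware.
        // XML names are case-sensitive, so use equals rather than equalsIgnoreCase.
        if ("entry".equals(qName) && "main clause".equals(attributes.getValue("id"))) {
            // Set a flag so we know how to process nested tags
            inTargetEntry = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("entry".equals(qName)) {
            // Leaving an entry: unset the flag
            inTargetEntry = false;
        }
    }
}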
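To show the flag idea end to end, here is a self-contained sketch built on the JDK's javax.xml.parsers.SAXParser. It assumes you want the text of one kind of child element (such as hw or dt) inside the entry with a given id; the class SaxEntryExtractor and its extract method are hypothetical names. Note that characters() can be called several times for one element, so the text is accumulated in a buffer.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEntryExtractor {

    // Collects the text of every <childName> element inside <entry id="targetId">.
    static class EntryHandler extends DefaultHandler {
        private final String targetId;
        private final String childName;
        private boolean inTargetEntry = false; // inside the wanted <entry>
        private boolean inChild = false;       // inside a wanted child of that entry
        private final StringBuilder text = new StringBuilder();
        final List<String> results = new ArrayList<>();

        EntryHandler(String targetId, String childName) {
            this.targetId = targetId;
            this.childName = childName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("entry".equals(qName)) {
                inTargetEntry = targetId.equals(attributes.getValue("id"));
            } else if (inTargetEntry && childName.equals(qName)) {
                inChild = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text of nested elements (e.g. <sx> inside <dt>) is included as well
            if (inChild) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inChild && childName.equals(qName)) {
                results.add(text.toString().trim());
                inChild = false;
            } else if ("entry".equals(qName)) {
                inTargetEntry = false;
            }
        }
    }

    static List<String> extract(String xml, String targetId, String childName) {
        try {
            EntryHandler handler = new EntryHandler(targetId, childName);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.results;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry_list>"
                + "<entry id=\"main drag\"><hw>main drag</hw><fl>noun</fl></entry>"
                + "<entry id=\"main clause\"><hw>main clause</hw><fl>noun</fl></entry>"
                + "</entry_list>";
        System.out.println(extract(xml, "main clause", "hw")); // prints [main clause]
    }
}
```

Elements the handler doesn't look for are skipped without any mapping code, which is why new or removed elements elsewhere in the response don't affect it.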

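The simplest way to work with XML is to deserialize (unmarshal) it into an object.
You can do this with JAXB; there is a tutorial on mkyong.
Just define what your objects look like.
Here is an example: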

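import java.util.List;
// With JAXB 3+ / Jakarta EE 9+, these packages are jakarta.xml.bind.annotation instead
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// EntryList.java
@XmlRootElement(name = "entry_list")
@XmlAccessorType(XmlAccessType.FIELD) // bind fields, so the getters/setters are not mapped a second time
public class EntryList {

    @XmlElement(name = "entry")
    private List<Entry> entities;

    public List<Entry> getEntities() {
        return entities;
    }
    public void setEntities(List<Entry> entities) {
        this.entities = entities;
    }
}

// Entry.java (same imports)
@XmlAccessorType(XmlAccessType.FIELD)
public class Entry {

    @XmlAttribute
    private String id;

    @XmlElement
    private Sound sound;

    // ... map the remaining children (hw, pr, fl, def, ...) the same way

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }

    public Sound getSound() {
        return sound;
    }
    public void setSound(Sound sound) {
        this.sound = sound;
    }
}

// Sound.java (same imports)
@XmlAccessorType(XmlAccessType.FIELD)
public class Sound {

    @XmlElement
    private String wav;

    public String getWav() {
        return wav;
    }
}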

Every element that has child elements must be a class, and an element that repeats, such as entry or vi, should be a List.

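In my experience, when you have to deal with a very complex XML document, it can be easier to:

  1. transform it into a simpler form
  2. unmarshal that into objects you can work with

For example, suppose you have a very complex XML document: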

<XML>
   <SomeElement>
       <MoreElements>
           <EvenMoreElements>text</EvenMoreElements>
       </MoreElements>
   </SomeElement>
</XML>

Step 1: simplify it with XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <SimpleForm><xsl:value-of select="XML/SomeElement/MoreElements/EvenMoreElements/text()"/></SimpleForm>
    </xsl:template>
</xsl:stylesheet>
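Step 1 can be run from Java with the JDK's built-in XSLT processor (javax.xml.transform), so no extra library is needed. A minimal sketch, with the stylesheet above and the sample input inlined as strings for brevity; the class SimplifyWithXslt and its simplify method are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SimplifyWithXslt {

    // The stylesheet from step 1, inlined
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\">"
        + "<SimpleForm><xsl:value-of select=\"XML/SomeElement/MoreElements/EvenMoreElements/text()\"/></SimpleForm>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the input document and returns the result as a string.
    static String simplify(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // Omit the <?xml ...?> declaration so the result is just the element
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<XML><SomeElement><MoreElements>"
                + "<EvenMoreElements>text</EvenMoreElements>"
                + "</MoreElements></SomeElement></XML>";
        System.out.println(simplify(xml));
    }
}
```

The resulting string is the small SimpleForm document that step 2 then maps onto your own class.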

Step 2: unmarshal your own SimpleForm XML into a Java object

This way you loosen the coupling between the external schema and your internal logic.

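I don't think JAXB is the best solution here... The best solution is based on XPath, which lets you simplify the code without sacrificing maintainability. As you can see in the code below, your navigation is just an XPath expression, and the whole program is only a little over 10 lines of code using XPath and VTD-XML. By the way, the XML sample you posted above is not well-formed...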

import com.ximpleware.*;

public class ExtractExample {

    public static void main(String[] args) throws VTDException {
        VTDGen vg = new VTDGen();
        // Backslashes must be escaped in Java string literals ("d:\xml" does not compile)
        if (!vg.parseFile("d:\\xml\\sample.xml", false)) {
            return; // parsing failed, e.g. the file is not well-formed
        }
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        // Text of every <hw> whose following sibling <fl> equals 'value'
        // (replace 'value' with the part of speech you want, e.g. 'noun')
        ap.selectXPath("/entry_list/entry/hw[following-sibling::fl='value']/text()");
        int i;
        while ((i = ap.evalXPath()) != -1) {
            System.out.println("hw value: " + vn.toNormalizedString(i));
        }
    }
}
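If you would rather not add the VTD-XML dependency, roughly the same query can be run with the JDK's built-in DOM parser plus javax.xml.xpath. This swaps in a different technique (standard DOM + XPath, not VTD-XML); the class XPathExtract, the headwords method, and the example value 'noun' are illustrative assumptions. It selects the hw elements and reads their text with getTextContent() instead of selecting text() nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathExtract {

    // Returns the text of every <hw> whose following sibling <fl> equals partOfSpeech.
    static List<String> headwords(String xml, String partOfSpeech) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Same filter as the VTD-XML example. partOfSpeech is concatenated
            // directly, so it must not contain a single quote.
            String expr = "/entry_list/entry/hw[following-sibling::fl='" + partOfSpeech + "']";
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entry_list>"
                + "<entry id=\"main[1]\"><hw>main</hw><fl>adjective</fl></entry>"
                + "<entry id=\"main line\"><hw>main line</hw><fl>noun</fl></entry>"
                + "</entry_list>";
        System.out.println(headwords(xml, "noun")); // prints [main line]
    }
}
```

Unlike VTD-XML, this builds a full DOM in memory, which is fine for a response of this size.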