Cannot parse XML using the SimpleXML library

I ran into a problem when trying to deserialize an XML string into an object with the SimpleXML library (version 2.7.1).

Here is my string:

<tbody>
    <tr>
        <th></th>
        <th>Weather</th>
        <th><img src="http://localhost:8080/img/today" /></th>
        <th><img src="http://localhost:8080/img/today_1" /></th>
        <th><img src="http://localhost:8080/img/today_2" /></th>
    </tr>
    <tr>
        <td><img src="http://localhost:8080/img/location_1" /></td>
        <td> <a href="/Ney-tag/" title="Ney" target="_blank" style="color:#7A2200;">Ney</a> </td>
        <td> 14 </td>
        <td> 15 </td>
        <td> 16 </td>
    </tr>
    <tr>
        <td><img src="http://localhost:8080/img/location_2" /></td>
        <td> <a href="/Pana-tag/" title="Pana" target="_blank" style="color:#7A2200;">Pana</a> </td>
        <td> 30 </td>
        <td> 31 </td>
        <td> 30 </td>
    </tr>
    <tr>
        <td><img src="http://localhost:8080/img/location_3" /></td>
        <td> <a href="/Sin-tag/" title="Sin" target="_blank" style="color:red;font-weight:bold;">Sin</a> </td>
        <td> 32 </td>
        <td> 33 </td>
        <td> 36 </td>
    </tr>
</tbody>

Here is my model:

@Root
public class TBody {
    @ElementList
    List<TR> tr;
}
public class TR {
    @ElementList(entry = "th")
    List<TH> th;
    @ElementList(entry = "td")
    List<TD> td;
    public TR() {
    }
    public static class TH {
        @Element
        String th;
        @Element
        IMG img;
        public TH() {
            // TODO Auto-generated constructor stub
        }

    }
    public static class IMG{
        @Element
        String img;
        @Attribute
        String src;
        public IMG() {
        }
    }

    public static class TD {
        @Element
        IMG img;
        @Element
        String td;
        @Element
        A a;
    }
    public static class A{
        @Element
        String a;
        @Attribute
        String href;
        @Attribute
        String title;
        @Attribute
        String target;
        @Attribute
        String style;
        public A() {
        }
    }
}

Here is the logcat trace:

05-12 23:14:46.711  W/System.err﹕ org.simpleframework.xml.core.ValueRequiredException: 
Unable to satisfy @org.simpleframework.xml.ElementList(data=false, empty=true, entry=td, inline=false, name=, required=true, type=void) on field 'td' java.util.List 
com.example.model.TR.td for class com.example..model.TR at line 3
05-12 23:14:46.711  W/System.err﹕ at org.simpleframework.xml.core.Composite.validate(Composite.java:644)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readElements(Composite.java:449)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.access0(Composite.java:59)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite$Builder.read(Composite.java:1383)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.read(Composite.java:201)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.read(Composite.java:148)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Traverser.read(Traverser.java:92)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.CompositeList.populate(CompositeList.java:175)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.CompositeList.read(CompositeList.java:120)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readVariable(Composite.java:623)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readInstance(Composite.java:573)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readUnion(Composite.java:549)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readElement(Composite.java:532)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.readElements(Composite.java:445)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.access0(Composite.java:59)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite$Builder.read(Composite.java:1383)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.read(Composite.java:201)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Composite.read(Composite.java:148)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Traverser.read(Traverser.java:92)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:625)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:606)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:584)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:562)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:499)
05-12 23:14:46.715  W/System.err﹕ at org.simpleframework.xml.core.Persister.read(Persister.java:408)

Can anyone help me understand what is causing this?

Thanks!

[Removed because it contained errors]

  • Added the @Root annotation to all classes
  • Added inline=true to all element lists
  • Added type = _.class to every element list
  • Added required=false to th and td
  • Added explicit empty constructors

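Edit: I had to add required=false in several places, but more importantly, SimpleXML cannot handle @Text and @Element annotations on the same data object. So I used the solution from the linked question and changed the XML so that it supplies an actual @Element instead of @Text.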

The solution that finally parsed correctly is as follows:

@Root
public class A{
    @Element(required=false)
    String a;
    @Attribute
    String href;
    @Attribute
    String title;
    @Attribute
    String target;
    @Attribute(required=false)
    String style;
    public A() {
    }
}

@Root
public class IMG {
    @Element(required=false)
    String img;
    @Attribute
    String src;
    public IMG() {
    }
}

@Root(name="tbody")
public class TBody {
    @ElementList(entry = "tr", inline=true, type=TR.class)
    List<TR> tr;

    public TBody() {}
}

@Root
public class TD {
    @Element(required=false)
    IMG img;
    @Element(required=false)
    Content content;
    @Element(required=false)
    A a;

    public TD() {}
}

@Root
public class TH {
    @Element(data=true, required=false)
    Content content;

    @Element(name="img", required=false)
    IMG img;

    public TH() {
        // TODO Auto-generated constructor stub
    }

}

@Root
public class TR {
    @ElementList(entry = "th", inline = true, type = TH.class, required = false)
    List<TH> th;
    @ElementList(entry = "td", inline = true, type = TD.class, required = false)
    List<TD> td;

    public TR() {
    }
}

I also added one more class:

@Root
public class Content {
    @Text(required=false)
    String content;
}

And I had to use the following Persister, similar to the one in the linked question:

public class SerializerWithPreprocessor extends Persister {

    public SerializerWithPreprocessor() {
    }

    @Override
    public <T> T read(Class<? extends T> type, String source) throws Exception {
        //System.out.println("Source: " + source);
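        // Wrap text-only <th>/<td> bodies in <content> so they map to an @Element;
        // $1 carries the captured text over into the new element.
        source = source.replaceAll("<th>([[\\s||\\w||[+=]]&&[^<>]]+)</th>", "<th><content>$1</content></th>");
        source = source.replaceAll("<td>([[\\s||\\d||\\w||[+=]]&&[^<>]]+)</td>", "<td><content>$1</content></td>");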
        //System.out.println("Source2: " + source);
        return super.read(type, source);
    }
}

With that in place, the following works:

public class Main {
    private final static String data = "<tbody>\n" +
            "    <tr>\n" +
            "        <th></th>\n" +
            "        <th>Weather</th>\n" +
            "        <th><img src=\"http://localhost:8080/img/today\" /></th>\n" +
            "        <th><img src=\"http://localhost:8080/img/today_1\" /></th>\n" +
            "        <th><img src=\"http://localhost:8080/img/today_2\" /></th>\n" +
            "    </tr>\n" +
            "    <tr>\n" +
            "        <td><img src=\"http://localhost:8080/img/location_1\" /></td>\n" +
            "        <td> <a href=\"/Ney-tag/\" title=\"Ney\" target=\"_blank\" style=\"color:#7A2200;\">Ney</a> </td>\n" +
            "        <td> 14 </td>\n" +
            "        <td> 15 </td>\n" +
            "        <td> 16 </td>\n" +
            "    </tr>\n" +
            "    <tr>\n" +
            "        <td><img src=\"http://localhost:8080/img/location_2\" /></td>\n" +
            "        <td> <a href=\"/Pana-tag/\" title=\"Pana\" target=\"_blank\" style=\"color:#7A2200;\">Pana</a> </td>\n" +
            "        <td> 30 </td>\n" +
            "        <td> 31 </td>\n" +
            "        <td> 30 </td>\n" +
            "    </tr>\n" +
            "    <tr>\n" +
            "        <td><img src=\"http://localhost:8080/img/location_3\" /></td>\n" +
            "        <td> <a href=\"/Sin-tag/\" title=\"Sin\" target=\"_blank\" style=\"color:red;font-weight:bold;\">Sin</a> </td>\n" +
            "        <td> 32 </td>\n" +
            "        <td> 33 </td>\n" +
            "        <td> 36 </td>\n" +
            "    </tr>\n" +
            "</tbody>";



    public static void main(String[] args) {
        Serializer serializer = new SerializerWithPreprocessor();
        try {
            TBody tBody = serializer.read(TBody.class, data);
            serializer.write(tBody, System.out);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

The output is as follows:

<tbody>
   <tr>
      <th/>
      <th>
         <content>Weather</content>
      </th>
      <th>
         <img src="http://localhost:8080/img/today"/>
      </th>
      <th>
         <img src="http://localhost:8080/img/today_1"/>
      </th>
      <th>
         <img src="http://localhost:8080/img/today_2"/>
      </th>
   </tr>
   <tr>
      <td>
         <img src="http://localhost:8080/img/location_1"/>
      </td>
      <td>
         <a href="/Ney-tag/" title="Ney" target="_blank" style="color:#7A2200;"/>
      </td>
      <td>
         <content> 14 </content>
      </td>
      <td>
         <content> 15 </content>
      </td>
      <td>
         <content> 16 </content>
      </td>
   </tr>
   <tr>
      <td>
         <img src="http://localhost:8080/img/location_2"/>
      </td>
      <td>
         <a href="/Pana-tag/" title="Pana" target="_blank" style="color:#7A2200;"/>
      </td>
      <td>
         <content> 30 </content>
      </td>
      <td>
         <content> 31 </content>
      </td>
      <td>
         <content> 30 </content>
      </td>
   </tr>
   <tr>
      <td>
         <img src="http://localhost:8080/img/location_3"/>
      </td>
      <td>
         <a href="/Sin-tag/" title="Sin" target="_blank" style="color:red;font-weight:bold;"/>
      </td>
      <td>
         <content> 32 </content>
      </td>
      <td>
         <content> 33 </content>
      </td>
      <td>
         <content> 36 </content>
      </td>
   </tr>
</tbody>

You may need to adjust the regex in replaceAll to make it more tolerant of whitespace.
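
As a rough illustration of that last point, here is a minimal sketch of a more whitespace-tolerant preprocessing step that uses only java.util.regex. The class name CellTextWrapper and the method wrapCellText are hypothetical names for this example, not part of SimpleXML. It assumes each cell holds either plain text or nested tags, never both, and unlike the replaceAll calls in the Persister it trims the whitespace around the cell text.

```java
import java.util.regex.Pattern;

public class CellTextWrapper {
    // Matches a <th> or <td> whose body is text only (no nested tags).
    // Group 1 is the tag name, group 2 the text without surrounding whitespace.
    // The \s* on both sides absorbs padding, including line breaks, and the
    // [^<>] classes keep the match inside a single cell.
    private static final Pattern TEXT_ONLY_CELL =
            Pattern.compile("<(th|td)>\\s*([^<>\\s][^<>]*?)\\s*</\\1>");

    // Wraps the text of every text-only cell in a <content> element.
    // Empty cells and cells that contain child elements are left untouched.
    public static String wrapCellText(String source) {
        return TEXT_ONLY_CELL.matcher(source)
                .replaceAll("<$1><content>$2</content></$1>");
    }

    public static void main(String[] args) {
        String row = "<tr><td> 14 </td><td>\n  Sunny day\n</td><td></td></tr>";
        // prints <tr><td><content>14</content></td><td><content>Sunny day</content></td><td></td></tr>
        System.out.println(wrapCellText(row));
    }
}
```

If you go this route, the body of the overridden read method could call wrapCellText(source) in place of the two replaceAll lines. Whether trimming is acceptable depends on whether the padding inside a cell matters to you.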