Different XML-related behaviour under Windows and Linux

In the open source project Rultor there is a unit test com.rultor.agents.IndexesRequestsTest#retrievesIndexFromSibling. It looks like this:

@Test
public void retrievesIndexFromSibling() throws Exception {
    final String first = "first";
    final Talks talks = new Talks.InDir();
    talks.create("", first);
    talks.get(first).modify(
        new Directives().xpath("/talk")
            .add("wire").add("href").set("#3").up().up()
            .add("archive")
            .add("log").attr("id", "3").attr("title", "title3")
            .attr("index", "1").up()
    );
    final String second = "second";
    talks.create("", second);
    talks.get(second).modify(
        new Directives().xpath("/talk")
            .add("wire").add("href").set("#4").up().up()
            .add("archive")
            .add("log").attr("id", "4").attr("title", "title4")
            .attr("index", "2").up()
    );
    final String third = "third";
    talks.create("", third);
    talks.get(third).modify(
        new Directives()
            .xpath("/talk").add("wire").add("href").set("#5").up().up()
            .add("request").attr("id", "a67890")
            .add("args").up()
            .add("type").set("merge").up()
    );
    new IndexesRequests().execute(talks);
    MatcherAssert.assertThat(
        talks.get(third).read(),
        XhtmlMatchers.hasXPaths("/talk/request[@index='5']")
    );
}

Now look at the call talks.get(third).read() at the end of the test.

Under Windows, its result is:

<?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="third" number="1" public="true">
   <wire>
      <href>#5</href>
   </wire>
   <request id="a67890" index="5">
      <args/>
      <type>merge</type>
   </request>
</talk>

Under Linux, the result of talks.get(third).read() is:

<?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="third" number="1" public="true">
   <wire>
      <href>#5</href>
   </wire>
   <request id="a67890" index="3">
      <args/>
      <type>merge</type>
   </request>
</talk>

The difference: the index attribute of request is 5 under Windows and 3 under Linux.

Why?

The class that generates the index is com.rultor.agents.IndexesRequests:

public final class IndexesRequests implements SuperAgent {
    @Override
    public void execute(final Talks talks) throws IOException {
        int idx = this.index(talks);
        for (final Talk talk : talks.active()) {
            idx += 1;
            talk.modify(
                new Directives()
                    .xpath("/talk/request")
                    .attr("index", Integer.toString(idx))
            );
        }
    }

    /**
     * Calculates maximal index value for a {@link Talks} object.
     * @param talks The {@link Talks} object
     * @return The maximal index value
     * @throws IOException if the content of one {@link Talk} object can't be read
     */
    private int index(final Talks talks) throws IOException {
        int index = 0;
        for (final Talk talk : talks.active()) {
            final int idx = this.index(talk);
            if (idx > index) {
                index = idx;
            }
        }
        return index;
    }

    /**
     * Calculates maximal (existing) index value of a {@link Talk} object.
     * @param talk The {@link Talk} object
     * @return The maximal index value
     * @throws IOException if the content of the {@link Talk} object can't be read
     */
    private int index(final Talk talk) throws IOException {
        final Iterable<Integer> indexes = Iterables.transform(
                talk.read()
                .xpath("/talk/archive/log/@index|/talk/request/@index"),
            new Function<String, Integer>() {
                @Override
                public Integer apply(final String input) {
                    return Integer.parseInt(input);
                }
            }
        );
        final int index;
        if (indexes.iterator().hasNext()) {
            index = Ordering.natural().max(indexes);
        } else {
            index = 0;
        }
        return index;
    }
}

Apart from the XPath call in the index method, is there any other place in this code that could behave differently under Windows and Linux?

Update 1 (23.01.2015 14:47 MSK): If I add

System.out.println("Index first: " +
    new IndexesRequests().index(talks.get(first)));
System.out.println("Index second: " +
    new IndexesRequests().index(talks.get(second)));
System.out.println("Index third: " +
    new IndexesRequests().index(talks.get(third)));

then the results under Linux and Windows are identical:

Index first: 1
Index second: 2
Index third: 0

With these diagnostic messages added, the test does not fail under Linux.

Update 2 (23.01.2015 15:07 MSK):

I added code that prints the result of talks.active():

final Iterable<Talk> activeTalks = talks.active();

System.out.println("activeTalks (START)");

for (final Talk talk : activeTalks) {
    System.out.println("Talk XML: " + talk.read().toString());
}

System.out.println("activeTalks (END)");

Windows:

activeTalks (START)
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="first" number="1" public="true">
   <wire>
      <href>#3</href>
   </wire>
   <archive>
      <log id="3" index="1" title="title3"/>
   </archive>
</talk>
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="second" number="1" public="true">
   <wire>
      <href>#4</href>
   </wire>
   <archive>
      <log id="4" index="2" title="title4"/>
   </archive>
</talk>
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="third" number="1" public="true">
   <wire>
      <href>#5</href>
   </wire>
   <request id="a67890">
      <args/>
      <type>merge</type>
   </request>
</talk>
activeTalks (END)

Linux:

activeTalks (START)
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="third" number="1" public="true">
   <wire>
      <href>#5</href>
   </wire>
   <request id="a67890">
      <args/>
      <type>merge</type>
   </request>
</talk>
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="second" number="1" public="true">
   <wire>
      <href>#4</href>
   </wire>
   <archive>
      <log id="4" index="2" title="title4"/>
   </archive>
</talk>
Talk XML: <?xml version="1.0" encoding="UTF-8"?>
<talk later="false" name="first" number="1" public="true">
   <wire>
      <href>#3</href>
   </wire>
   <archive>
      <log id="3" index="1" title="title3"/>
   </archive>
</talk>
activeTalks (END)

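The algorithm that computes @index depends on the order in which talks.active() delivers these elements. execute() starts from the largest existing index (2, from the log in second) and increments idx once for every active talk, whether or not that talk contains a request. When third comes last (Windows), it receives 2 + 3 = 5; when it comes first (Linux), it receives 2 + 1 = 3. If active() returns, say, the values of a HashMap or HashSet, they may come back in one order on one system and in a different order on another.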
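The order dependence can be reproduced without Rultor at all. The sketch below is a stripped-down model of IndexesRequests: TalkModel, maxIndex and assign are hypothetical names, and Talks, Directives and XPath are replaced by plain lists. It assumes, as the outputs above suggest, that modifying a talk without a request element is a no-op.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexOrderDemo {

    // Hypothetical stand-in for a Talk: its name, the @index values it already
    // holds (archive/log or request), and whether it has a <request> element.
    record TalkModel(String name, List<Integer> indexes, boolean hasRequest) { }

    // Mirrors IndexesRequests.index(Talks): the largest existing index, or 0.
    static int maxIndex(List<TalkModel> talks) {
        int max = 0;
        for (TalkModel talk : talks) {
            for (int idx : talk.indexes()) {
                max = Math.max(max, idx);
            }
        }
        return max;
    }

    // Mirrors IndexesRequests.execute(): idx grows once per active talk,
    // but only talks that have a <request> actually receive the value.
    static Map<String, Integer> assign(List<TalkModel> active) {
        int idx = maxIndex(active);
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (TalkModel talk : active) {
            idx += 1;
            if (talk.hasRequest()) {
                assigned.put(talk.name(), idx);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        TalkModel first = new TalkModel("first", List.of(1), false);
        TalkModel second = new TalkModel("second", List.of(2), false);
        TalkModel third = new TalkModel("third", List.of(), true);
        // Order observed under Windows in Update 2
        System.out.println(assign(List.of(first, second, third))); // {third=5}
        // Order observed under Linux in Update 2
        System.out.println(assign(List.of(third, second, first))); // {third=3}
    }
}
```

The same three talks yield 5 or 3 purely because of iteration order, which matches the two results in the question.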

Defining hashCode yourself may not produce a reliable order either.
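To illustrate that a fully deterministic hashCode still does not pin down iteration order, the sketch below puts the same two Integer keys, in the same insertion order, into two HashMaps that differ only in initial capacity. Integer.hashCode() returns the value itself. The outputs in the comments are what OpenJDK 8 and later produce; HashMap's documentation makes no ordering guarantee, so other implementations may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashOrderDemo {

    // Inserts the keys, in the given order, into a HashMap with the given
    // initial capacity and returns the keys in the map's iteration order.
    static List<Integer> keyOrder(int initialCapacity, List<Integer> keys) {
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (Integer key : keys) {
            map.put(key, "talk" + key);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(17, 1);
        // With 16 buckets, 17 and 1 share a bucket and keep insertion order.
        System.out.println(keyOrder(16, keys)); // [17, 1] on OpenJDK 8+
        // With 64 buckets, 1 lands in an earlier bucket than 17.
        System.out.println(keyOrder(64, keys)); // [1, 17] on OpenJDK 8+
    }
}
```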

A reliable approach is to use a Map or Set that preserves insertion order (such as LinkedHashMap or LinkedHashSet).
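As a minimal sketch of that approach, the hypothetical registry below (not Rultor's Talks.InDir, whose implementation is not shown here) keeps talks in a LinkedHashMap, so active() returns them in creation order on every platform. If the real active() is built from a directory listing instead, note that File.listFiles() documents no ordering guarantee either, so explicitly sorting the names would be the analogous fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InsertionOrderDemo {

    // Hypothetical registry of talks keyed by name. LinkedHashMap iterates in
    // insertion order, independent of the keys' hash codes and of the platform.
    private final Map<String, String> talks = new LinkedHashMap<>();

    void create(String name, String xml) {
        talks.put(name, xml);
    }

    // Returns the talk names in the order in which they were created.
    List<String> active() {
        return new ArrayList<>(talks.keySet());
    }

    public static void main(String[] args) {
        InsertionOrderDemo registry = new InsertionOrderDemo();
        registry.create("first", "<talk name='first'/>");
        registry.create("second", "<talk name='second'/>");
        registry.create("third", "<talk name='third'/>");
        System.out.println(registry.active()); // [first, second, third]
    }
}
```

With this ordering, third is always visited last, so the model above would assign it 5 on both systems.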