XMLStreamReader: how to work with nested elements of the same type

I am using XMLStreamReader to parse the following XML:

<root>
    <element>
        <attribute>level0</attribute>
        <element>
            <attribute>level1</attribute>
            <element>
                <attribute>level2</attribute>
            </element>
        </element>
    </element>
</root>

I create my XMLStreamReader like this:

XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
                new ByteArrayInputStream(document.getBytes()));

Unfortunately, when reader.next() reaches the first closing element tag, I get the following exception:

javax.xml.stream.XMLStreamException: ParseError at [row,col]:[7,14]
Message: XML document structures must start and end within the same entity. 

Is there a way to override the default behavior of XMLStreamReader to work around this?

Edit

This is the code I am using:

@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    String document = value.toString();
    System.out.println("'" + document + "'");
    try {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
                new ByteArrayInputStream(document.getBytes()));
        String propertyName = "";
        String propertyValue = "";
        String currentElement = "";
        while (reader.hasNext()) {
            int code = reader.next();
            switch (code) {
            case START_ELEMENT:
                currentElement = reader.getLocalName();
                break;
            case CHARACTERS:
                if (currentElement.equalsIgnoreCase("element")) {
                    propertyName += reader.getText();
                } else if (currentElement.equalsIgnoreCase("attribute")) {
                    propertyValue += reader.getText();
                }
                break;
            }
        }
        reader.close();
        context.write(new Text(propertyName.trim()), new Text(propertyValue.trim()));
    } catch (Exception e) {
        e.printStackTrace();
    }
}
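
The exception message quoted above is what the JDK's bundled parser typically reports when the input ends while elements are still open. One possibility, which is an assumption and not something the question confirms, is that the mapper receives only a fragment of the document. A minimal sketch for checking whether a given string parses from start to end (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original code.
final class FragmentCheck {

    /** Returns true if the parser can walk the whole string without an XMLStreamException. */
    static boolean parsesCompletely(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // Consume every event; truncated input fails inside next().
                while (reader.hasNext()) {
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }
}
```

Calling FragmentCheck.parsesCompletely(document) at the top of map() would show whether the value handed to the mapper is a complete document or a piece of one.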

There is nothing wrong with the sample XML document or with the StAX parser, which can be checked with the following code:

@Test
public void testSO_31815379() throws XMLStreamException, UnsupportedEncodingException {
    final String xml = 
        "<root>\n" +
        "    <element>\n" +
        "        <attribute>level0</attribute>\n" +
        "        <element>\n" +
        "            <attribute>level1</attribute>\n" +
        "            <element>\n" +
        "                <attribute>level2</attribute>\n" +
        "            </element>\n" +
        "        </element>\n" +
        "    </element>\n" +
        "</root>";

    final XMLStreamReader reader = XMLInputFactory.newInstance()
        .createXMLStreamReader(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    LOG.info("Using XMLStreamReader implementation: %s", reader.getClass().getName());

    reader.require(XMLStreamConstants.START_DOCUMENT, null, null);
    int event;
    while ((event = reader.next()) != XMLStreamConstants.END_DOCUMENT) {
        LOG.info(StaxUtils.eventDescription(reader));
    }
    reader.require(XMLStreamConstants.END_DOCUMENT, null, null);
    reader.close();
}

Output (StaxUtils.eventDescription is a custom helper method):

Using XMLStreamReader implementation: com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl
START_ELEMENT<{}root>
CHARACTERS=<whitespace>
START_ELEMENT<{}element>
CHARACTERS=<whitespace>
START_ELEMENT<{}attribute>
CHARACTERS='level0'
END_ELEMENT<attribute>
CHARACTERS=<whitespace>
START_ELEMENT<{}element>
CHARACTERS=<whitespace>
START_ELEMENT<{}attribute>
CHARACTERS='level1'
END_ELEMENT<attribute>
CHARACTERS=<whitespace>
START_ELEMENT<{}element>
CHARACTERS=<whitespace>
START_ELEMENT<{}attribute>
CHARACTERS='level2'
END_ELEMENT<attribute>
CHARACTERS=<whitespace>
END_ELEMENT<element>
CHARACTERS=<whitespace>
END_ELEMENT<element>
CHARACTERS=<whitespace>
END_ELEMENT<element>
CHARACTERS=<whitespace>
END_ELEMENT<root>
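
Since the parser reports every nested start and end tag, telling same-named elements apart comes down to tracking where you are in the tree. The following is a minimal sketch of one way to do that, assuming the element names from the sample document; the class name, method name, and the "depth=value" output format are made up for illustration. It keeps a stack of open element names and a counter of open element tags:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the original answer.
final class NestedElementDemo {

    /** Returns one "depth=value" entry per attribute tag; depth 0 is the outermost element tag. */
    static List<String> collectAttributes(String xml) {
        List<String> result = new ArrayList<>();
        Deque<String> openElements = new ArrayDeque<>(); // names of currently open tags
        StringBuilder text = new StringBuilder();
        int elementDepth = 0; // number of currently open "element" tags
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = reader.getLocalName();
                        openElements.push(name);
                        if ("element".equals(name)) {
                            elementDepth++;
                        }
                        text.setLength(0); // start collecting text for the new tag
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                        // Only keep text that sits directly inside an attribute tag.
                        if ("attribute".equals(openElements.peek())) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT: {
                        String closed = openElements.pop();
                        if ("attribute".equals(closed)) {
                            result.add((elementDepth - 1) + "=" + text.toString().trim());
                        } else if ("element".equals(closed)) {
                            elementDepth--;
                        }
                        break;
                    }
                    default:
                        break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return result;
    }
}
```

For the sample document, collectAttributes returns the three entries 0=level0, 1=level1 and 2=level2, so each value stays tied to the nesting level it came from.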