IndexOutOfBoundsException when processing empty CDATA with Transformer
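I want to extract specific nodes from a large XML file. This works fine until the reader hits an odd CDATA section that has no content at all.
Output: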
ERROR: ''
javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
Caused by: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
... 3 more
---------
java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
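Code:
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Runs inside a method declared with "throws Exception".
InputStream stream = new FileInputStream("C:\\myFile.xml"); // backslash must be escaped in a Java string
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(stream);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(); // identity transformer
String extractPath = "/root"; // path of the element to extract
String path = "";             // path of the element the reader is currently in
while (reader.hasNext()) {
    reader.next();
    if (reader.isStartElement()) {
        path += "/" + reader.getLocalName();
        if (path.equals(extractPath)) {
            // Serialize the current element and its subtree; the transform
            // consumes events up to the matching END_ELEMENT.
            StringWriter writer = new StringWriter();
            StAXSource src = new StAXSource(reader);
            StreamResult res = new StreamResult(writer);
            t.transform(src, res); // exception thrown here
            System.out.println(writer.toString());
            path = path.substring(0, path.lastIndexOf("/"));
        }
    } else if (reader.isEndElement()) {
        path = path.substring(0, path.lastIndexOf("/"));
    }
}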
The XML that triggers the error:
<foo><![CDATA[]]></foo>
Can I make the Transformer ignore it? Or what would an alternative implementation look like? I cannot change the input XML!
This is a problem in the Xerces implementation; see: https://issues.apache.org/jira/browse/XERCESJ-1033
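To check whether the StAX implementation on a given classpath is affected, one can replay the call at the top of the stack trace directly: advance to a zero-length text event and copy it into a zero-length buffer with getTextCharacters, which is what StAXStream2SAX.handleCharacters appears to do. This is a sketch under the assumption that only zero-length text events trigger the failure; the class and method names (EmptyCdataProbe, emptyTextEventFails) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic, not part of any library.
class EmptyCdataProbe {

    /**
     * Returns true if the default StAX reader throws IndexOutOfBoundsException
     * when a zero-length text event is copied into a zero-length buffer.
     * Returns false if no such event is reported or the copy succeeds.
     */
    static boolean emptyTextEventFails(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                boolean isText = event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA;
                if (isText && reader.getTextLength() == 0) {
                    char[] buffer = new char[0];
                    try {
                        // Same argument pattern as the failing call in the trace (assumed).
                        reader.getTextCharacters(0, buffer, 0, 0);
                    } catch (IndexOutOfBoundsException e) {
                        return true; // this implementation rejects the call
                    }
                }
            }
            return false;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Shows which StAX implementation is in use and whether it is affected.
        System.out.println("StAX factory: " + XMLInputFactory.newInstance().getClass().getName());
        System.out.println("Empty CDATA fails: " + emptyTextEventFails("<foo><![CDATA[]]></foo>"));
    }
}
```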
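Empty CDATA sections are well-formed XML, but this parser does not seem to handle them, so the only suggestions I can give you are:
- Switch to a different XML parser implementation
- Remove the empty CDATA sections from the source file (replace <![CDATA[]]> with an empty string), or put a space inside them, e.g. <![CDATA[ ]]>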
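Since the input file cannot be changed, a variant of the second suggestion is to drop the empty CDATA events while reading instead of editing the file. The sketch below wraps the reader in a javax.xml.stream.util.StreamReaderDelegate whose next() skips CHARACTERS/CDATA events with zero text length, so the Transformer's StAX bridge never receives them. The class name EmptyTextSkippingReader is hypothetical, and the sketch assumes (as the stack trace suggests) that only zero-length text events trigger the exception and that the bridge advances with next() rather than nextTag(); it has not been verified against the exact JRE from the question.

```java
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

/**
 * Hypothetical helper: forwards every event of the wrapped reader except
 * CHARACTERS/CDATA events whose text is empty. Only next() is filtered;
 * nextTag() and the other methods are passed through unchanged.
 */
class EmptyTextSkippingReader extends StreamReaderDelegate {

    public EmptyTextSkippingReader(XMLStreamReader reader) {
        super(reader);
    }

    private static boolean isText(int event) {
        return event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA;
    }

    @Override
    public int next() throws XMLStreamException {
        int event = super.next();
        // Keep advancing past empty text events. Text only occurs inside an
        // element, so another event (at least an END_ELEMENT) always follows.
        while (isText(event) && getTextLength() == 0) {
            event = super.next();
        }
        return event;
    }
}
```

In the question's code, only the reader creation would change: XMLStreamReader reader = new EmptyTextSkippingReader(factory.createXMLStreamReader(stream)); the path-tracking loop and the StAXSource stay the same, because dropping an empty text event does not change the document's content.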
I have added an example of another implementation.
JAXB
With JAXB you can map XML to POJOs in a simple way.
For example, if you have the following XML file at c:\myFile.xml:
<root>
<foo><![CDATA[]]></foo>
<foo><![CDATA[some data here]]></foo>
</root>
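you can use the following POJOs:
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import javax.xml.bind.annotation.XmlValue;

// Root.java: maps <root> and its repeated <foo> children.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD) // bind fields only, so getFooList() is not mapped a second time
public class Root {
    @XmlElement(name = "foo")
    private List<Foo> foo;

    public List<Foo> getFooList() {
        return foo;
    }

    public void setFooList(List<Foo> fooList) {
        this.foo = fooList;
    }
}

// Foo.java: maps the text content of <foo>; CDATA content is read as plain text.
@XmlType(name = "foo")
public class Foo {
    @XmlValue
    private String content;

    @Override
    public String toString() {
        return content;
    }
}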
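Then use the following snippet to unmarshal the XML into objects:
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public static void main(String[] args) {
    try {
        File file = new File("C:\\myFile.xml"); // backslash must be escaped in a Java string
        JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
        Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
        Root root = (Root) jaxbUnmarshaller.unmarshal(file);
        // Print each <foo>'s content between bars so empty values are visible.
        for (Foo foo : root.getFooList()) {
            System.out.println(String.format("Foo content: |%s|", foo));
        }
    } catch (JAXBException e) {
        e.printStackTrace();
    }
}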
I tested this and it ran without any errors.
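If JAXB is not available (it was removed from the JDK in Java 11 and must be added as a separate dependency there), another option is to bypass the Transformer's StAX bridge altogether and copy StAX events yourself with XMLEventReader and XMLEventWriter. This is a sketch, assuming the bug is limited to the getTextCharacters path shown in the stack trace; EventCopyExtractor and extract are hypothetical names, and for brevity it matches the first element by local name rather than by the full path used in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical extractor built only on the standard StAX event API.
class EventCopyExtractor {

    /**
     * Serializes the first element whose local name equals elementName,
     * including its whole subtree. Returns null if no such element exists.
     */
    static String extract(String xml, String elementName) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        int depth = 0; // greater than 0 while inside the element being copied
        try {
            while (in.hasNext()) {
                XMLEvent event = in.nextEvent();
                if (depth == 0) {
                    // Not copying yet: wait for the start of the wanted element.
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                        depth = 1;
                        writer.add(event);
                    }
                    continue;
                }
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement()) {
                    depth--;
                }
                writer.add(event); // text and CDATA events are copied via their string data
                if (depth == 0) {
                    writer.flush();
                    return out.toString();
                }
            }
            return null;
        } finally {
            writer.close();
            in.close();
        }
    }
}
```

For a large file, the StringReader would be replaced by the FileInputStream from the question, so the document is still streamed rather than loaded into memory.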
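I ran into this error with two builds of the same application: one build failed when processing an empty <![CDATA[]]>, the other did not.
The difference was that the broken build used Xerces (embedded in the JRE), while the working build had an additional dependency on the classpath, https://mvnrepository.com/artifact/org.codehaus.woodstox/woodstox-core-asl.
The relevant part of the stack trace from the broken build is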
java.lang.Exception
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1144)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
at javax.xml.validation.Validator.validate(Validator.java:124)
while for the working build it is
java.lang.Exception
at com.ctc.wstx.sr.BasicStreamReader.getTextCharacters(BasicStreamReader.java:894)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
at javax.xml.validation.Validator.validate(Validator.java:124)
This Q/A helped me get "comfortable" with Woodstox: What is the relation between fasterxml(jackson-dataformat-xml) and Woodstox?
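In the working build, Woodstox is picked up just by being on the classpath, presumably because its jar registers itself as the XMLInputFactory service. To make that choice explicit, and to see which implementation is actually in use, a factory can be created as sketched below. StaxFactories and preferWoodstox are hypothetical names; com.ctc.wstx.stax.WstxInputFactory is Woodstox's input factory class, and the sketch falls back to XMLInputFactory.newInstance() when Woodstox is absent. Setting the system property javax.xml.stream.XMLInputFactory to that class name would have a similar effect for every XMLInputFactory.newInstance() call.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper for choosing the StAX implementation explicitly.
class StaxFactories {

    /** Class name of Woodstox's XMLInputFactory implementation. */
    static final String WOODSTOX_FACTORY = "com.ctc.wstx.stax.WstxInputFactory";

    /**
     * Returns a Woodstox input factory if Woodstox is on the classpath,
     * otherwise falls back to the standard lookup (by default the JDK's parser).
     */
    static XMLInputFactory preferWoodstox() {
        try {
            Class<?> type = Class.forName(WOODSTOX_FACTORY);
            return (XMLInputFactory) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return XMLInputFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        XMLInputFactory factory = preferWoodstox();
        // Prints which StAX implementation is actually in use.
        System.out.println(factory.getClass().getName());
    }
}
```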