SAXParseException; systemId: cumulative size of entities exceeds bound
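Good morning,
I have to parse a huge XML file (2 GB) in Java. It has many tags, but I only need to write the content of two of them, <title> and <subtext>, to a plain file each time they occur, so I am using a SAX parser.
So far I have managed to write 1M95 texts to the output file, at which point this exception appears: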
org.xml.sax.SAXParseException; systemId: filePath; lineNumber: x; columnNumber: y; JAXP00010004 : La taille cumulée des entités est "50 000 001" et dépasse la limite de "50 000 000" définie par "FEATURE_SECURE_PROCESSING".
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1465)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.checkEntityLimit(XMLScanner.java:1544)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.handleCharacter(XMLDocumentFragmentScannerImpl.java:1940)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1866)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3058)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:328)
at Parsing.main(Class.java:38)
Translated, the exception message reads:
The cumulative size of the entities is "50 000 001" which exceeds the boundary of "50 000 000" defined by "FEATURE_SECURE_PROCESSING".
This is the code I wrote:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Parsing {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        try {
            File inputFile = new File(System.getProperty("user.dir") + "/input.xml");
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            UserHandler userhandler = new UserHandler();
            saxParser.parse(inputFile, userhandler);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void doThingOne(String text, String title) throws IOException {
        // Write the text and the title on a file
    }

    public static void doThingTwo(String text, String title) throws IOException {
        // Write the text and the title on another file
    }

    // Static so it can be instantiated from the static main method.
    static class UserHandler extends DefaultHandler {
        boolean bText = false;
        boolean bTitle = false;
        StringBuffer tagTextBuffer;
        StringBuffer tagTitleBuffer;
        String text = null;
        String title = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // Start a fresh buffer for each element of interest.
            if (qName.equals("title")) {
                tagTitleBuffer = new StringBuffer();
                bTitle = true;
            } else if (qName.equals("text")) {
                tagTextBuffer = new StringBuffer();
                bText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("title")) {
                bTitle = false;
                // The title comes from the title buffer, not the text buffer.
                title = tagTitleBuffer.toString();
            } else if (qName.equals("text")) {
                text = tagTextBuffer.toString();
                bText = false;
                // Compare string contents with equals(), not reference identity.
                if (text != null && "One".equals(title)) {
                    try {
                        Parsing.doThingOne(text, title);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                } else if (text != null) {
                    try {
                        Parsing.doThingTwo(text, title);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        @Override
        public void characters(char ch[], int start, int length) throws SAXException {
            // Content may arrive in several chunks, so always append.
            if (bTitle) {
                tagTitleBuffer.append(ch, start, length);
            } else if (bText) {
                tagTextBuffer.append(ch, start, length);
            }
        }
    }
}
Thank you for your time.
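The limit is there to protect against the "billion laughs" attack. If you trust the source of the XML, you can switch off the SECURE_PROCESSING feature that imposes this limit.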
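For reference, here is a minimal sketch of switching the feature off through the standard JAXP API. The class and method names (`SecureProcessingOff`, `newTrustedFactory`) are made up for illustration. Whether this actually lifts the entity size limit may depend on the JDK version, so treat it as something to try rather than a guaranteed fix.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

class SecureProcessingOff {
    // Hypothetical helper: builds a SAX parser factory with
    // FEATURE_SECURE_PROCESSING switched off.
    // Only use this for XML that comes from a source you trust.
    static SAXParserFactory newTrustedFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
        return factory;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = newTrustedFactory();
        // Report the feature state the factory now holds.
        System.out.println("secure processing: "
                + factory.getFeature(XMLConstants.FEATURE_SECURE_PROCESSING));
    }
}
```

Parsers created with `factory.newSAXParser()` afterwards inherit the setting.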
I generally recommend using Apache Xerces rather than the version bundled with the JDK.
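Your code for the characters() method is wrong: the content of both the text and title elements can be delivered split across several calls, so you need to accumulate it in a buffer in both cases.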
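To illustrate the accumulation pattern, here is a small self-contained sketch that collects the content of a single `title` element. The class name `TitleCollector` and the helper `extractTitle` are invented for this example, and it parses a short in-memory string rather than the 2 GB file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TitleCollector extends DefaultHandler {
    private StringBuilder buffer; // non-null only while inside <title>
    private String title;         // content of the last completed <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("title".equals(qName)) {
            buffer = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this several times for one element,
        // so append each chunk instead of overwriting.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            title = buffer.toString();
            buffer = null;
        }
    }

    // Hypothetical helper: parses an XML string and returns the title text.
    static String extractTitle(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.title;
    }

    public static void main(String[] args) throws Exception {
        // prints "A & B"
        System.out.println(extractTitle("<doc><title>A &amp; B</title></doc>"));
    }
}
```

The element is only treated as complete in `endElement`, so the result is the same however the parser chooses to chunk the text.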
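It would be good to know why the entity expansion limit is being reached. Does your document contain a large number of references to small entities, a few references to large entities, or something else? Do the entity references occur in the parts of the document you are interested in?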
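One rough way to get a feel for the first of these questions is to count `&` characters in the raw file without parsing it. This is only a sketch: the name `EntityReferenceCounter` is made up, and the count is an approximation, since it also includes any `&` inside CDATA sections or comments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EntityReferenceCounter {
    // Counts '&' characters as a rough proxy for entity and character
    // references. Assumption: '&' inside CDATA sections or comments is
    // counted as well, so the result is an upper estimate.
    static long countAmpersands(Reader in) throws IOException {
        long count = 0;
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '&') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample with two references; prints 2.
        System.out.println(countAmpersands(new StringReader("<t>a &amp; b &lt; c</t>")));
    }
}
```

For the real file, a reader such as `Files.newBufferedReader(Paths.get("input.xml"))` could be passed in instead of the `StringReader`.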
Turning off FEATURE_SECURE_PROCESSING had no effect for me (Java 8).
To raise the limit, call:
System.setProperty("jdk.xml.totalEntitySizeLimit", String.valueOf(Integer.MAX_VALUE));
before calling SAXParserFactory.newInstance().
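Put together, the ordering might look like the sketch below. The class and method names (`RaisedEntityLimit`, `newParserWithRaisedLimit`) are only illustrative, and the sample document is a short in-memory string standing in for the real file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RaisedEntityLimit {
    // Hypothetical helper: raises the JDK's total entity size limit,
    // then creates the parser.
    static SAXParser newParserWithRaisedLimit() throws Exception {
        // Set the property before the factory is created.
        System.setProperty("jdk.xml.totalEntitySizeLimit",
                String.valueOf(Integer.MAX_VALUE));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newParserWithRaisedLimit();
        // A no-op handler is enough to show that parsing goes through.
        parser.parse(new InputSource(new StringReader("<doc>a &amp; b</doc>")),
                new DefaultHandler());
        System.out.println("parsed");
    }
}
```

Since this is an ordinary system property, it can also be passed on the command line as `-Djdk.xml.totalEntitySizeLimit=<value>`. The JAXP processing-limits documentation describes a value of 0 or less as meaning no limit.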