JAXP saxon-he: XMLfile StreamSource doesn't release file access after parsing error
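I am using the JAXP specification API together with the Saxon-HE API. The main purpose is to develop an application that transforms XML files using a configurable XSLT stylesheet, with the ability to overwrite the generated output documents. I'll skip the details, because I created a sample project to illustrate the problem I ran into:
Use case: when a transformation error occurs, moving the XML file to another directory (which could be an error directory) raises an access exception.
When I instantiate a StreamSource from a File instance (pointing to the XML file) and certain parsing errors occur, moving the file raises the exception "The process cannot access the file because it is being used by another process."
Here is a single-class main application I wrote to illustrate the problem: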
package com.sample.xslt.application;
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.FeatureKeys;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
public class XsltApplicationSample {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new RuntimeException("Two arguments are expected : <xslFilePath> <inputFilePath>");
        }
        String xslFilePath = args[0];
        String xmlFilePath = args[1];
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setAttribute(FeatureKeys.ALLOW_MULTITHREADING, Boolean.TRUE);
        factory.setAttribute(FeatureKeys.RECOVERY_POLICY,
                new Integer(Configuration.RECOVER_WITH_WARNINGS));
        Source xslSource = new StreamSource(new File(xslFilePath));
        Source xmlSource = new StreamSource(new File(xmlFilePath));
        Transformer transformer = factory.newTransformer(xslSource);
        try {
            transformer.transform(xmlSource, new DOMResult());
        } catch (TransformerException e) {
            System.out.println(e.getMessage());
        }
        // move input file to tmp directory (for example, could be configured error dir)
        File srcFile = Paths.get(xmlFilePath).toFile();
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        Path destFilePath = new File(tempDir, srcFile.getName()).toPath();
        try {
            Files.move(srcFile.toPath(), destFilePath, StandardCopyOption.REPLACE_EXISTING);
        } catch (SecurityException | IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
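The content of the configured XSLT file must be valid to reproduce the problem.
If the input XML file is empty, a transformation/parsing error is produced, but the file access error does not occur.
Example of an input file that reproduces it: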
<root>
<elem>
</root>
Sample standard output:
JAXP: find factoryId =javax.xml.transform.TransformerFactory
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.DocumentBuilderFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl using ClassLoader: null
JAXP: find factoryId =javax.xml.parsers.SAXParserFactory
JAXP: loaded from fallback value: com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
JAXP: created new instance of class com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl using ClassLoader: null
Error on line 3 column 3 of input_err.xml:
SXXP0003: Error reported by XML parser: The element type "elem" must be terminated by the
matching end-tag "</elem>".
org.xml.sax.SAXParseException; systemId: file:/C:/<path>/input_err.xml; lineNumber: 3; columnNumber: 3; The element type "elem" must be terminated by the matching end-tag "</elem>".
C:\<path>\input_err.xml -> C:\<path>\AppData\Local\Temp\input_err.xml: The process cannot access the file because it is being used by another process.
Command line used (I launch it from Eclipse):
java ... -Djaxp.debug=1 -Dfile.encoding=UTF-8 -classpath <...> com.sample.xslt.application.XsltApplicationSample C:\<path>\transform.xsl C:\<path>\input_err.xml
The pom.xml used:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.sample</groupId>
    <artifactId>XsltExampleProject</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <name>XsltExampleProject</name>
    <description>XSLT example project</description>
    <dependencies>
        <dependency>
            <groupId>net.sf.saxon</groupId>
            <artifactId>Saxon-HE</artifactId>
            <version>9.7.0-7</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.2.1</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
The workaround I use is to load the content of the XML input file into memory as a String, as shown here:
String xmlContent = FileUtils.readFileToString(new File(xmlFilePath), StandardCharsets.UTF_8);
Source xslSource = new StreamSource(new File(xslFilePath));
Source xmlSource = new StreamSource(new StringReader(xmlContent));
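Am I missing something when initializing the Transformer?
Should the SAX parser that is resolved by default be overridden by another one recommended by Saxon? Judging from the debug logging, I think the Xerces parser bundled with the JDK is being used, but is it fully compatible with the transformer implementation provided by Saxon?
I am a bit confused about this.
Thanks for your help!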
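From the comment thread following the question, this appears to be a bug/defect in the XML parser that ships with the JDK. Your options are:
(a) report the bug and wait patiently for a fix
(b) use the Apache Xerces parser instead
(c) instead of supplying a File, supply a FileInputStream, and close it yourself.
My recommendation would be (b), because the Apache Xerces parser is much more reliable than the version in the JDK.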
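For option (c), something along the following lines might work. This is only a minimal sketch, not a tested fix for the setup in the question: the class name ClosingStreamSourceSample and the helper methods transformAndClose and moveToDir are hypothetical names made up for illustration. The idea is that the application opens the stream itself in a try-with-resources block, so the stream is closed whether or not the parser fails, and only then is the file moved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class ClosingStreamSourceSample {

    // Runs the transformation on xmlFile and returns true on success.
    // The input stream is opened here and closed by try-with-resources,
    // so it is released even if the parser reports an error.
    static boolean transformAndClose(Transformer transformer, Path xmlFile) throws IOException {
        try (InputStream in = Files.newInputStream(xmlFile)) {
            StreamSource source = new StreamSource(in);
            // Keep the system ID so error messages and relative URIs still refer to the file.
            source.setSystemId(xmlFile.toUri().toString());
            transformer.transform(source, new DOMResult());
            return true;
        } catch (TransformerException e) {
            // By the time this runs, the stream has already been closed.
            System.out.println(e.getMessage());
            return false;
        }
    }

    // Moves the file into dir (for example a configured error directory).
    static Path moveToDir(Path file, Path dir) throws IOException {
        return Files.move(file, dir.resolve(file.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the sample application from the question, the call to transformer.transform(xmlSource, new DOMResult()) would be replaced by transformAndClose(transformer, Paths.get(xmlFilePath)), and the file would be moved afterwards when the result is false.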