XML splitting of BIG file using Java

Question

I am trying to create a Java program that splits a selected XML file.

Sample of the XML file data:

<EmployeeDetails>
<Employee>
<FirstName>Ben</FirstName>
</Employee>
<Employee>
<FirstName>George</FirstName>
</Employee>
<Employee>
<FirstName>Cling</FirstName>
</Employee>
</EmployeeDetails>

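...and so on. I have this 250 MB XML file, and opening it is always a pain because of its size. So I decided to create a Java program with the following functions:

- Select the XML file (already done)
- Decide how to split the file based on the number of tags. For example, the current file has 100k tags, and the program will ask the user how many employees he/she wants in each file (e.g. 10k per file)
- Split the file (already done)

I am asking for help with the second task. I have spent 3-4 days looking into how to do this and whether it is feasible (in my view it certainly is).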

Any response would be greatly appreciated.

Cheers, Glin.

Answer 1

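Assuming a flat structure, in which the root element R of the document has a large number of child elements named X, the following XSLT 2.0 transformation splits the file after every N X elements.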

<t:transform xmlns:t="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <t:param name="N" select="100"/>
  <t:template match="/*">
    <t:for-each-group select="X" 
                      group-adjacent="(position()-1) idiv $N">
      <t:result-document href="{position()}.xml">
        <R>
          <t:copy-of select="current-group()"/>
        </R>
      </t:result-document>
    </t:for-each-group>
  </t:template>
</t:transform>

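If you want to run this in streaming mode (without building the source tree in memory), then (a) add <xsl:mode streamable="yes"/> (written as <t:mode streamable="yes"/> with the prefix used above), and (b) run it with an XSLT 3.0 processor (Saxon-EE or Exselt).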
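For readers who would rather stay in plain Java, the same idea of never holding the whole tree in memory can be sketched with the JDK's StAX event API. This is a different technique from the stylesheet above, not a way of running it, and it keeps the same flat-structure assumption: only children of the root named recordName are copied, and any other content directly under the root is skipped. The class name StaxSplitter, the method split, and the part-N.xml file naming are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for illustration; not part of either answer.
class StaxSplitter {

    /**
     * Copies every recordName child of the root element into files
     * part-1.xml, part-2.xml, ... under outDir, at most perFile records each.
     * Returns the number of part files written.
     */
    static int split(Path source, Path outDir, String recordName, int perFile)
            throws IOException, XMLStreamException {
        if (perFile <= 0) {
            throw new IllegalArgumentException("perFile must be positive");
        }
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        OutputStream out = null;
        XMLEventWriter writer = null;
        String rootName = null;
        boolean inRecord = false; // true while inside a record element
        int depth = 0;            // nesting level of the current element; root is 1
        int inPart = 0;           // records written to the currently open part
        int parts = 0;
        try (InputStream in = Files.newInputStream(source)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    String name = event.asStartElement().getName().getLocalPart();
                    if (depth == 1) {
                        rootName = name; // reused as the root of every part
                    } else if (depth == 2 && name.equals(recordName)) {
                        inRecord = true;
                        if (writer == null) { // open the next part file
                            parts++;
                            out = Files.newOutputStream(outDir.resolve("part-" + parts + ".xml"));
                            writer = outFactory.createXMLEventWriter(out, "UTF-8");
                            writer.add(events.createStartDocument("UTF-8", "1.0"));
                            writer.add(events.createStartElement("", "", rootName));
                        }
                    }
                }
                if (inRecord) {
                    writer.add(event); // copy the record and everything inside it
                }
                if (event.isEndElement()) {
                    if (depth == 2 && inRecord) { // the record just closed
                        inRecord = false;
                        if (++inPart == perFile) {
                            finish(writer, events, rootName);
                            writer = null;
                            out.close();
                            out = null;
                            inPart = 0;
                        }
                    }
                    depth--;
                }
            }
            if (writer != null) { // last, partially filled part
                finish(writer, events, rootName);
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }

    // Writes the closing root tag and flushes the writer (the stream is closed by the caller).
    private static void finish(XMLEventWriter writer, XMLEventFactory events, String rootName)
            throws XMLStreamException {
        writer.add(events.createEndElement("", "", rootName));
        writer.add(events.createEndDocument());
        writer.close();
    }
}
```

With the sample data from the question, a call such as StaxSplitter.split(source, outDir, "Employee", 10000) would be the equivalent of setting N to 10000 in the stylesheet; source and outDir stand for whatever paths the program's file chooser provides.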

Answer 2

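A simple solution is in order. If the XML always has line breaks as shown, no XML processing is needed; plain line-by-line text handling is enough.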

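import java.io.BufferedReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Place the code below inside a method that declares "throws IOException".
Path originalPath = Paths.get("... .xml");
try (BufferedReader in = Files.newBufferedReader(originalPath, StandardCharsets.UTF_8)) {
    String line = in.readLine(); // Skip header line(s)

    line = in.readLine();
    // One iteration per output file; stop at end of input or at the closing root tag.
    for (int fileno = 1; line != null && !line.contains("</EmployeeDetails>"); ++fileno) {
        Path partPath = Paths.get("...-" + fileno + ".xml");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(partPath,
                StandardCharsets.UTF_8))) {
            int counter = 0; // Employee records written to this part so far
            out.println("<EmployeeDetails>"); // Write header.
            do {
                out.println(line);
                if (line.contains("</Employee>")) {
                    ++counter;
                }
                line = in.readLine();
            } while (line != null && !line.contains("</EmployeeDetails>")
                    && counter < 1000);
            out.println("</EmployeeDetails>"); // Write footer.
        }
    }
}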
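The limit of 1000 in the loop above is hard-coded, whereas the question wants the user to choose it. A minimal sketch of validating that input and estimating how many part files will result is given below; the class SplitPlan and its methods parsePerFile and partsNeeded are hypothetical names, not part of the answer. With the figures from the question, 100,000 records at 10,000 per file works out to 10 parts.

```java
// Hypothetical helper for illustration; not part of the answer above.
class SplitPlan {

    /** Parses the user's "records per file" input and rejects non-positive values. */
    static int parsePerFile(String input) {
        int perFile = Integer.parseInt(input.trim()); // throws NumberFormatException on bad input
        if (perFile <= 0) {
            throw new IllegalArgumentException("Records per file must be positive: " + perFile);
        }
        return perFile;
    }

    /** Number of part files needed, i.e. totalRecords / perFile rounded up. */
    static int partsNeeded(long totalRecords, int perFile) {
        if (totalRecords < 0 || perFile <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and perFile > 0");
        }
        return Math.toIntExact((totalRecords + perFile - 1) / perFile);
    }
}
```

The value returned by parsePerFile could then replace the literal 1000 in the loop condition.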
