EDI to XML Huge file conversions

I am converting EDI files to XML. However, my input file, which also happens to be in BIF, is about 100 MB, and it gives me a Java out-of-memory error.

I tried consulting the Smooks documentation on huge file conversion, but it covers the conversion from XML to EDI.

Below is the response I get when I run my main:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
        at java.lang.StringBuffer.append(StringBuffer.java:367)
        at java.io.StringWriter.write(StringWriter.java:94)
        at java.io.Writer.write(Writer.java:127)
        at freemarker.core.TextBlock.accept(TextBlock.java:56)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visitByHiddingParent(Environment.java:278)
        at freemarker.core.IteratorBlock$Context.runLoop(IteratorBlock.java:157)
        at freemarker.core.Environment.visitIteratorBlock(Environment.java:501)
        at freemarker.core.IteratorBlock.accept(IteratorBlock.java:67)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Macro$Context.runMacro(Macro.java:173)
        at freemarker.core.Environment.visit(Environment.java:686)
        at freemarker.core.UnifiedCall.accept(UnifiedCall.java:80)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.MixedContent.accept(MixedContent.java:57)
        at freemarker.core.Environment.visit(Environment.java:257)
        at freemarker.core.Environment.process(Environment.java:235)
        at freemarker.template.Template.process(Template.java:262)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:92)
        at org.milyn.util.FreeMarkerTemplate.apply(FreeMarkerTemplate.java:86)
        at org.milyn.event.report.HtmlReportGenerator.applyTemplate(HtmlReportGenerator.java:76)
        at org.milyn.event.report.AbstractReportGenerator.processFinishEvent(AbstractReportGenerator.java:197)
        at org.milyn.event.report.AbstractReportGenerator.processLifecycleEvent(AbstractReportGenerator.java:157)
        at org.milyn.event.report.AbstractReportGenerator.onEvent(AbstractReportGenerator.java:92)
        at org.milyn.Smooks._filter(Smooks.java:558)
        at org.milyn.Smooks.filterSource(Smooks.java:482)
        at com.***.xfunctional.EdiToXml.runSmooksTransform(EdiToXml.java:40)
        at com.***.xfunctional.EdiToXml.main(EdiToXml.java:57)

import java.io.*;
import java.util.Arrays;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.container.ExecutionContext;
import org.milyn.event.report.HtmlReportGenerator;
import org.milyn.io.StreamUtils;
import org.milyn.payload.StringResult;
import org.milyn.payload.SystemOutResult;
import org.xml.sax.SAXException;

public class EdiToXml {

  private static byte[] messageIn = readInputMessage();

  protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {

    Locale defaultLocale = Locale.getDefault();
    Locale.setDefault(new Locale("en", "EN"));

    // Instantiate Smooks with the config...
    Smooks smooks = new Smooks("smooks-config.xml");
    try {
      // Create an exec context - no profiles....
      ExecutionContext executionContext = smooks.createExecutionContext();

      StringResult result = new StringResult();

      // Configure the execution context to generate a report...
      executionContext.setEventListener(new HtmlReportGenerator("target/report/report.html"));

      // Filter the input message to the outputWriter, using the execution context...
      smooks.filterSource(executionContext, new StreamSource(new ByteArrayInputStream(messageIn)),result);

      Locale.setDefault(defaultLocale);

      return result.getResult();
    } finally {
      smooks.close();
    }
  }

  public static void main(String[] args) throws IOException, SAXException, SmooksException {
    System.out.println("\n\n==============Message In==============");
    System.out.println("======================================\n");

    pause(
        "The EDI input stream can be seen above.  Press 'enter' to see this stream transformed into XML...");

    String messageOut = EdiToXml.runSmooksTransform();

    System.out.println("==============Message Out=============");
    System.out.println(messageOut);
    System.out.println("======================================\n\n");

    pause("And that's it!  Press 'enter' to finish...");
  }

  private static byte[] readInputMessage() {
    try {
      InputStream input = new BufferedInputStream(new FileInputStream("/home/****/Downloads/BifInputFile.DATA"));
      return StreamUtils.readStream(input);
    } catch (IOException e) {
      e.printStackTrace();
      return "<no-message/>".getBytes();
    }
  }

  private static void pause(String message) {
    try {
      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      System.out.print("> " + message);
      in.readLine();
    } catch (IOException e) {
    }
    System.out.println("\n");
  }

}

<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:edi="http://www.milyn.org/xsd/smooks/edi-1.4.xsd">
  <!--
     Configure the EDI Reader to parse the message stream into a stream of SAX events.
     -->
  <edi:reader mappingModel="edi-to-xml-bif-mapping.xml" validate="false"/>
</smooks-resource-list>

I edited this line in the code to reflect the use of streams:

smooks.filterSource(executionContext, new StreamSource(new FileInputStream("/home/***/Downloads/sample-text-file.txt")), result);

But now I get the error below. Any guesses as to the best approach?

Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
    at org.milyn.Smooks._filter(Smooks.java:526)
    at org.milyn.Smooks.filterSource(Smooks.java:482)
    at ****.EdiToXml.runSmooksTransform(EdiToXml.java:41)
    at com.***.***.EdiToXml.main(EdiToXml.java:58)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:504)
    at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:453)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:428)
    at org.milyn.edisax.EDIParser.parse(EDIParser.java:386)
    at org.milyn.smooks.edi.EDIReader.parse(EDIReader.java:111)
    at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
    at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
    ... 5 more

The message is valid and the XML mapping is fine. I am just not using the best approach for reading and writing the message.

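I came to realize that Smooks' filterSource method can work directly with an InputStream and an OutputStream (wrapped in a StreamSource and a StreamResult). Please find below the code segment that lets the program run efficiently without hitting the Java memory error.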

// Open the EDI input file as a stream (it is not read into a byte array first)
FileInputStream inputStream = new FileInputStream(inputFileName);

// Open the XML output file as a stream (the result is not collected in a String)
FileOutputStream outputStream = new FileOutputStream(outputFileName);


try {    

  // Filter the input message to the outputWriter...
  smooks.filterSource(new StreamSource(inputStream), new StreamResult(outputStream));

  Locale.setDefault(defaultLocale);

} finally {
  smooks.close();
  inputStream.close();
  outputStream.close();
}
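
For context, the first stack trace shows the heap running out inside HtmlReportGenerator while it was writing into a StringWriter, and the original program also collected the whole output in a StringResult. The snippet above uses neither. The same stream-to-stream pattern can be sketched with try-with-resources so that both files are closed even if filtering fails. This is only a sketch under assumptions: the class name, the Filter interface, and the temp files are hypothetical, and the JDK identity transformer stands in for Smooks so the example runs without extra libraries. In the real program the lambda would presumably be `(source, result) -> smooks.filterSource(source, result)`.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamingFilterSketch {

  /** Hypothetical stand-in for the filtering step (for example, a call to smooks.filterSource). */
  interface Filter {
    void apply(Source source, Result result) throws Exception;
  }

  /** Streams inputFile through the filter into outputFile; both streams are always closed. */
  static void filterFile(Path inputFile, Path outputFile, Filter filter) throws Exception {
    try (InputStream in = Files.newInputStream(inputFile);
         OutputStream out = Files.newOutputStream(outputFile)) {
      // The result goes straight to the file instead of being collected in a String.
      filter.apply(new StreamSource(in), new StreamResult(out));
    }
  }

  public static void main(String[] args) throws Exception {
    Path in = Files.createTempFile("message-in", ".xml");
    Path out = Files.createTempFile("message-out", ".xml");
    Files.write(in, "<order><item>1</item></order>".getBytes(StandardCharsets.UTF_8));

    // Stand-in filter (assumption): the JDK identity transform copies source to result.
    filterFile(in, out, (source, result) ->
        TransformerFactory.newInstance().newTransformer().transform(source, result));

    // Reading the small demo output back is fine here; avoid this for a 100 MB file.
    System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
  }
}
```

Keeping the filter behind a small interface is just a convenience for the demo; the part that matters for memory is that neither the input nor the output is held in a byte array or String.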

Thanks to the community.

Regards.

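I am the original author of Smooks and the Edifact parsing tools. Jason emailed me asking for advice on this, but I have not been involved in it for a number of years, so I am not sure how much help I can be.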

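Smooks does not read the full message into memory. It streams it through a parser that converts it into a stream of SAX events, making it "look like" XML to anything downstream of it. If those events are then used to build a big Java object model in memory, that could result in OOM errors etc.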
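
To illustrate the difference between consuming SAX events and building a model from them, here is a minimal sketch that uses only the JDK SAX parser, not Smooks. The class and method names are hypothetical, and the `segment` element name is made up for the demo. The handler keeps a single counter, so what it retains does not grow with the size of the input.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxStreamingSketch {

  /** Counts elements with a given name as events arrive; it never builds a tree. */
  static final class ElementCounter extends DefaultHandler {
    private final String name;
    private long count;

    ElementCounter(String name) {
      this.name = name;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
      // The default parser is not namespace-aware, so compare against qName.
      if (name.equals(qName)) {
        count++;
      }
    }

    long count() {
      return count;
    }
  }

  /** Parses the source event by event and returns how many matching elements were seen. */
  static long countElements(InputSource source, String elementName) throws Exception {
    ElementCounter counter = new ElementCounter(elementName);
    SAXParserFactory.newInstance().newSAXParser().parse(source, counter);
    return counter.count();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<message><segment/><segment/><segment/></message>";
    // Prints 3
    System.out.println(countElements(new InputSource(new StringReader(xml)), "segment"));
  }
}
```

A handler that instead appended every event to a list or a StringBuilder would hold the entire message by the end of the parse, which is the kind of downstream accumulation that can exhaust the heap.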

Looking at the exception message, it looks like the EDIFACT input does not match the definition file being used.

Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [EDIFACT-BIF-TO-XML][1.0].  Must be a minimum of 1 instances of segment [UNH].  Currently at segment number 1.

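Those EDIFACT definition files were originally generated directly from the definitions published by the EDIFACT group, but I do remember that a lot of people "tweaked" the message formats, which seems like what might be happening here (hence the above error). One solution would be to tweak the pre-generated definitions to match.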

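I know that Smooks has changed a lot in this area over the last year or two (using Apache Daffodil for the definitions), but I am not the best person to talk about that. You could try asking for help on the Smooks mailing list.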