Apache Camel: Process file line by line

Question

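I have a large file that I need to read and process. I don't want to load it into memory all at once. Instead, I want to read each line of the file separately and perform an operation on it. At work I came across this implementation: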

@Override
public void configure() {
    from(fileSftpLocationUrl)
        .routeId("my-route")
        .onException(Exception.class).handled(true).bean(exchangeErrorHandler, "processError").end()
        .split(body().tokenize("\n")).streaming()
        .filter(/*condition for skip first and last line*/)
        .bean(/*my action*/)
        .to(String.format("activemq:%s", myQueue));
}

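Before I start reading the file, I skip the header and footer with .filter(/*condition for skip first and last line*/). On the next line I try to start reading my file line by line with .split(body().tokenize("\n")).streaming(), but something goes wrong: I get the entire contents of the file at once. I see this problem in .bean(/*my action*/) when I parse that data and perform operations on it.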

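I think my problem is hidden right at the start, because the algorithm looks strange: first I describe a condition for the whole file (skip the header and footer), then I let Camel process it line by line, and only after that comes the action for a specific line.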

My question is: how can I change this implementation so that the file is processed line by line?
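For comparison, the requirement in the question (stream the file, skip the first and last line, act on each remaining line) can be sketched in plain Java without Camel. This is a minimal sketch, not Camel's implementation: `LineByLine`, `processBody` and the sample data are hypothetical. It highlights one detail that matters for any streaming solution: a line can only be recognized as the footer once the reader reports end of input, so the code holds back one line of lookahead instead of loading the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineByLine {

    /**
     * Reads lines one at a time and hands every line except the first
     * (header) and the last (footer) to {@code action}. Only the current
     * line and one line of lookahead are kept in memory.
     */
    static void processBody(Reader source, Consumer<String> action) {
        try (BufferedReader reader = new BufferedReader(source)) {
            if (reader.readLine() == null) {      // first line (header): skipped
                return;                            // empty input
            }
            String pending = reader.readLine();   // candidate body line
            String next;
            while (pending != null && (next = reader.readLine()) != null) {
                // 'pending' has a successor, so it cannot be the footer
                action.accept(pending);
                pending = next;
            }
            // the line left in 'pending' (if any) is the footer: skipped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String file = "HEADER\nline 1\nline 2\nline 3\nFOOTER\n";
        List<String> seen = new ArrayList<>();
        processBody(new StringReader(file), seen::add);
        System.out.println(seen); // [line 1, line 2, line 3]
    }
}
```

Inside a Camel route the same decision can be made per split exchange: the Splitter sets the `CamelSplitIndex` and `CamelSplitComplete` exchange properties, so a filter predicate can test for the first and the last element (check the Splitter documentation for your Camel version).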

Answer 1

I think I get it. By default, the split result is sent to the FIRST following endpoint:

from(...)
    .split(body().tokenize("\n")).streaming()
    .to("direct:processLine")

If you want to send it through a more complex route, you have to mark the end of the split, for example:

from(...)
    .split(body().tokenize("\n")).streaming()
        .filter(/*condition for skip first and last line*/)
        .bean(/*my action*/)
        .to(String.format("activemq:%s", myQueue))
        .end() // ends the filter block
    .end() // ends the split block
    .log("Split done");

If you omit end(), the logic becomes this (see the indentation):

from(...)
    .split(body().tokenize("\n")).streaming()
        .filter(/*condition for skip first and last line*/)
    .end() // Implicit
    .bean(/*my action*/)
    .to(String.format("activemq:%s", myQueue))

-> In your attempt, bean(...) was called with the original message (after the split had already been executed).

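Think of it as a kind of "for loop":

for (String line : lines)
    filter(line);          // only this statement is inside the loop
bean.run(originalMessage); // runs once, with the original (unsplit) message
sendTo(...);

which is completely different from:

for (String line : lines) {
    filter(line);
    bean.run(line);        // runs once per line
    sendTo(...);
}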
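The difference between the two loop shapes can be checked in plain Java. In this small sketch the class and method names are hypothetical, and each step only records a string instead of calling Camel. It shows that without braces only `filter` repeats while `bean` and `sendTo` run once, which matches the answer's explanation of `bean(...)` receiving the original message.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopScope {

    // Without braces only the first statement belongs to the loop body.
    static List<String> unbraced(List<String> lines) {
        List<String> calls = new ArrayList<>();
        for (String line : lines)
            calls.add("filter:" + line);       // repeated for each line
        calls.add("bean(original message)");   // runs once, after the loop
        calls.add("sendTo");
        return calls;
    }

    // With braces every step runs once per line.
    static List<String> braced(List<String> lines) {
        List<String> calls = new ArrayList<>();
        for (String line : lines) {
            calls.add("filter:" + line);
            calls.add("bean:" + line);
            calls.add("sendTo");
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b");
        System.out.println(unbraced(lines)); // [filter:a, filter:b, bean(original message), sendTo]
        System.out.println(braced(lines));   // [filter:a, bean:a, sendTo, filter:b, bean:b, sendTo]
    }
}
```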

Tags: java, apache-camel, camel-ftp, spring-camel