Consume all files in AWS S3 where deleteAfterRead = false
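Is there a way to consume all files in an S3 bucket without deleting them from S3? (The bucket holds about 15,000 files.)
Because the aws-s3 component has no noop parameter, the configuration below has a problem: it keeps retrieving the same 5 files over and over again.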
<!-- "&" has to be escaped as "&amp;" inside an XML attribute value -->
<endpoint id="fbPage" uri="aws-s3://bucket?amazonS3Client=#aws-credential&amp;deleteAfterRead=false&amp;maxMessagesPerPoll=5&amp;prefix=dev/facebook/page"/>
<route id="consumeS3FbPage">
    <from uri="ref:fbPage"/>
    <choice>
        <!-- Non-empty object: convert it and POST it to the Elasticsearch bulk endpoint -->
        <when>
            <simple>${header.CamelAwsS3ContentLength} > 0</simple>
            <log message="Page File detected: ${header.CamelAwsS3Key}"/>
            <bean ref="dfaReportingRePull" method="s3toElasticFormat"/>
            <setHeader headerName="CamelHttpMethod">
                <constant>POST</constant>
            </setHeader>
            <to uri="http://localhost:9200/fb_camel/page/_bulk"/>
            <log message="Success"/>
        </when>
        <!-- Zero-length object: most likely the key of the "folder" itself -->
        <when>
            <simple>${header.CamelAwsS3ContentLength} == 0</simple>
            <log message="Empty content, Probably the s3 key Folder itself: ${header.CamelAwsS3Key}"/>
        </when>
    </choice>
</route>
The following log shows the same files being retrieved more than once:
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,904 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/05/31/9c9537e6-12a3-415e-aa3d-a450011008be.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,993 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,994 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/01/97d85443-74af-4d64-9808-a4500110117a.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,002 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,002 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/02/223410b2-b4ce-4b7f-8e47-a45001101254.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,010 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,011 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/03/e5c21710-d764-453d-9736-a4500110132e.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,019 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,019 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/04/851d3759-0c35-4679-838c-a4500110140b.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,027 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,375 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/05/31/9c9537e6-12a3-415e-aa3d-a450011008be.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,396 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,397 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/01/97d85443-74af-4d64-9808-a4500110117a.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,409 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,410 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/02/223410b2-b4ce-4b7f-8e47-a45001101254.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,419 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,420 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/03/e5c21710-d764-453d-9736-a4500110132e.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,429 INFO consumeS3FbPage - Success
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,430 INFO consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/04/851d3759-0c35-4679-838c-a4500110140b.json
[Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,439 INFO consumeS3FbPage - Success
Even if I use an idempotent consumer, it simply detects all 5 files as duplicates and skips them.
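The question does not show the idempotent configuration that was tried. As a rough sketch only, assuming an in-memory repository keyed on the CamelAwsS3Key header (the bean id s3KeyRepo and the route id consumeS3FbPageIdempotent are hypothetical names), it might have looked something like this:

```xml
<!-- Assumption: Camel 2.x class name; declared as a Spring bean outside the camelContext -->
<bean id="s3KeyRepo" class="org.apache.camel.processor.idempotent.MemoryIdempotentRepository"/>

<!-- Inside the camelContext, reusing the fbPage endpoint from the question -->
<route id="consumeS3FbPageIdempotent">
    <from uri="ref:fbPage"/>
    <!-- Only exchanges whose S3 key has not been seen before pass through -->
    <idempotentConsumer messageIdRepositoryRef="s3KeyRepo">
        <header>CamelAwsS3Key</header>
        <log message="First time seen: ${header.CamelAwsS3Key}"/>
    </idempotentConsumer>
</route>
```

A filter like this only suppresses downstream processing of keys that were already seen. It does not change which keys the consumer lists on each poll, which is consistent with the behaviour reported here: the same 5 keys come back and are dropped as duplicates.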
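I wondered whether using deleteAfterRead and then putting the files back would work. It would not: looking at the code in http://camel.465427.n5.nabble.com/camel-aws-s3-get-only-files-I-need-td5714095.html, it appears the consumer only loops over the list currently returned by AWS S3.
When I looked at ListObjectsRequest.java, I saw that there is a way to define a marker that indicates the last processed S3 key. Is there a way to set this marker through the Camel Spring DSL? [Update] No.
After digging through the code, I found the root cause. It is tracked in this JIRA ticket: https://issues.apache.org/jira/browse/CAMEL-8431
Note: the Camel version is 2.14.0.
According to Apache committer Willem Jiang, the fix will be included in version 2.14.3 (CAMEL-8431).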
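If the project is built with Maven (an assumption; the question does not say), picking up the fix would amount to moving the AWS component to the release named above. A sketch of the dependency, assuming the standard camel-aws artifact coordinates used by Camel 2.x:

```xml
<!-- Assumption: Maven build; 2.14.3 is the version the committer said would carry the fix -->
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-aws</artifactId>
    <version>2.14.3</version>
</dependency>
```

The other Camel artifacts in the build would normally be kept on the same version.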