AWS Glue Exclude Patterns

Question

I am working on a project that uses Glue 3.0 and PySpark to process large amounts of data between S3 buckets. I use GlueContext.create_dynamic_frame_from_options to read the data from an S3 bucket into a DynamicFrame, with the recurse connection option set to True because the data is heavily nested. I only want to read files that end in meta.json, so I have set the exclusions filter to exclude any files ending in data.csv: "exclusions": ['**.{txt, csv}', '**/*.data.csv', '**.data.csv', '*.data.csv']. However, I keep getting the following error:

An error occurred while calling o90.pyWriteDynamicFrame. Unable to parse file: <filename>.data.csv

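Is it possible to log the full S3 URI to the output logs, or to track which files have and have not been processed? And why is Glue still trying to parse the file even though it matches the exclusions?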

Answer 1

The exclusions value must be a single string containing a JSON list of glob patterns, not a Python list:

"exclusions": "[\"**/*.txt\", \"**/*.csv\"]",
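To avoid escaping the quotes by hand, the string can be built with json.dumps. This is only a minimal sketch: the bucket path and the variable names are hypothetical, and the Glue call is left as a comment because it only runs inside a Glue job.

```python
import json

# Glob patterns for the files to skip (same patterns as in the answer above).
exclude_patterns = ["**/*.txt", "**/*.csv"]

# "exclusions" must be one string holding a JSON list,
# so serialize the Python list instead of passing it directly.
connection_options = {
    "paths": ["s3://example-bucket/example-prefix/"],  # hypothetical path
    "recurse": True,
    "exclusions": json.dumps(exclude_patterns),
}

print(connection_options["exclusions"])

# Inside a Glue job, the dict would then be passed roughly like this
# (glue_context is assumed to be an existing GlueContext; format="json"
# is an assumption based on the meta.json files in the question):
# dyf = glue_context.create_dynamic_frame_from_options(
#     connection_type="s3",
#     connection_options=connection_options,
#     format="json",
# )
```

The printed value is a plain string, which is the form shown in the line above.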

amazon-s3

amazon-web-services

aws-glue

aws-glue-spark