Why do I need to set the `transformation_ctx` parameter when calling transformation and sink operations for AWS Glue bookmark to work?

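The AWS Glue bookmark documentation (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) seems to suggest that the `transformation_ctx` parameter must be passed to source, transformation, and sink operations for bookmarks to work. This is reflected in the sample code on that page, where every call to `create_dynamic_frame.from_catalog()`, `ApplyMapping.apply()`, and `write_dynamic_frame.from_options()` is passed a `transformation_ctx` value.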

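I can understand the point of passing a `transformation_ctx` to the `create_dynamic_frame.from_catalog()` method, since AWS Glue needs to store information about the files that have already been read in the bookmark under the given `transformation_ctx` key.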
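To make that read-side state concrete, here is a toy model in plain Python. It is not how Glue implements bookmarks; it only assumes, as described above, that the state is a set of already-read files stored under the `transformation_ctx` key. The name `read_new_files` and the file names are hypothetical, and the toy records state immediately rather than when the job commits.

```python
from typing import Dict, Iterable, List, Set

# Toy stand-in for a bookmark store: one set of processed paths per key.
# This is an illustration only, not Glue's real data structure.
BookmarkState = Dict[str, Set[str]]


def read_new_files(
    state: BookmarkState,
    transformation_ctx: str,
    available_files: Iterable[str],
) -> List[str]:
    """Return files not yet recorded under this key, then record them."""
    seen = state.setdefault(transformation_ctx, set())
    new_files = [path for path in available_files if path not in seen]
    seen.update(new_files)
    return new_files


state: BookmarkState = {}

# First run: nothing is recorded under "read_orders", so both files are new.
first_run = read_new_files(state, "read_orders", ["a.json", "b.json"])

# Second run with the same key: only the file added since then is returned.
second_run = read_new_files(state, "read_orders", ["a.json", "b.json", "c.json"])

# A different key has its own, independent state.
other_key = read_new_files(state, "read_customers", ["a.json"])
```

In this model the key only matters where something is read, which is exactly why it is unclear what a transformation or a sink would have to remember.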

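However, I don't understand why this is also necessary for methods such as `ApplyMapping.apply()` and `write_dynamic_frame.from_options()`. In other words, what state do these operations need to store in the bookmark? What problems will it cause if I don't pass `transformation_ctx` to these methods?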

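I had the same doubts about bookmarks a few months ago (October 2019). Since the documentation provided by Amazon is not very clear, I opened a support case to learn more about how it is implemented.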

My Glue job had:

- a read from S3 (`glue_context.create_dynamic_frame.from_options`)
- a `ResolveChoice.apply`
- a write to Redshift (`glue_context.write_dynamic_frame.from_jdbc_conf`)

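All of these operations had a `transformation_ctx` value, and I tested the different possible behaviors (the same `transformation_ctx` for all of them, different ones, fixed values, dynamic values, etc.).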
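A job of that shape might look roughly like the sketch below. It is not my actual job: the function name `run_job`, its parameters, the JSON format, the `make_cols` choice, and the three `transformation_ctx` strings are all assumptions made for illustration.

```python
def run_job(glue_context, source_path, redshift_connection,
            database, target_table, temp_dir):
    """Read from S3, resolve choice types, and write to Redshift.

    All argument values are hypothetical; glue_context is expected to be
    an awsglue.context.GlueContext created by the job script.
    """
    # Imported lazily because awsglue is only available in the Glue runtime.
    from awsglue.transforms import ResolveChoice

    # Source: this is the call whose transformation_ctx keys the bookmark.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [source_path]},
        format="json",  # assumed input format
        transformation_ctx="read_s3",
    )

    # Transformation: a transformation_ctx is passed here as well,
    # mirroring the documentation samples.
    resolved = ResolveChoice.apply(
        frame=frame,
        choice="make_cols",  # assumed resolution strategy
        transformation_ctx="resolve_choice",
    )

    # Sink: write to Redshift through a Glue catalog connection.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=resolved,
        catalog_connection=redshift_connection,
        connection_options={"dbtable": target_table, "database": database},
        redshift_tmp_dir=temp_dir,
        transformation_ctx="write_redshift",
    )
    return resolved
```

In a real job script this would sit between `job.init(...)` and `job.commit()` on an `awsglue.job.Job`, with bookmarks enabled for the job, since the bookmark state is only persisted when the job commits.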

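After talking with AWS Support, they confirmed that bookmarks apply only to the read function (they also said only with S3 as the source, but I did not test that). So I asked whether `transformation_ctx` is useless in `ResolveChoice` (and in the write function as well), and they said yes. They confirmed that it makes no difference.

Likewise, for the write function it changes nothing: there is no bookmark logic there, so nothing prevents the write from running again if it has already run before.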

amazon-web-services

aws-glue