How to solve AWS Glue PySpark script throwing retryWrites error from DocumentDB

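I am running the code below in AWS Glue. The job is able to read data from the database, but it fails when writing.

An error occurred while calling o102.pyWriteDynamicFrame. Command failed with error 301: 'Retryable writes are not supported' on server :. The full response is {"ok": 0.0, "code": 301, "errmsg": "Retryable writes are not supported", "operationTime": {"$timestamp": {"t": 1647921685, "i": 1}}}

I am using a catalog DocumentDB connection in the job details section.

I tried using retryWrite=false in the connection string, but I still get the error.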


documentdb_uri = "mongodb://<host name>:27017"
documentdb_write_uri = "mongodb://<host name>:27017"

read_docdb_options = {
    "uri": documentdb_uri,
    "database": "test",
    "collection": "profiles",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}

write_documentdb_options = {
    "uri": documentdb_write_uri,
    "database": "test",
    "collection": "collection1",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}

# Get DynamicFrame from DocumentDB
dynamic_frame2 = glueContext.create_dynamic_frame.from_options(connection_type="documentdb",
                                                               connection_options=read_docdb_options)

# Write DynamicFrame to DocumentDB
glueContext.write_dynamic_frame.from_options(dynamic_frame2, connection_type="documentdb",
                                             connection_options=write_documentdb_options)

job.commit()

The correct option is retryWrites=false (note the trailing "s"), and it needs to go at the end of the URI as a query parameter.

In your case: documentdb_write_uri = "mongodb://<host name>:27017/?retryWrites=false"
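If the write URI is built in code rather than typed by hand, a small helper can make sure the parameter is spelled correctly and ends up in the query string. This is only a sketch using the Python standard library: the function name `with_retry_writes_disabled` and the host `docdb.example.com` are made up for illustration, and it does not touch Glue itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_retry_writes_disabled(uri: str) -> str:
    """Return `uri` with retryWrites=false as the last query parameter.

    Hypothetical helper, not part of AWS Glue or any MongoDB driver.
    """
    parts = urlsplit(uri)
    # Keep existing query parameters, dropping any previous retryWrites value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "retrywrites"
    ]
    query.append(("retryWrites", "false"))
    # A MongoDB URI needs a "/" before the "?" when no database is in the path.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


# Placeholder host, assumed for illustration only.
documentdb_write_uri = with_retry_writes_disabled("mongodb://docdb.example.com:27017")
print(documentdb_write_uri)
```

The result can then be passed as the "uri" entry of write_documentdb_options. Note that `urlencode` percent-encodes parameter values, which is harmless for simple options like this one.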

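I resolved this by downgrading the Glue version from 3.0 to 2.0. In 3.0, I could not get the retryWrites setting to take effect when using dynamic frames.

I have opened a ticket on their forum, but it has not been resolved yet. Issue on the AWS board for reference - https://github.com/awslabs/aws-glue-libs/issues/111 [An error occurred while calling o365.pyWriteDynamicFrame. Command failed with error 301: 'Retryable writes are not supported' on server ****.*****.docdb.amazonaws.com:27017.]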