AWS DMS task failing after some time in CDC mode
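I'm having trouble setting up a task that migrates data from an RDS database (PostgreSQL, engine 10.15) to an S3 bucket in initial migration + CDC mode.
Both endpoints were configured and tested successfully.
I have created the task twice, and both times it ran for a few hours at most. The first time, the initial dump went fine and some incremental dumps took place as well; the second time, only the initial dump completed and no incremental dump was performed before the task failed.
The error message now is: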
Last Error: Task 'data-migration-bp-dev' was suspended after 9 successive recovery failures
Stop Reason: FATAL_ERROR
Error Level: FATAL_
But right after it failed for the first time, it was:
Last Error: An internal WAL conversational protocol error has occurred. Task error notification received from subtask 0, thread 0 reptask/replicationtask.c:2859 1020452 Error executing source loop; Stream component failed at subtask 0, component st_0_data-migration-rds-bp-dev; Stream component 'st_0_data-migration-rds-bp-dev' terminated reptask/replicationtask.c:2866 1020452
Stop Reason: RECOVERABLE_ERROR
Error Level: RECOVERABLE
In the CloudWatch logs I see the following messages:
SOURCE_CAPTURE I: Streaming initiated successfully (postgres_pglogical.c:274)
SOURCE_CAPTURE I: #1 : Non-monotonic LSN sequence: Current LSN '00000000/00000000' < Previous LSN '000001E3/94016430'. Event is ignored. (postgres_endpoint_wal_engine.c:710)
SOURCE_CAPTURE I: Unable to resolve attributes for relation id '28804'. Aborting action. (postgres_pglogical.c:1643)
SOURCE_CAPTURE I: End of CDC / CAPTURE events for POSTGRES endpoint. (postgres_endpoint_capture.c:520)
SOURCE_CAPTURE I: CAPTURE ended with exceptions. (postgres_endpoint_capture.c:527)
SOURCE_CAPTURE E: Could not find relation id '28804' in hash. 1020483 (postgres_pglogical.c:1470)
SOURCE_CAPTURE E: Failed to parse relation from dml command 1020483 (postgres_pglogical.c:2515)
SOURCE_CAPTURE E: Failed to find relation id on target while processing message from source 1020452 (postgres_endpoint_wal_engine.c:805)
SOURCE_CAPTURE E: WAL stream loop ended abnormally. (STATUS_PROTOCOL_ERROR) 1020452 (postgres_endpoint_wal_engine.c:992)
SOURCE_CAPTURE E: WAL reader terminated with irrecoverable error. 1020452 (postgres_endpoint_capture.c:496)
TASK_MANAGER I: Task - data-migration-bp-dev is in ERROR state, updating starting status to AR_NOT_APPLICABLE (repository.c:5102)
SOURCE_CAPTURE E: Error executing source loop 1020452 (streamcomponent.c:1870)
TASK_MANAGER E: Stream component failed at subtask 0, component st_0_data-migration-rds-bp-dev 1020452 (subtask.c:1409)
SOURCE_CAPTURE E: Stream component 'st_0_data-migration-rds-bp-dev' terminated 1020452 (subtask.c:1578)
TASK_MANAGER E: Task error notification received from subtask 0, thread 0 1020452 (replicationtask.c:2859)
TASK_MANAGER E: Error executing source loop; Stream component failed at subtask 0, component st_0_data-migration-rds-bp-dev; Stream component 'st_0_data-migration-rds-bp-dev' terminated 1020452 (replicationtask.c:2866)
TASK_MANAGER E: Task 'data-migration-bp-dev' encountered a recoverable error, retry attempt # 0 (repository.c:5184)
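For anyone triaging a similar log, here is a minimal sketch of how the first error-level entry could be pulled out of an excerpt like the one above. The line shape `COMPONENT LEVEL: message (file.c:line)` is an assumption inferred only from these sample lines, and `parse_line` and `first_error` are hypothetical helper names, not part of any AWS tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed shape, inferred from the excerpt only:
#   COMPONENT LEVEL: message (source_file.c:line)
LOG_LINE = re.compile(
    r"^(?P<component>[A-Z_]+) (?P<level>[A-Z]): (?P<message>.*) "
    r"\((?P<file>[\w.]+):(?P<line>\d+)\)$"
)


class LogEntry(NamedTuple):
    component: str
    level: str
    message: str
    source_file: str
    source_line: int


def parse_line(text: str) -> Optional[LogEntry]:
    """Split one log line into its parts; return None if it does not match."""
    m = LOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogEntry(m["component"], m["level"], m["message"],
                    m["file"], int(m["line"]))


def first_error(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the first entry whose level is 'E', scanning in the order given."""
    for text in lines:
        entry = parse_line(text)
        if entry is not None and entry.level == "E":
            return entry
    return None
```

Applied to the lines above in the order shown, this would point at the `Could not find relation id '28804' in hash.` entry from `postgres_pglogical.c`, which is the earliest error-level line in the excerpt.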
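At this point I should mention that we had to configure the pglogical plugin and restart the database, but at the end we got an error, which we ignored because the DMS task started after that operation: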
ERROR: current database is not configured as pglogical node
HINT: create pglogical node first
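Is the failure of our DMS task related to the pglogical plugin configuration? If so, how can we configure it so that it works (our database engine should be compatible with it, shouldn't it)? If not, how can this be fixed?
Thanks in advance!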
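In case anyone runs into the same error in the future, here is what the AWS technical specialists told us:
There is a known issue (on the AWS side) with the pglogical plugin. The workaround is to use the test_decoding plugin instead:
- Specify pluginName=test_decoding in the endpoint's extra connection attributes
- Create a new DMS task using this endpoint (reusing the old task may fail because the task is out of sync with the logs)
It did solve the problem, but we still don't know what exactly is wrong (at the moment) with the plugin that is strongly recommended throughout the DMS documentation.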
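The first step of the workaround could be scripted roughly as follows. This is only a sketch under assumptions: it expects a boto3 DMS client to be passed in and relies on its `modify_endpoint` call with the `EndpointArn` and `ExtraConnectionAttributes` parameters. The helper names `with_plugin` and `apply_to_endpoint` are hypothetical, and matching the attribute key case-insensitively is my own choice for de-duplication. Creating the new task is not shown.

```python
def with_plugin(extra_connection_attributes: str,
                plugin: str = "test_decoding") -> str:
    """Return the attribute string with pluginName set, dropping any earlier value."""
    kept = []
    for item in extra_connection_attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key = item.split("=", 1)[0].strip()
        # Assumption: treat the key case-insensitively when removing an old value.
        if key.lower() != "pluginname":
            kept.append(item)
    kept.append(f"pluginName={plugin}")
    return ";".join(kept)


def apply_to_endpoint(dms_client, endpoint_arn: str,
                      current_attributes: str = "") -> str:
    """Update the endpoint's extra connection attributes and return the new string.

    dms_client is expected to be something like boto3.client("dms");
    endpoint_arn is whatever ARN your source endpoint has.
    """
    updated = with_plugin(current_attributes)
    dms_client.modify_endpoint(
        EndpointArn=endpoint_arn,
        ExtraConnectionAttributes=updated,
    )
    return updated
```

Passing the endpoint's existing attribute string as `current_attributes` keeps any other settings in place; with the default empty string, the endpoint would end up with only `pluginName=test_decoding`.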