Debezium with RDS postgres and master-replica failover

I have an RDS Multi-AZ postgres database (primary-standby), and I'm looking into Debezium to stream changes to Kafka.

I've been reading the documentation on the problems that can arise on failover: https://debezium.io/documentation/reference/1.1/connectors/postgresql.html#_cluster_failures and it looks like a pretty scary scenario.

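From some tests I ran using reboot with failover, it seems that when the endpoint switches from the primary to the standby, the Debezium connector keeps working and automatically creates a replication slot on the standby. But as far as I understand, there is no guarantee against data loss unless you can make sure the replication slot on the new primary (the old standby) is created before any new data is written to it.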
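
To make that window concrete, here is a minimal sketch (not something from the Debezium docs) that measures how much WAL was written on the new primary before the slot existed. It assumes you can capture two LSNs yourself: the WAL position at promotion (e.g. from `pg_current_wal_lsn()`) and the position the re-created slot starts from (e.g. `confirmed_flush_lsn` in `pg_replication_slots`). The function names `lsn_to_int` and `wal_gap_bytes` are hypothetical helpers.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an integer byte position.

    An LSN is printed as two hexadecimal numbers: high 32 bits / low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_gap_bytes(promotion_lsn: str, slot_start_lsn: str) -> int:
    """Bytes of WAL written after promotion but before the slot's start position.

    A result of 0 means the slot starts at or before the promotion point.
    A positive result means some WAL is not covered by the slot; whether it
    contained changes to the captured tables cannot be told from LSNs alone.
    """
    return max(0, lsn_to_int(slot_start_lsn) - lsn_to_int(promotion_lsn))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    print(wal_gap_bytes("0/3000000", "0/3000060"))
```

A nonzero gap does not prove that events were lost, only that the slot cannot replay that stretch of WAL, so a re-snapshot or reconciliation may be needed.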

Does anyone have experience with this setup? How do you manage things if a failover happens?

As of 2022, there have been some new developments in this space using Patroni, described in the Percona blog post: How Patroni Addresses the Problem of the Logical Replication Slot Failover in a PostgreSQL Cluster

Key points of the approach above:

  • This solution requires PostgreSQL 11 or above because it uses the pg_replication_slot_advance() function which is available from PostgreSQL 11 onwards, for advancing the slot.
  • The downstream connection can use HAProxy so that the connection will be automatically routed to the primary (not covered in this post). No modification to PostgreSQL code or creation of any extension is required.
  • The copying of the slot happens over PostgreSQL protocol (libpq) rather than any OS-specific tools/methods. Patroni uses rewind or superuser credentials. Patroni uses the pg_read_binary_file() function to read the slot information.
  • Once the logical slot is created on the replica side, Patroni uses pg_replication_slot_advance() to move the slot forward.
  • The permanent slot information will be added to DCS and will be continuously maintained by the primary instance of Patroni. A new DCS key with the name “status” is introduced and supported across all DCS options (zookeeper, etcd, consul, etc.).
  • hot_standby_feedback must be enabled on all standby nodes where the logical replication slot needs to be maintained.
  • Patroni parameter postgresql.use_slots must be enabled to make sure that every standby node uses a slot on the primary node.
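
As a rough illustration of the last few points, the sketch below builds the kind of Patroni dynamic configuration they imply and serializes it to JSON, the format accepted by Patroni's REST API `PATCH /config` endpoint. The slot name `debezium`, the database `appdb`, and the `pgoutput` plugin are assumptions made for the example, not values from the blog post. Note that this only applies to a self-managed Patroni cluster; it cannot be applied to RDS itself.

```python
import json


def build_patroni_slot_config(slot_name: str, database: str, plugin: str) -> dict:
    """Build a Patroni dynamic-config fragment declaring a permanent logical slot."""
    return {
        # Permanent slots that Patroni keeps alive across failovers.
        "slots": {
            slot_name: {
                "type": "logical",
                "database": database,
                "plugin": plugin,
            }
        },
        "postgresql": {
            # Every standby uses a physical slot on the primary.
            "use_slots": True,
            "parameters": {
                # Needed on standbys that maintain the logical slot.
                "hot_standby_feedback": "on",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names; adjust to your own cluster.
    config = build_patroni_slot_config("debezium", "appdb", "pgoutput")
    print(json.dumps(config, indent=2))
```

The printed JSON could then be sent to a Patroni node's `/config` endpoint or merged into the cluster configuration with `patronictl edit-config`; check the Patroni documentation for your version before relying on the exact keys.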