无法在 3 节点 patroni postgres-11 集群中进行故障转移

Question

我有一个三节点patronipostgres-11集群running.The守护神集群设置为：

consul:
  url: <consul URL>

bootstrap:
  initdb:
    encoding: UTF8
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      remove_data_directory_on_diverged_timelines: false
      remove_data_directory_on_rewind_failure: true
      parameters:
        archive_command: /bin/true
        archive_mode: 'on'
        archive_timeout: 1800s
        checkpoint_completion_target: 0.9
        default_statistics_target: 100
        effective_cache_size: 24GB
        effective_io_concurrency: 200
        hot_standby: 'on'
        log_directory: /var/log/postgresql
        log_filename: postgresql-%a.log
        logging_collector: 'on'
        maintenance_work_mem: 2GB
        max_connections: 2000
        max_parallel_maintenance_workers: 4
        max_parallel_workers: 8
        max_parallel_workers_per_gather: 4
        max_replication_slots: 10
        max_wal_senders: 10
        max_wal_size: 8GB
        max_worker_processes: 8
        min_wal_size: 2GB
        random_page_cost: 1.1
        shared_buffers: 8GB
        ssl: 'on'
        ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem
        ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key
        track_commit_timestamp: 'on'
        wal_buffers: 16MB
        wal_keep_segments: 64
        wal_level: logical
        wal_log_hints: 'on'
        work_mem: 5242kB

postgresql:
  listen: 0.0.0.0:5432
  connect_address: <IP Address of the node>:5432
  data_dir: /var/lib/postgresql/11/<db name>
  pgpass: /tmp/pgpass0
  authentication:
    superuser:
      username: postgres
      password: postgres
    replication:
      username: replicator
      password: replicator
    rewind:
      username: rewind_user
      password: rewind_password
  pg_hba:
    - local all all peer
    - host all all 127.0.0.1/32 md5
    - hostssl all postgres 0.0.0.0/0 md5
    - host all postgres 0.0.0.0/0 reject
    - host all all 0.0.0.0/0 md5
    - hostssl all postgres ::0/0 md5
    - host all postgres ::0/0 reject
    - host all all ::0/0 md5
    - local replication all peer
    - host replication all 127.0.0.1/32 md5
    - hostssl replication replicator 0.0.0.0/0 md5
    - hostssl replication replicator ::0/0 md5
    - host replication replicator 0.0.0.0/0 md5
    - host replication replicator 127.0.0.1/32 trust
    - local postgres   postgres   peer
    - host all all 0.0.0.0/0 md5
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '250M'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

restapi:
  connect_address: <IP Address of the node>:8008
  listen: 0.0.0.0:8008

当我关闭主节点时，我希望 synchronous standby 成为主节点。但是在两个备用节点上我都看到了以下错误：

INFO: following a different leader because i am not the healthiest node

不确定为什么同步备用没有成为主。我在 Ubuntu 18.04

上使用 Patroni 1.6.1

Answer 1

我找到问题了。由于某种原因，我的节点名称中的大写字母被转换为小写字母，因此 patrani 无法找到备用节点。如下所示：

{"leader":"Pa-db1","sync_standby":"pa-db2"}

正确的节点名称是Pa-db2，同理第二备节点名称是Pa-db3。我将所有节点名称更改为小写字母，即 pa-db1, pa-db2, pa-db3 并且故障转移成功了！

无法在 3 节点 patroni postgres-11 集群中进行故障转移

Failed to do failover in 3 node patroni postgres-11 cluster

postgresql-11

patroni