PostgreSQL master server hangs on replication flow
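First of all, I'm not a data engineer, so I'll do my best to give you everything you need to help with my problem :/
Context:
I'm trying to set up 2 PostgreSQL servers, 1 master and 1 slave.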
psql (PostgreSQL) 10.9 (Ubuntu 10.9-0ubuntu0.18.04.1)
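As I understand it, synchronous replication is not a good idea when there are only 2 servers. But I need to understand what is happening here...
Problem:
When I run CREATE SCHEMA test; the master server hangs.
However, the schema exists on the master and also on the slave. The master hangs because it is waiting for the commit status from the slave...
Master configuration: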
/etc/postgresql/10/main/conf.d/master.conf
# Connection
listen_addresses = '127.0.0.1,slave-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
synchronous_commit = remote_apply #local works, remote_apply hangs
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_2/%f'
# Replication master
max_wal_senders = 2
wal_keep_segments = 100
synchronous_standby_names = 'ANY 1 ("lab-3")'
/etc/postgresql/10/main/pg_hba.conf
hostssl replication replicate slave-ip/32 scram-sha-256
Slave configuration:
/etc/postgresql/10/main/conf.d/standby.conf
# Connection
listen_addresses = '127.0.0.1,master-ip'
ssl = on
ssl_cert_file = '/etc/ssl/postgresql/certs/server.pem'
ssl_key_file = '/etc/ssl/postgresql/private/server.key'
ssl_ca_file = '/etc/ssl/postgresql/certs/server.pem'
password_encryption = scram-sha-256
# WAL
wal_level = replica
# Archive
archive_mode = on
archive_command = 'rsync -av %p postgres@lab-3:/var/lib/postgresql/wal_archive_lab_3/%f'
# Replication slave
max_wal_senders = 2
wal_keep_segments = 100
hot_standby = on
/var/lib/postgresql/10/main/recovery.conf
standby_mode = on
primary_conninfo = 'host=master-ip port=5432 user=replicate password=replicate_password sslmode=require application_name="lab-3"'
trigger_file = '/var/lib/postgresql/10/postgresql.trigger'
When it hangs, I get absolutely nothing in the log files, only this error when I abort the statement on the master with Ctrl+C:
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
Is there a way to check what happened, and why it gets stuck like this?
Edit 1
Contents of pg_stat_replication:
Before the query
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/110000D0 | 0/110000D0 | 0/110000D0 | 0/110000D0 | | | | 0 | async
(1 row)
While it hangs / afterwards
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+-----------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------------+-----------------+---------------+---------------+------------
54431 | 16384 | replicate | "lab-3" | slave-ip | | 47742 | 2019-08-06 07:56:48.105056+02 | | streaming | 0/11000C10 | 0/11000C10 | 0/11000C10 | 0/11000C10 | 00:00:00.000521 | 00:00:00.004421 | 00:00:00.0045 | 0 | async
(1 row)
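The wide output above is hard to read, so a narrower query can help when comparing what the standby reports against the master's setting. This is only a sketch, using columns that already appear in the output above; the rows to compare are application_name and sync_state against the value of synchronous_standby_names.

```sql
-- Run on the master: which standbys are connected, and are they treated as synchronous?
SELECT application_name, state, sync_priority, sync_state
FROM pg_stat_replication;

-- Run on the master: the standby names the master is waiting for.
SHOW synchronous_standby_names;
```

If sync_state stays at async for every row while synchronous_standby_names is set, the master has no matching synchronous standby and a commit will wait indefinitely.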
Thanks!
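As Laurenz Albe said, the problem was the quoting of the synchronous standby name.
The documentation explains that if the name contains a dash, it should be double-quoted in the synchronous_standby_names setting on the master, but it must not be quoted in the primary_conninfo value on the slave.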
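The pg_stat_replication output in the question shows the mismatch: the standby reports its application_name as "lab-3" with the quotes included, and its sync_state is async. A minimal sketch of the corrected pair of settings, assuming the hosts, user and paths from the question stay unchanged, might look like this:

```
# Master: /etc/postgresql/10/main/conf.d/master.conf
# Double quotes are needed here because the name contains a dash.
synchronous_standby_names = 'ANY 1 ("lab-3")'

# Slave: /var/lib/postgresql/10/main/recovery.conf
# No double quotes around the application name: in a connection string
# they are not quoting characters, so they would become part of the name.
primary_conninfo = 'host=master-ip port=5432 user=replicate password=replicate_password sslmode=require application_name=lab-3'
```

On PostgreSQL 10, recovery.conf is only read at startup, so the slave has to be restarted for the new application name to take effect. Afterwards the application_name column in pg_stat_replication should show lab-3 without quotes, and sync_state should no longer be async.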