Joining Postgres BDR Node Uses Outdated DSN
I have a Postgres BDR cluster with 3 nodes "Ready" and 3 nodes "Parted".
If I run SELECT * FROM bdr.bdr_nodes, it shows the following:
-[ RECORD 1 ]------+-------------------------
node_sysid | 6153716379158074503
node_timeline | 1
node_dboid | 16385
node_status | r
node_name | node3
node_local_dsn | host=x.x.x.241 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 2 ]------+-------------------------
node_sysid | 6153716914784688297
node_timeline | 1
node_dboid | 16385
node_status | r
node_name | node2
node_local_dsn | host=x.x.x.5 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 3 ]------+-------------------------
node_sysid | 6170758438846557459
node_timeline | 1
node_dboid | 16384
node_status | r
node_name | node4
node_local_dsn | host=x.x.x.128 [...]
node_init_from_dsn | host=x.x.x.47 [...]
-[ RECORD 4 ]------+-------------------------
node_sysid | 6153716402564903569
node_timeline | 1
node_dboid | 16385
node_status | k
node_name | node1
node_local_dsn | host=x.x.x.47 [...]
node_init_from_dsn |
-[ RECORD 5 ]------+-------------------------
node_sysid | 6170830020100809103
node_timeline | 1
node_dboid | 16385
node_status | k
node_name | node6
node_local_dsn | host=x.x.x.48 [...]
node_init_from_dsn | host=x.x.x.241 [...]
-[ RECORD 6 ]------+-------------------------
node_sysid | 6170839982079996801
node_timeline | 1
node_dboid | 16385
node_status | c
node_name | node8
node_local_dsn | host=x.x.x.142 [...]
node_init_from_dsn | host=x.x.x.241 [...]
-[ RECORD 7 ]------+-------------------------
node_sysid | 6170833985333433816
node_timeline | 1
node_dboid | 16385
node_status | k
node_name | node7
node_local_dsn | host=x.x.x.48 [...]
node_init_from_dsn | host=x.x.x.241 [...]
I am trying to join node8, but the join never completes. The error is:
d= p=5521 a=ERROR: 08006: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "x.x.x.48" and accepting
TCP/IP connections on port 5432?
d= p=5521 a=DETAIL: Connection string is 'host=x.x.x.48 [...]'
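The error means it is trying to connect to a node that has been killed or removed. Why is it trying to connect to a killed or removed node, and how can I fix this?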
I used the following command to join node8:
SELECT bdr.bdr_group_join(
local_node_name := 'node8',
node_external_dsn := 'host=x.x.x.142 [...]',
join_using_dsn := 'host=x.x.x.241 [...]'
);
BDR was installed following these instructions (Debian Wheezy):
curl -sSL https://manageacloud.com/api/cm/configuration/postgresql-bdr/debian/manageacloud-production-script.sh | bash
Table bdr.bdr_connections:
-[ RECORD 1 ]----------+---------------------
conn_sysid | 6170839982079996801
conn_timeline | 1
conn_dboid | 16385
conn_origin_sysid | 0
conn_origin_timeline | 0
conn_origin_dboid | 0
conn_is_unidirectional | f
conn_dsn | host=x.x.x.142 [...]
conn_apply_delay |
conn_replication_sets | {default}
-[ RECORD 2 ]----------+----------------------
conn_sysid | 6153716402564903569
conn_timeline | 1
conn_dboid | 16385
conn_origin_sysid | 0
conn_origin_timeline | 0
conn_origin_dboid | 0
conn_is_unidirectional | f
conn_dsn | host=x.x.x.47 [...]
conn_apply_delay |
conn_replication_sets | {default}
-[ RECORD 3 ]----------+-----------------------
conn_sysid | 6153716379158074503
conn_timeline | 1
conn_dboid | 16385
conn_origin_sysid | 0
conn_origin_timeline | 0
conn_origin_dboid | 0
conn_is_unidirectional | f
conn_dsn | host=x.x.x.241 [...]
conn_apply_delay |
conn_replication_sets | {default}
-[ RECORD 4 ]----------+-----------------------
conn_sysid | 6153716914784688297
conn_timeline | 1
conn_dboid | 16385
conn_origin_sysid | 0
conn_origin_timeline | 0
conn_origin_dboid | 0
conn_is_unidirectional | f
conn_dsn | host=x.x.x.5 [...]
conn_apply_delay |
conn_replication_sets | {default}
-[ RECORD 5 ]----------+-----------------------
conn_sysid | 6170758438846557459
conn_timeline | 1
conn_dboid | 16384
conn_origin_sysid | 0
conn_origin_timeline | 0
conn_origin_dboid | 0
conn_is_unidirectional | f
conn_dsn | host=x.x.x.128 [...]
conn_apply_delay |
conn_replication_sets | {default}
Version:
# SELECT bdr.bdr_version();
bdr_version
-------------------
0.9.1-2015-05-26-
(1 row)
This is a bug in BDR. I've just fixed it in my local copy of the bdr-plugin/next tree, and once I've tested it locally I'll push the change to bdr-plugin/REL0_9_STABLE for inclusion in 0.9.3.
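The problem is that, while creating slots on the peer nodes as part of a node join, we weren't filtering bdr.bdr_connections rows by the node's status in bdr.bdr_nodes (the node_status column).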
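The filtering rule described above can be sketched in Python: a bdr.bdr_connections row counts as usable only if a bdr.bdr_nodes row with the same (sysid, timeline, dboid) exists and its node_status is not 'k'. The rows are copied from the question's output (with the "[...]" DSN suffixes dropped), and the function name live_connections is hypothetical. This illustrates the rule only; it is not BDR's actual C implementation.

```python
# Illustration only, not BDR's C code: a bdr.bdr_connections row is usable
# only if a bdr.bdr_nodes row with the same (sysid, timeline, dboid) exists
# and that node is not killed/parted (node_status 'k').

NodeKey = tuple[str, int, int]  # (sysid, timeline, dboid)

# (node_name, node_status) copied from the question's bdr.bdr_nodes output.
NODES: dict[NodeKey, tuple[str, str]] = {
    ("6153716379158074503", 1, 16385): ("node3", "r"),
    ("6153716914784688297", 1, 16385): ("node2", "r"),
    ("6170758438846557459", 1, 16384): ("node4", "r"),
    ("6153716402564903569", 1, 16385): ("node1", "k"),
    ("6170830020100809103", 1, 16385): ("node6", "k"),
    ("6170839982079996801", 1, 16385): ("node8", "c"),
    ("6170833985333433816", 1, 16385): ("node7", "k"),
}

# (key, conn_dsn) copied from the question's bdr.bdr_connections output.
CONNECTIONS: list[tuple[NodeKey, str]] = [
    (("6170839982079996801", 1, 16385), "host=x.x.x.142"),
    (("6153716402564903569", 1, 16385), "host=x.x.x.47"),
    (("6153716379158074503", 1, 16385), "host=x.x.x.241"),
    (("6153716914784688297", 1, 16385), "host=x.x.x.5"),
    (("6170758438846557459", 1, 16384), "host=x.x.x.128"),
]


def live_connections(
    connections: list[tuple[NodeKey, str]],
    nodes: dict[NodeKey, tuple[str, str]],
) -> list[tuple[NodeKey, str]]:
    """Return only connections whose node exists and has not been parted."""
    live = []
    for key, dsn in connections:
        node = nodes.get(key)
        if node is None:
            continue  # orphan row: no matching bdr.bdr_nodes entry
        _name, status = node
        if status == "k":
            continue  # node was killed/parted, so its DSN is stale
        live.append((key, dsn))
    return live


for _key, dsn in live_connections(CONNECTIONS, NODES):
    print(dsn)
```

Run as-is, this prints the DSNs for node8, node3, node2 and node4 and skips node1's host=x.x.x.47, because node1 has node_status 'k'.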
It is safe to delete any bdr.bdr_connections entries that have no corresponding bdr.bdr_nodes entry, or whose bdr.bdr_nodes entry has node_status = 'k'. Doing so works around the problem in 0.9.2 and above.
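The workaround can be written as a single DELETE. The sketch below keeps that statement as a PostgreSQL string, assuming the BDR 0.9.x column names shown in the question. So the filter can be checked without a cluster, it runs the statement against a throwaway SQLite mock of the two catalogs. The mock is seeded with the question's rows plus one hypothetical orphan row. The helper names build_mock and remaining_dsns are hypothetical. On a real node only CLEANUP_SQL matters. Running the same WHERE clause as a SELECT first is a cautious way to preview which rows it would remove.

```python
import sqlite3

# Cleanup statement for PostgreSQL (assumes the BDR 0.9.x column names shown
# in the question). It removes every bdr.bdr_connections row that has no
# bdr.bdr_nodes row with the same (sysid, timeline, dboid), or whose node
# row has node_status = 'k'.
CLEANUP_SQL = """
DELETE FROM bdr.bdr_connections
WHERE NOT EXISTS (
    SELECT 1
    FROM bdr.bdr_nodes AS n
    WHERE n.node_sysid = bdr_connections.conn_sysid
      AND n.node_timeline = bdr_connections.conn_timeline
      AND n.node_dboid = bdr_connections.conn_dboid
      AND n.node_status <> 'k'
)
"""


def build_mock() -> sqlite3.Connection:
    """Hypothetical SQLite stand-in for the two BDR catalogs (key, status and
    DSN columns only), so CLEANUP_SQL can be exercised locally."""
    db = sqlite3.connect(":memory:")
    db.execute("ATTACH DATABASE ':memory:' AS bdr")  # provides a 'bdr' schema
    db.execute(
        "CREATE TABLE bdr.bdr_nodes (node_sysid TEXT, node_timeline INTEGER,"
        " node_dboid INTEGER, node_status TEXT, node_name TEXT)"
    )
    db.execute(
        "CREATE TABLE bdr.bdr_connections (conn_sysid TEXT,"
        " conn_timeline INTEGER, conn_dboid INTEGER, conn_dsn TEXT)"
    )
    db.executemany(
        "INSERT INTO bdr.bdr_nodes VALUES (?, ?, ?, ?, ?)",
        [
            ("6153716379158074503", 1, 16385, "r", "node3"),
            ("6153716914784688297", 1, 16385, "r", "node2"),
            ("6170758438846557459", 1, 16384, "r", "node4"),
            ("6153716402564903569", 1, 16385, "k", "node1"),
            ("6170830020100809103", 1, 16385, "k", "node6"),
            ("6170839982079996801", 1, 16385, "c", "node8"),
            ("6170833985333433816", 1, 16385, "k", "node7"),
        ],
    )
    db.executemany(
        "INSERT INTO bdr.bdr_connections VALUES (?, ?, ?, ?)",
        [
            ("6170839982079996801", 1, 16385, "host=x.x.x.142"),
            ("6153716402564903569", 1, 16385, "host=x.x.x.47"),
            ("6153716379158074503", 1, 16385, "host=x.x.x.241"),
            ("6153716914784688297", 1, 16385, "host=x.x.x.5"),
            ("6170758438846557459", 1, 16384, "host=x.x.x.128"),
            # Hypothetical orphan row with no bdr.bdr_nodes entry.
            ("1111111111111111111", 1, 16385, "host=orphan.example"),
        ],
    )
    return db


def remaining_dsns(db: sqlite3.Connection) -> list[str]:
    """List the conn_dsn values still present in the mock."""
    rows = db.execute("SELECT conn_dsn FROM bdr.bdr_connections").fetchall()
    return [row[0] for row in rows]


db = build_mock()
deleted = db.execute(CLEANUP_SQL).rowcount
print(deleted, sorted(remaining_dsns(db)))
```

Against the mock, the DELETE removes two rows: node1's host=x.x.x.47 connection (node_status 'k') and the hypothetical orphan. The DSNs of node8, node3, node2 and node4 remain.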