Pacemaker not able to start slave node on postgres-11
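I have 2 nodes (named node03 and node04) in a master-slave, hot-standby setup, using Pacemaker to manage the cluster. Before the switchover, node04 was the master and node03 was the standby.
After the switchover, I have been trying to bring node04 back as the slave node, but I am not able to.
During the switchover, I realized that someone had changed the configuration file and set the ignore_system_indexes parameter to true. I had to remove it and restart the postgres server manually. It was after this that the cluster started behaving abnormally.
node04 can be brought back up as a slave node manually, i.e., if I start the postgres instance by hand with a recovery.conf file.
Here are the files needed to understand the situation: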
sudo crm_mon -A1f
Stack: corosync
Current DC: node03 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Node node04: standby
Online: [ node03 ]
Active resources:
Resource Group: master-group
vip-repli (ocf::heartbeat:IPaddr2): Started node03
vip-master (ocf::heartbeat:IPaddr2): Started node03
Master/Slave Set: pgsql-cluster [pgsqlins]
Masters: [ node03 ]
Node Attributes:
* Node node03:
+ master-pgsqlins : 1000
+ pgsqlins-data-status : LATEST
+ pgsqlins-master-baseline : 00008820DC000098
+ pgsqlins-status : PRI
* Node node04:
+ master-pgsqlins : -INFINITY
+ pgsqlins-data-status : DISCONNECT
+ pgsqlins-status : STOP
Migration Summary:
* Node node03:
* Node node04:
recovery.conf
primary_conninfo = 'host=1xx.xx.xx.xx port=5432 user=replica application_name=node04 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'rsync -a /Dxxxxx1/wal_archive/%f %p'
recovery_target_timeline = 'latest'
standby_mode = 'on'
Cluster CIB
sudo pcs cluster cib
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="269" num_updates="4" admin_epoch="0" cib-last-written="Mon Jun 28 15:13:35 2021" update-origin="node04" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.23-1.el7_9.1-9acf116022"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
<nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="pgcluster"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1624860815"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1" uname="node03">
<instance_attributes id="nodes-1">
<nvpair id="nodes-1-pgsqlins-data-status" name="pgsqlins-data-status" value="LATEST"/>
</instance_attributes>
</node>
<node id="2" uname="node04">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-pgsqlins-data-status" name="pgsqlins-data-status" value="DISCONNECT"/>
<nvpair id="nodes-2-standby" name="standby" value="on"/>
</instance_attributes>
</node>
</nodes>
<resources>
<group id="master-group">
<primitive class="ocf" id="vip-repli" provider="heartbeat" type="IPaddr2">
<instance_attributes id="vip-repli-instance_attributes">
<nvpair id="vip-repli-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
<nvpair id="vip-repli-instance_attributes-ip" name="ip" value="1xx.xx.xx.xx"/>
<nvpair id="vip-repli-instance_attributes-nic" name="nic" value="eth2"/>
</instance_attributes>
<operations>
<op id="vip-repli-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
<op id="vip-repli-start-interval-0s" interval="0s" name="start" timeout="20s"/>
<op id="vip-repli-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
</operations>
</primitive>
<primitive class="ocf" id="vip-master" provider="heartbeat" type="IPaddr2">
<instance_attributes id="vip-master-instance_attributes">
<nvpair id="vip-master-instance_attributes-cidr_netmask" name="cidr_netmask" value="24"/>
<nvpair id="vip-master-instance_attributes-ip" name="ip" value="1x.xx.xxx.xxx"/>
<nvpair id="vip-master-instance_attributes-nic" name="nic" value="eth1"/>
</instance_attributes>
<operations>
<op id="vip-master-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
<op id="vip-master-start-interval-0s" interval="0s" name="start" timeout="20s"/>
<op id="vip-master-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
</operations>
</primitive>
</group>
<master id="pgsql-cluster">
<primitive class="ocf" id="pgsqlins" provider="heartbeat" type="pgsql11">
<instance_attributes id="pgsqlins-instance_attributes">
<nvpair id="pgsqlins-instance_attributes-master_ip" name="master_ip" value="1xx.xx.xx.xx"/>
<nvpair id="pgsqlins-instance_attributes-node_list" name="node_list" value="node03 node04"/>
<nvpair id="pgsqlins-instance_attributes-pgctl" name="pgctl" value="/usr/pgsql-11/bin/pg_ctl"/>
<nvpair id="pgsqlins-instance_attributes-pgdata" name="pgdata" value="/DPxxxx01/datadg/data"/>
<nvpair id="pgsqlins-instance_attributes-pgport" name="pgport" value="5432"/>
<nvpair id="pgsqlins-instance_attributes-primary_conninfo_opt" name="primary_conninfo_opt" value="keepalives_idle=60 keepalives_interval=5 keepalives_count=5"/>
<nvpair id="pgsqlins-instance_attributes-psql" name="psql" value="/usr/pgsql-11/bin/psql"/>
<nvpair id="pgsqlins-instance_attributes-rep_mode" name="rep_mode" value="sync"/>
<nvpair id="pgsqlins-instance_attributes-repuser" name="repuser" value="replica"/>
<nvpair id="pgsqlins-instance_attributes-restart_on_promote" name="restart_on_promote" value="true"/>
<nvpair id="pgsqlins-instance_attributes-restore_command" name="restore_command" value="rsync -a /Dxxxxx01/wal_archive/%f %p"/>
</instance_attributes>
<operations>
<op id="pgsqlins-demote-interval-0" interval="0" name="demote" on-fail="stop" timeout="60s"/>
<op id="pgsqlins-methods-interval-0s" interval="0s" name="methods" timeout="5s"/>
<op id="pgsqlins-monitor-interval-10s" interval="10s" name="monitor" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-monitor-interval-9s" interval="9s" name="monitor" on-fail="restart" role="Master" timeout="60s"/>
<op id="pgsqlins-notify-interval-0" interval="0" name="notify" timeout="60s"/>
<op id="pgsqlins-promote-interval-0" interval="0" name="promote" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-start-interval-0" interval="0" name="start" on-fail="restart" timeout="60s"/>
<op id="pgsqlins-stop-interval-0" interval="0" name="stop" on-fail="block" timeout="60s"/>
</operations>
</primitive>
<meta_attributes id="pgsql-cluster-meta_attributes">
<nvpair id="pgsql-cluster-meta_attributes-master-node-max" name="master-node-max" value="1"/>
<nvpair id="pgsql-cluster-meta_attributes-clone-max" name="clone-max" value="2"/>
<nvpair id="pgsql-cluster-meta_attributes-notify" name="notify" value="true"/>
<nvpair id="pgsql-cluster-meta_attributes-master-max" name="master-max" value="1"/>
<nvpair id="pgsql-cluster-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
</meta_attributes>
</master>
</resources>
<constraints>
<rsc_colocation id="colocation-master-group-pgsql-cluster-INFINITY" rsc="master-group" score="INFINITY" with-rsc="pgsql-cluster" with-rsc-role="Master"/>
<rsc_order first="pgsql-cluster" first-action="promote" id="order-pgsql-cluster-master-group-INFINITY" score="INFINITY" symmetrical="false" then="master-group" then-action="start"/>
<rsc_order first="pgsql-cluster" first-action="demote" id="order-pgsql-cluster-master-group-0" score="0" symmetrical="false" then="master-group" then-action="stop"/>
<rsc_location id="cli-prefer-pgsql-cluster" rsc="pgsql-cluster" role="Started" node="node04" score="INFINITY"/>
</constraints>
</configuration>
<status>
<node_state id="1" uname="node03" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
<transient_attributes id="1">
<instance_attributes id="status-1">
<nvpair id="status-1-pgsqlins-status" name="pgsqlins-status" value="PRI"/>
<nvpair id="status-1-master-pgsqlins" name="master-pgsqlins" value="1000"/>
<nvpair id="status-1-pgsqlins-master-baseline" name="pgsqlins-master-baseline" value="00008820DC000098"/>
</instance_attributes>
</transient_attributes>
<lrm id="1">
<lrm_resources>
<lrm_resource id="vip-master" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-master_last_0" operation_key="vip-master_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="3:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;3:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="535" rc-code="0" op-status="0" interval="0" last-run="1624859077" last-rc-change="1624859077" exec-time="90" queue-time="0" op-digest="38fc1b2633211138e53cb349a5c147ff"/>
<lrm_rsc_op id="vip-master_monitor_10000" operation_key="vip-master_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;4:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="536" rc-code="0" op-status="0" interval="10000" last-rc-change="1624859077" exec-time="72" queue-time="0" op-digest="4cbf56ab9e52c6f07a7be8cbb786451c"/>
</lrm_resource>
<lrm_resource id="vip-repli" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-repli_last_0" operation_key="vip-repli_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="1:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;1:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="532" rc-code="0" op-status="0" interval="0" last-run="1624859077" last-rc-change="1624859077" exec-time="127" queue-time="0" op-digest="dd04ed3322c75b7bab13c5bea56dbe77"/>
<lrm_rsc_op id="vip-repli_monitor_10000" operation_key="vip-repli_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="2:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;2:433:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="534" rc-code="0" op-status="0" interval="10000" last-rc-change="1624859077" exec-time="55" queue-time="0" op-digest="c76770c29a91fb082fdf1fdd8b0469c3"/>
</lrm_resource>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_last_0" operation_key="pgsqlins_promote_0" operation="promote" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="12:432:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:0;12:432:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="530" rc-code="0" op-status="0" interval="0" last-run="1624859073" last-rc-change="1624859073" exec-time="3307" queue-time="0" op-digest="2f51441ed087061eb68745fd8157ddb6"/>
<lrm_rsc_op id="pgsqlins_monitor_9000" operation_key="pgsqlins_monitor_9000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="13:433:8:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:8;13:433:8:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node03" call-id="533" rc-code="8" op-status="0" interval="9000" last-rc-change="1624859078" exec-time="497" queue-time="1" op-digest="978aa48a7da35944c793e174dbee9a1d"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
<node_state id="2" uname="node04" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
<lrm id="2">
<lrm_resources>
<lrm_resource id="vip-repli" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-repli_last_0" operation_key="vip-repli_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;4:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1624600624" last-rc-change="1624600624" exec-time="65" queue-time="0" op-digest="dd04ed3322c75b7bab13c5bea56dbe77"/>
</lrm_resource>
<lrm_resource id="vip-master" type="IPaddr2" class="ocf" provider="heartbeat">
<lrm_rsc_op id="vip-master_last_0" operation_key="vip-master_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="5:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;5:1:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="9" rc-code="7" op-status="0" interval="0" last-run="1624600624" last-rc-change="1624600624" exec-time="62" queue-time="0" op-digest="38fc1b2633211138e53cb349a5c147ff"/>
</lrm_resource>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_last_0" operation_key="pgsqlins_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.14" transition-key="4:436:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" transition-magic="0:7;4:436:7:54755ae3-42a4-477c-ae37-8ae8bfbc1f04" exit-reason="" on_node="node04" call-id="192" rc-code="7" op-status="0" interval="0" last-run="1624860816" last-rc-change="1624860816" exec-time="178" queue-time="0" op-digest="2f51441ed087061eb68745fd8157ddb6"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="2">
<instance_attributes id="status-2">
<nvpair id="status-2-pgsqlins-status" name="pgsqlins-status" value="STOP"/>
<nvpair id="status-2-master-pgsqlins" name="master-pgsqlins" value="-INFINITY"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>
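If I try to unstandby node04, it first demotes node03 and then tries to start node04, but node04 does not come up. I have also tried bringing up node04 alone, and that failed too.
However, if I try to start node04 manually from the above situation, I am able to do it. If I try to clean up the pgsqlins resource, it fails.
Here is the corosync.log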
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Forwarding cib_apply_diff operation for section 'all' to all (origin=local/cibadmin/2)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.251.32 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.0 b956759712580c1bfdffd25cbf4ab8e9
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: -- /cib/configuration/nodes/node[@id='2']/instance_attributes[@id='nodes-2']/nvpair[@id='nodes-2-standby']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @epoch=252, @num_updates=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=dci2pgs04/cibadmin/2, version=0.252.0)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-60.raw
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_write_with_digest: Wrote version 0.252.0 of the CIB to disk (digest: 8b99629d323c923de592700bc4398c49)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.ZtvQXP (digest: /var/lib/pacemaker/cib/cib.fh4Toy)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.0 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.1 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=1
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_demote_0, @operation=demote, @transition-key=10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852894, @last-rc-change=1624852894, @exec-time=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/948, version=0.252.1)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting master-pgsqlins[node03]: 1000 -> -INFINITY from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.1 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.2 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-master-pgsqlins']: @value=-INFINITY
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/211, version=0.252.2)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-master-baseline[node03]: 00008820CC000098 -> (null) from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.2 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.3 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: -- /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pgsqlins-master-baseline']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=3
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/212, version=0.252.3)
Jun 28 13:01:35 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-status[node03]: PRI -> STOP from node03
.
.
.
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @transition-magic=0:0;9:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=445, @rc-code=0, @op-status=0, @exec-time=471
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/956, version=0.252.11)
Jun 28 13:01:36 [9296] node04.dc.japannext.co.jp crmd: info: do_lrm_rsc_op: Performing key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04 op=pgsqlins_start_0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/142)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.11 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.12 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=12
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_start_0, @operation=start, @transition-key=12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @exec-time=0
Jun 28 13:01:36 [9293] node04.dc.japannext.co.jp lrmd: info: log_execute: executing - rsc:pgsqlins action:start call_id:132
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/957, version=0.252.12)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.12 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.13 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=13
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_start_0, @operation=start, @transition-key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852896, @last-rc-change=1624852896, @exec-time=0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node04/crmd/142, version=0.252.13)
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: Set all nodes into async mode.
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: PostgreSQL is down
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: server starting
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: PostgreSQL start command sent.
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: WARNING: Can't get PostgreSQL recovery status. rc=2
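Most of the excerpt above is CIB diff traffic; the lines written by the resource agent itself are the ones tagged `pgsql11(pgsqlins)`. As a minimal sketch for pulling out just those lines, the helper below filters a log file passed as an argument. The function name `ra_messages` is made up for illustration, and the log location is left to the caller because it depends on the corosync logging configuration.

```shell
# Hypothetical helper: print only the messages logged by the pgsql11
# resource agent for the pgsqlins resource.
# $1 is the path to the log file (an assumption; pass whatever file your
# cluster actually logs to).
ra_messages() {
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -F 'pgsql11(pgsqlins)' "$1"
}
```

Called as `ra_messages /path/to/corosync.log`, it would reduce the excerpt above to the five `pgsql11(pgsqlins)` lines ending with the `Can't get PostgreSQL recovery status` warning.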
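My guess is that Pacemaker is reading pre-switchover settings from /var/lib/pacemaker/cib and using them to perform these steps. Any help on how to reset this would be greatly appreciated.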
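As mentioned in the question, when node04 was taken out of standby, Pacemaker was demoting node03 and trying to make node04 the master. It would fail at this task and then bring node03 back as a standalone master.
Because I suspected it was picking up some old configuration from the cib or pengine folders, I even destroyed the cluster on both nodes, removed pacemaker, pcs and corosync, and reinstalled all of them.
Even so, the problem persisted. I then suspected that the folder permissions on /var/lib/pgsql/ on node04 might be wrong, and started exploring.
That is when I found an old PGSQL.lock.bak file dated June 11, meaning it was older than the current PGSQL.lock file, and because of it Pacemaker was trying to promote node04 but would fail. Pacemaker does not show this as an error in any log, and there is nothing about it in the crm_mon output either. Once I removed this file, it worked like a charm.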
TL;DR:
- Check whether there are any PGSQL.lock.bak or other unneeded files in the /var/lib/pgsql/tmp folder, and remove them before starting Pacemaker again.
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="2">
<instance_attributes id="status-2">
<nvpair id="status-2-pgsqlins-status" name="pgsqlins-status" value="STOP"/>
<nvpair id="status-2-master-pgsqlins" name="master-pgsqlins" value="-INFINITY"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>
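The status section of the CIB above is easier to read when reduced to one line per recorded operation. The following is a minimal sketch, not part of the original post: the function name summarize_ops and the SAMPLE string (a heavily trimmed excerpt of the dump, keeping only a few attributes) are assumptions made for illustration. The return-code meanings are the standard OCF ones (0 success, 7 not running, 8 running as master).

```python
import xml.etree.ElementTree as ET

# Standard OCF return codes that appear in the dump above.
OCF_RC = {0: "success", 7: "not running", 8: "running as master"}

# Trimmed excerpt of the CIB status section (most attributes removed for brevity).
SAMPLE = """<cib><status>
<node_state id="1" uname="node03"><lrm id="1"><lrm_resources>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_monitor_9000" operation="monitor" rc-code="8"/>
</lrm_resource>
</lrm_resources></lrm></node_state>
<node_state id="2" uname="node04"><lrm id="2"><lrm_resources>
<lrm_resource id="pgsqlins" type="pgsql11" class="ocf" provider="heartbeat">
<lrm_rsc_op id="pgsqlins_last_0" operation="monitor" rc-code="7"/>
</lrm_resource>
</lrm_resources></lrm></node_state>
</status></cib>"""


def summarize_ops(cib_xml):
    """Return (node, resource, operation, rc, meaning) for every lrm_rsc_op."""
    root = ET.fromstring(cib_xml)
    rows = []
    for state in root.iter("node_state"):
        node = state.get("uname")
        for rsc in state.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                rc = int(op.get("rc-code"))
                rows.append((node, rsc.get("id"), op.get("operation"),
                             rc, OCF_RC.get(rc, "other")))
    return rows


for row in summarize_ops(SAMPLE):
    print(*row)
```

Fed the full output of sudo pcs cluster cib instead of SAMPLE, the same function would list every operation in the dump, which makes it quick to see that the last probe of pgsqlins on node04 returned 7 (not running).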
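If I try to unstandby node04, it first demotes node03 and then tries to start node04, but node04 does not come up. I also tried bringing up node04 on its own, and that failed as well.
However, starting from that state I can bring node04 up manually. If I try to clean up the pgsqlins resource, it fails.
Here is the corosync.log: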
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Forwarding cib_apply_diff operation for section 'all' to all (origin=local/cibadmin/2)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.251.32 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.0 b956759712580c1bfdffd25cbf4ab8e9
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: -- /cib/configuration/nodes/node[@id='2']/instance_attributes[@id='nodes-2']/nvpair[@id='nodes-2-standby']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @epoch=252, @num_updates=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=dci2pgs04/cibadmin/2, version=0.252.0)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-60.raw
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_write_with_digest: Wrote version 0.252.0 of the CIB to disk (digest: 8b99629d323c923de592700bc4398c49)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.ZtvQXP (digest: /var/lib/pacemaker/cib/cib.fh4Toy)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.0 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.1 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=1
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_demote_0, @operation=demote, @transition-key=10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:396:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852894, @last-rc-change=1624852894, @exec-time=0
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/948, version=0.252.1)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting master-pgsqlins[node03]: 1000 -> -INFINITY from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.1 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.2 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-master-pgsqlins']: @value=-INFINITY
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/211, version=0.252.2)
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-master-baseline[node03]: 00008820CC000098 -> (null) from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.2 2
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.3 (null)
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: -- /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-pgsqlins-master-baseline']
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=3
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/attrd/212, version=0.252.3)
Jun 28 13:01:35 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-status[node03]: PRI -> STOP from node03
.
.
.
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @transition-magic=0:0;9:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=445, @rc-code=0, @op-status=0, @exec-time=471
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/956, version=0.252.11)
Jun 28 13:01:36 [9296] node04.dc.japannext.co.jp crmd: info: do_lrm_rsc_op: Performing key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04 op=pgsqlins_start_0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/142)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.11 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.12 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=12
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_start_0, @operation=start, @transition-key=12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;12:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @exec-time=0
Jun 28 13:01:36 [9293] node04.dc.japannext.co.jp lrmd: info: log_execute: executing - rsc:pgsqlins action:start call_id:132
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node03/crmd/957, version=0.252.12)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: --- 0.252.12 2
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: Diff: +++ 0.252.13 (null)
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=13
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='pgsqlins']/lrm_rsc_op[@id='pgsqlins_last_0']: @operation_key=pgsqlins_start_0, @operation=start, @transition-key=10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @transition-magic=-1:193;10:397:0:54755ae3-42a4-477c-ae37-8ae8bfbc1f04, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1624852896, @last-rc-change=1624852896, @exec-time=0
Jun 28 13:01:36 [9291] node04.dc.japannext.co.jp cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node04/crmd/142, version=0.252.13)
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: Set all nodes into async mode.
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: PostgreSQL is down
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: server starting
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: INFO: PostgreSQL start command sent.
Jun 28 13:01:37 pgsql11(pgsqlins)[9613]: WARNING: Can't get PostgreSQL recovery status. rc=2
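Most of the log above is CIB diff noise; the node attribute transitions reported by attrd are what show the demotion of node03. As a rough sketch (the function name attribute_changes and the regular expression are illustrative assumptions, written against the line format seen in this excerpt only), those transitions can be pulled out like this:

```python
import re

# Matches lines such as:
#   ... attrd: info: attrd_peer_update: Setting master-pgsqlins[node03]: 1000 -> -INFINITY from node03
ATTR_RE = re.compile(
    r"attrd_peer_update:\s+Setting ([^\[\s]+)\[([^\]]+)\]: (.*?) -> (.*?) from (\S+)$"
)

# Three attrd lines and one unrelated line copied from the excerpt above.
SAMPLE_LOG = """\
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting master-pgsqlins[node03]: 1000 -> -INFINITY from node03
Jun 28 13:01:34 [9291] node04.dc.japannext.co.jp cib: info: cib_perform_op: + /cib: @num_updates=2
Jun 28 13:01:34 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-master-baseline[node03]: 00008820CC000098 -> (null) from node03
Jun 28 13:01:35 [9294] node04.dc.japannext.co.jp attrd: info: attrd_peer_update: Setting pgsqlins-status[node03]: PRI -> STOP from node03
"""


def attribute_changes(lines):
    """Return (node, attribute, old_value, new_value) for each attrd update line."""
    changes = []
    for line in lines:
        match = ATTR_RE.search(line.strip())
        if match:
            name, node, old, new, _origin = match.groups()
            changes.append((node, name, old, new))
    return changes


for node, name, old, new in attribute_changes(SAMPLE_LOG.splitlines()):
    print(f"{node}: {name} {old} -> {new}")
```

Run over the whole corosync.log, this would give a compact timeline of the master score and pgsqlins-status changes on each node.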
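My guess is that Pacemaker is reading settings from before the switchover out of /var/lib/pacemaker/cib and using them to perform these steps. Any help on how to reset this would be greatly appreciated.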
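As mentioned in the question, when node04 was taken out of standby, pacemaker demoted node03 and tried to make node04 the master. It failed at this task and then brought node03 back as a standalone master. Because I suspected it was picking up some old configuration from the cib or pengine folders, I even destroyed the cluster on both nodes, removed pacemaker, pcs and corosync, and reinstalled all of them. The problem still persisted. I then suspected that the folder permissions on /var/lib/pgsql/ on node04 might be wrong, so I started poking around. That is when I found an old PGSQL.lock.bak file dated June 11, which means it was older than the current PGSQL.lock file; because of it, pacemaker tried to promote node04 but failed. Pacemaker does not report this as an error in any log, and the crm_mon output says nothing about it either. Once I deleted this file, everything worked like a charm.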
TL;DR
- Check the /var/lib/pgsql/tmp folder for any PGSQL.lock.bak or other unwanted files, and delete them before starting pacemaker again.
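The check described above can be scripted. This is a minimal sketch under stated assumptions: the helper name leftover_lock_files and the PGSQL.lock* glob pattern are mine, not something the pgsql resource agent provides, and the demo runs against a throwaway directory rather than the real /var/lib/pgsql/tmp. It only reports candidates and deletes nothing.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def leftover_lock_files(tmpdir, lock_name="PGSQL.lock"):
    """Return files whose names extend lock_name (e.g. PGSQL.lock.bak), oldest first."""
    candidates = [
        p for p in Path(tmpdir).glob(lock_name + "*")
        if p.is_file() and p.name != lock_name
    ]
    return sorted(candidates, key=lambda p: (p.stat().st_mtime, p.name))


# Demo on a temporary directory so nothing under /var/lib/pgsql is touched.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("PGSQL.lock", "PGSQL.lock.bak"):
        (Path(demo_dir) / name).write_text("")
    for path in leftover_lock_files(demo_dir):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        print(f"leftover: {path.name} (modified {modified:%Y-%m-%d %H:%M})")
```

On a real node one would call leftover_lock_files("/var/lib/pgsql/tmp") with pacemaker stopped or the node in standby, review the list, and remove the stale files by hand.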