CRM cluster cannot mount DRBD after switching
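This is my FreePBX active/passive cluster. It runs on the Proxmox hypervisor.
After the master is shut down, res_Filesystem_1 does not start on the second node. The DRBD device is not mounted, and the services that use DRBD do not start.
I get a lot of errors when I display the status: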
/usr/sbin/crm_mon -1 -r -f
Last updated: Sun Feb 4 19:50:21 2018
Last change: Sun Feb 4 19:45:35 2018
Stack: cman
Current DC: fpbx2.hrm1.group.ru - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured
8 Resources configured
Node fpbx1.hrm1.group.ru: OFFLINE (standby)
Online: [ fpbx2.hrm1.group.ru ]
Full list of resources:
Master/Slave Set: ms_drbd_1 [res_drbd_1]
Masters: [ fpbx2.hrm1.group.ru ]
Stopped: [ fpbx1.hrm1.group.ru ]
res_Filesystem_1 (ocf::heartbeat:Filesystem): Stopped
res_IPaddr2_1 (ocf::heartbeat:IPaddr2): Started fpbx2.hrm1.group.ru
res_asterisk_asterisk (service:asterisk): FAILED fpbx2.hrm1.group.ru
res_apache_apache (ocf::heartbeat:apache): Stopped
res_mysql_mysql_fpbx (ocf::heartbeat:mysql): Stopped
res_IPsrcaddr_src_addr (ocf::heartbeat:IPsrcaddr): Started fpbx2.hrm1.group.ru
Migration summary:
* Node fpbx2.hrm1.group.ru:
res_apache_apache: migration-threshold=1000000 fail-count=1000000 last-failure='Sun Feb 4 19:47:57 2018'
res_Filesystem_1: migration-threshold=1000000 fail-count=1000000 last-failure='Sun Feb 4 19:47:57 2018'
res_mysql_mysql_fpbx: migration-threshold=1000000 fail-count=1000000 last-failure='Sun Feb 4 19:48:00 2018'
res_asterisk_asterisk: migration-threshold=1000000 fail-count=18 last-failure='Sun Feb 4 19:50:21 2018'
Failed actions:
res_apache_apache_start_0 on fpbx2.hrm1.group.ru 'not configured' (6): call=41, status=complete, last-rc-change='Sun Feb 4 19:47:57 2018', queued=0ms, exec=15ms
res_Filesystem_1_start_0 on fpbx2.hrm1.group.ru 'unknown error' (1): call=34, status=complete, last-rc-change='Sun Feb 4 19:47:56 2018', queued=0ms, exec=78ms
res_mysql_mysql_fpbx_start_0 on fpbx2.hrm1.group.ru 'not installed' (5): call=45, status=complete, last-rc-change='Sun Feb 4 19:47:59 2018', queued=0ms, exec=47ms
res_asterisk_asterisk_monitor_15000 on fpbx2.hrm1.group.ru 'not running' (7): call=83, status=complete, last-rc-change='Sun Feb 4 19:50:21 2018', queued=15001ms, exec=11ms
If I try to reset all the resources:
crm_resource --resource res_filesystem_1 -P
crm_resource --resource res_mysql_mysql_fpbx -P
crm_resource --resource res_asterisk_asterisk -P
crm_resource --resource res_apache_apache -P
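all services start and work normally until the next node switch.
On shutdown, the node cannot power off and says: waiting for cluster services to unload.
I tried powering it off with the button on the VM, but the DRBD device still was not mounted on the second node.
After I switch nodes, cat /proc/drbd correctly shows Primary/Secondary.
Here is the crm configuration: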
crm configure show
node fpbx1.hrm1.group.ru \
attributes standby=off
node fpbx2.hrm1.group.ru \
attributes standby=off
primitive res_Filesystem_1 Filesystem \
params device="/dev/drbd/by-res/fpbx" directory="/mnt/drbd0" fstype=ext4 \
operations $id=res_Filesystem_1-operations \
op start interval=0 timeout=60 \
op stop interval=0 timeout=60 \
op monitor interval=20 timeout=40 start-delay=0 \
op notify interval=0 timeout=60 \
meta target-role=started
primitive res_IPaddr2_1 IPaddr2 \
params ip=10.0.15.77 \
operations $id=res_IPaddr2_1-operations \
op start interval=0 timeout=21 \
op stop interval=0 timeout=20 \
op monitor interval=10 timeout=20 start-delay=0 \
meta target-role=started
primitive res_IPsrcaddr_src_addr IPsrcaddr \
params ipaddress=10.0.15.77 \
operations $id=res_IPsrcaddr_src_addr-operations \
op start interval=0 timeout=20 \
op stop interval=0 timeout=20 \
op monitor interval=10 timeout=20 start-delay=0 \
meta
primitive res_apache_apache apache \
params port=80 \
operations $id=res_apache_apache-operations \
op start interval=0 timeout=40 \
op stop interval=0 timeout=60 \
op monitor interval=10 timeout=20 start-delay=0 \
meta target-role=Started
primitive res_asterisk_asterisk service:asterisk \
operations $id=res_asterisk_asterisk-operations \
op start interval=0 timeout=15 \
op stop interval=0 timeout=15 \
op monitor interval=15 timeout=15 start-delay=15 \
meta target-role=started
primitive res_drbd_1 ocf:linbit:drbd \
params drbd_resource=fpbx \
operations $id=res_drbd_1-operations \
op start interval=0 timeout=240 \
op promote interval=0 timeout=90 \
op demote interval=0 timeout=90 \
op stop interval=0 timeout=100 \
op monitor interval=10 timeout=20 start-delay=0 \
op notify interval=0 timeout=90 \
meta target-role=master
primitive res_mysql_mysql_fpbx mysql \
params enable_creation=false \
operations $id=res_mysql_mysql_fpbx-operations \
op start interval=0 timeout=120 \
op stop interval=0 timeout=120 \
op monitor interval=30 timeout=30 start-delay=0 \
op notify interval=0 timeout=90 \
meta target-role=started
ms ms_drbd_1 res_drbd_1 \
meta clone-max=2 notify=true interleave=true
property cib-bootstrap-options: \
stonith-enabled=false \
dc-version=1.1.11-97629de \
no-quorum-policy=ignore \
cluster-infrastructure=cman \
last-lrm-refresh=1517404556
rsc_defaults rsc-options: \
resource-stickiness=100
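What I'm really missing here (based on the logs you provided) is an order constraint and a matching colocation constraint. Together they help your cluster start the services in the correct order on the same node. For example: first res_drbd_1, then res_Filesystem_1.
It could look like this: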
order o_drbd_1_before_Filesystem_1 +inf: ms_drbd_1:promote res_Filesystem_1:start
colocation co_Filesystem_1_with_drbd_1 +inf: res_Filesystem_1 ms_drbd_1:Master
(Do the same for all the other configured services.)
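As a rough sketch of what extending this to the remaining services might look like, the fragment below ties MySQL, Asterisk and Apache to the filesystem resource. It assumes all three keep their data on /mnt/drbd0 and therefore need res_Filesystem_1 started first on the same node; the constraint IDs are made-up names, and the real dependencies between the services may differ on your system.

```
# Sketch only: assumes MySQL, Asterisk and Apache all depend on /mnt/drbd0.
# The constraint IDs are hypothetical names chosen for illustration.
order o_Filesystem_1_before_mysql inf: res_Filesystem_1:start res_mysql_mysql_fpbx:start
colocation co_mysql_with_Filesystem_1 inf: res_mysql_mysql_fpbx res_Filesystem_1
order o_Filesystem_1_before_asterisk inf: res_Filesystem_1:start res_asterisk_asterisk:start
colocation co_asterisk_with_Filesystem_1 inf: res_asterisk_asterisk res_Filesystem_1
order o_Filesystem_1_before_apache inf: res_Filesystem_1:start res_apache_apache:start
colocation co_apache_with_Filesystem_1 inf: res_apache_apache res_Filesystem_1
```

These lines would typically be entered in a `crm configure` session and checked with `verify` before `commit`. If the services also need a start order among themselves (for example MySQL before Asterisk), that would need additional order constraints not shown here.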