Ceph Monitor out of quorum
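One of our Ceph monitors has a problem. The cluster uses 3 monitors, and all of them are up and running. They can communicate with each other and each gives the expected ceph -s output. However, the quorum shows the second monitor as down. Here is the ceph -s output from the monitor that is supposedly down: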
  cluster:
    id:     bb1ab46a-d282-4530-bf5c-021e9c940958
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            noout flag(s) set
            9 large omap objects
            47 pgs not deep-scrubbed in time
            application not enabled on 2 pool(s)
            1/3 mons down, quorum mon1,mon3
  services:
    mon:        3 daemons, quorum mon1,mon3 (age 3d), out of quorum: mon2
    mgr:        mon1(active, since 3d)
    mds:        filesystem:1 {0=mon1=up:active}
    osd:        77 osds: 77 up (since 3d), 77 in (since 2w)
                flags noout
    rbd-mirror: 1 daemon active (12512649)
    rgw:        1 daemon active (mon1)
  data:
    pools:   13 pools, 1500 pgs
    objects: 65.36M objects, 23 TiB
    usage:   85 TiB used, 701 TiB / 785 TiB avail
    pgs:     1500 active+clean
  io:
    client:   806 KiB/s wr, 0 op/s rd, 52 op/s wr
systemctl status ceph-mon@2.service shows:
ceph-mon@2.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Tue 2020-12-08 12:12:58 +03; 28s ago
Process: 2681 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 2681 (code=exited, status=1/FAILURE)
Dec 08 12:12:48 mon2 systemd[1]: Unit ceph-mon@2.service entered failed state.
Dec 08 12:12:48 mon2 systemd[1]: ceph-mon@2.service failed.
Dec 08 12:12:58 mon2 systemd[1]: ceph-mon@2.service holdoff time over, scheduling restart.
Dec 08 12:12:58 mon2 systemd[1]: Stopped Ceph cluster monitor daemon.
Dec 08 12:12:58 mon2 systemd[1]: start request repeated too quickly for ceph-mon@2.service
Dec 08 12:12:58 mon2 systemd[1]: Failed to start Ceph cluster monitor daemon.
Dec 08 12:12:58 mon2 systemd[1]: Unit ceph-mon@2.service entered failed state.
Dec 08 12:12:58 mon2 systemd[1]: ceph-mon@2.service failed.
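Restarting, stopping/starting and enabling/disabling the monitor daemon don't work. The documentation says the monitor's asok file should be in /var/run/ceph; it is not in that directory on this monitor, although the asok files of the other monitors are where they should be. Now I'm in a state where I can't even stop the monitor daemon on the second monitor; it just stays in the failed state. Nothing shows up in the monitor log under /var/log/ceph. What should I do? I don't have much experience with Ceph, so I don't want to change anything unless I'm absolutely sure, so as not to mess up the cluster.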
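The status output ends with "start request repeated too quickly" and Result: start-limit, which is systemd throttling restarts of the unit rather than a state inside Ceph. A minimal sketch of the next checks one might make, assuming a systemd host with journald and the ceph-mon@<id> unit template shown above. The helper name mon_unit_diag is hypothetical, and it only prints the commands so each can be reviewed before being run by hand:

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) diagnostic commands for a
# failed ceph-mon systemd unit, so each one can be reviewed first.
# Assumes a systemd host with journald and the ceph-mon@<id> unit template.
mon_unit_diag() {
    unit="ceph-mon@$1.service"
    # journald captures the daemon's stderr; if ceph-mon exits before it
    # opens its log file under /var/log/ceph, the reason may only be here.
    echo "journalctl -u $unit -n 50 --no-pager"
    # Clears systemd's failed/start-limit state for this unit only;
    # it does not change anything in the Ceph cluster itself.
    echo "systemctl reset-failed $unit"
    # Run on a host that is in quorum: lists the monitor names the cluster
    # knows, to compare against the id the unit uses (here "$1").
    echo "ceph mon dump"
}

mon_unit_diag 2
```

Since the script only echoes, running it changes nothing; the printed lines are candidates to run one at a time.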
I tried starting the service manually on mon2:
/usr/bin/ceph-mon -f --cluster Ceph --id 2 --setuser ceph --setgroup ceph
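Both the unit name and the manual command use the monitor id 2, while ceph -s names the monitors mon1, mon2 and mon3. A minimal sketch, assuming Ceph's default path settings (config /etc/ceph/$cluster.conf, mon_data /var/lib/ceph/mon/$cluster-$id, admin socket and log file named $cluster-mon.$id under /var/run/ceph and /var/log/ceph), that prints where a monitor with a given cluster name and id would look for its files. The helper mon_paths is hypothetical. Comparing its output for id 2 and for id mon2 with what actually exists on mon2 may show whether the id is mismatched. The cluster name is also part of every path and is case-sensitive, so --cluster Ceph looks for /etc/ceph/Ceph.conf instead of the default /etc/ceph/ceph.conf:

```shell
#!/bin/sh
# Hypothetical helper: print the file locations a monitor would use under
# Ceph's default settings, for a given cluster name and monitor id.
# Assumed defaults: conf /etc/ceph/$cluster.conf, mon_data
# /var/lib/ceph/mon/$cluster-$id, admin socket and log named after
# $cluster-mon.$id under /var/run/ceph and /var/log/ceph.
mon_paths() {
    cluster="$1"
    id="$2"
    echo "conf: /etc/ceph/$cluster.conf"
    echo "data: /var/lib/ceph/mon/$cluster-$id"
    echo "asok: /var/run/ceph/$cluster-mon.$id.asok"
    echo "log: /var/log/ceph/$cluster-mon.$id.log"
}

# What ceph-mon@2 / --id 2 expects, versus a monitor named "mon2".
mon_paths ceph 2
mon_paths ceph mon2
# The cluster name is case-sensitive: --cluster Ceph changes every path.
mon_paths Ceph 2
```

On mon2, ls /var/lib/ceph/mon/ lists the monitor data directories that actually exist there, which can be checked against these paths.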