Data 100% unknown after Ceph Update
I updated my development Ceph cluster from Jewel to Luminous yesterday. Everything seemed fine until I ran the command "ceph osd require-osd-release luminous". After that, the data in my cluster is now 100% unknown. If I take a detailed look at any given PG, it shows "active+clean", yet the cluster thinks they are degraded and unclean. Here is what I am seeing:
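For reference, the outputs below were gathered with the usual status commands; a minimal sketch, run from a mon host with an admin keyring:

ceph osd tree        # the CRUSH map / OSD tree
ceph -s              # the cluster health summary
ceph health detail   # the per-PG "stuck inactive" listing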
CRUSH map
-1 10.05318 root default
-2 3.71764 host cephfs01
0 0.09044 osd.0 up 1.00000 1.00000
1 1.81360 osd.1 up 1.00000 1.00000
2 1.81360 osd.2 up 1.00000 1.00000
-3 3.62238 host cephfs02
3 hdd 1.81360 osd.3 up 1.00000 1.00000
4 hdd 0.90439 osd.4 up 1.00000 1.00000
5 hdd 0.90439 osd.5 up 1.00000 1.00000
-4 2.71317 host cephfs03
6 hdd 0.90439 osd.6 up 1.00000 1.00000
7 hdd 0.90439 osd.7 up 1.00000 1.00000
8 hdd 0.90439 osd.8 up 1.00000 1.00000
Health
cluster:
id: 279e0565-1ab4-46f2-bb27-adcb1461e618
health: HEALTH_WARN
Reduced data availability: 1024 pgs inactive
Degraded data redundancy: 1024 pgs unclean
services:
mon: 2 daemons, quorum cephfsmon02,cephfsmon01
mgr: cephfsmon02(active)
mds: ceph_library-1/1/1 up {0=cephfsmds01=up:active}
osd: 9 osds: 9 up, 9 in; 306 remapped pgs
data:
pools: 2 pools, 1024 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
1024 unknown
HEALTH_WARN
Reduced data availability: 1024 pgs inactive; Degraded data redundancy: 1024 pgs unclean
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
pg 1.e6 is stuck inactive for 2239.530584, current state unknown, last acting []
pg 1.e8 is stuck inactive for 2239.530584, current state unknown, last acting []
pg 1.e9 is stuck inactive for 2239.530584, current state unknown, last acting []
Every PG in the cluster looks like this.
PG detail
"stats": {
"version": "57'5211",
"reported_seq": "4527",
"reported_epoch": "57",
"state": "active+clean",
I can't run a scrub or a repair on the PGs or OSDs, because:
ceph osd repair osd.0
failed to instruct osd(s) 0 to repair (not connected)
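That "(not connected)" suggests the mons/mgr cannot reach the OSD daemons at all. A quick reachability spot check (a sketch; host names taken from the tree above, and 6800 is just the first port of the standard 6800-7300 OSD range):

# run from the mon/mgr host, one OSD port per host
nc -zv cephfs01 6800
nc -zv cephfs02 6800
nc -zv cephfs03 6800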
Any ideas?
The problem was the firewall. I restarted the firewall on each host, and the PGs were found immediately.
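For anyone who hits the same thing: the mons listen on TCP 6789 and Luminous OSDs use the 6800-7300 range, so those ports have to be open between all cluster hosts. A minimal sketch, assuming firewalld (RHEL/CentOS), which ships ready-made "ceph" and "ceph-mon" service definitions:

# on the mon hosts (TCP 6789)
firewall-cmd --permanent --add-service=ceph-mon
# on the osd/mgr/mds hosts (TCP 6800-7300)
firewall-cmd --permanent --add-service=ceph
firewall-cmd --reload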