Data 100% unknown after Ceph Update
I updated my development Ceph cluster from Jewel to Luminous yesterday. Everything seemed fine until I ran the command "ceph osd require-osd-release luminous". After that, the data in my cluster is now 100% unknown. If I take a detailed look at any given PG, it shows "active+clean", yet the cluster thinks they are degraded and unclean. Here is what I am seeing:
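For reference, the outputs below were gathered with the usual status commands; a minimal sketch, run from a mon host with an admin keyring:

ceph osd tree        # the CRUSH map / OSD tree
ceph -s              # the cluster health summary
ceph health detail   # the per-PG "stuck inactive" listing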
CRUSH map
-1 10.05318 root default
-2 3.71764 host cephfs01
0 0.09044 osd.0 up 1.00000 1.00000
1 1.81360 osd.1 up 1.00000 1.00000
2 1.81360 osd.2 up 1.00000 1.00000
-3 3.62238 host cephfs02
3 hdd 1.81360 osd.3 up 1.00000 1.00000
4 hdd 0.90439 osd.4 up 1.00000 1.00000
5 hdd 0.90439 osd.5 up 1.00000 1.00000
-4 2.71317 host cephfs03
6 hdd 0.90439 osd.6 up 1.00000 1.00000
7 hdd 0.90439 osd.7 up 1.00000 1.00000
8 hdd 0.90439 osd.8 up 1.00000 1.00000
Health
cluster:
id: 279e0565-1ab4-46f2-bb27-adcb1461e618
health: HEALTH_WARN
Reduced data availability: 1024 pgs inactive
Degraded data redundancy: 1024 pgs unclean
services:
mon: 2 daemons, quorum cephfsmon02,cephfsmon01
mgr: cephfsmon02(active)
mds: ceph_library-1/1/1 up {0=cephfsmds01=up:active}
osd: 9 osds: 9 up, 9 in; 306 remapped pgs
data:
pools: 2 pools, 1024 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
1024 unknown
HEALTH_WARN
Reduced data availability: 1024 pgs inactive; Degraded data redundancy: 1024 pgs unclean
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
pg 1.e6 is stuck inactive for 2239.530584, current state unknown, last acting []
pg 1.e8 is stuck inactive for 2239.530584, current state unknown, last acting []
pg 1.e9 is stuck inactive for 2239.530584, current state unknown, last acting []
Every PG in the cluster looks like this.
PG detail
"stats": {
"version": "57'5211",
"reported_seq": "4527",
"reported_epoch": "57",
"state": "active+clean",
I can't run a scrub or a repair on the PGs or OSDs, because:
ceph osd repair osd.0
failed to instruct osd(s) 0 to repair (not connected)
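That "(not connected)" suggests the mons/mgr cannot reach the OSD daemons at all. A quick reachability spot check (a sketch; host names taken from the tree above, and 6800 is just the first port of the standard 6800-7300 OSD range):

# run from the mon/mgr host, one OSD port per host
nc -zv cephfs01 6800
nc -zv cephfs02 6800
nc -zv cephfs03 6800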
Any ideas?
The problem was the firewall. I restarted the firewall on each host, and the PGs were found immediately.
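For anyone who hits the same thing: the mons listen on TCP 6789 and Luminous OSDs use the 6800-7300 range, so those ports have to be open between all cluster hosts. A minimal sketch, assuming firewalld (RHEL/CentOS), which ships ready-made "ceph" and "ceph-mon" service definitions:

# on the mon hosts (TCP 6789)
firewall-cmd --permanent --add-service=ceph-mon
# on the osd/mgr/mds hosts (TCP 6800-7300)
firewall-cmd --permanent --add-service=ceph
firewall-cmd --reload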