Ceph Luminous: what am I missing?
With the previous Jewel release I did not have any issues.
I set up a test cluster of 5 VMs, all running CentOS 7 and Ceph's Nautilus release. 1 VM is the monitor, 3 are OSDs, and 1 is the admin/mgr node.
The cluster deployed OK and health was OK, but after creating the MDS and the pools...
ceph -s
  cluster:
    id:     87c90336-38bc-4ec2-bcde-2629e1e7b12f
    health: HEALTH_WARN
            Reduced data availability: 42 pgs inactive, 43 pgs peering

  services:
    mon: 1 daemons, quorum ceph1-mon (age 8m)
    mgr: ceph1-admin(active, since 8m)
    mds: cephfs:1 {0=ceph1-osd=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 7m), 3 in (since 20h)

  data:
    pools:   2 pools, 128 pgs
    objects: 18 objects, 2.6 KiB
    usage:   2.1 GiB used, 78 GiB / 80 GiB avail
    pgs:     32.812% pgs unknown
             67.188% pgs not active
             86 peering
             42 unknown
Checking the health in detail...
ceph health detail
HEALTH_WARN Reduced data availability: 42 pgs inactive, 43 pgs peering
PG_AVAILABILITY Reduced data availability: 42 pgs inactive, 43 pgs peering
pg 9.0 is stuck peering for 254.671721, current state peering, last acting [0,1,2]
pg 9.1 is stuck peering for 254.671732, current state peering, last acting [0,2,1]
pg 9.4 is stuck peering for 254.670850, current state peering, last acting [0,1,2]
pg 9.5 is stuck inactive for 234.575775, current state unknown, last acting []
pg 9.7 is stuck inactive for 234.575775, current state unknown, last acting []
pg 9.8 is stuck inactive for 234.575775, current state unknown, last acting []
The output is really long; many PGs are inactive or peering.
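Querying one of the stuck PGs directly usually shows which OSDs it is waiting on (a sketch, using pg 9.0 from the output above):

ceph pg 9.0 query
# If the query itself hangs, the OSDs in the acting set are most likely
# unreachable, which points at the network rather than at CRUSH.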
I used this configuration:
#ceph.conf
[global]
fsid = 87c90336-38bc-4ec2-bcde-2629e1e7b12f
mon_initial_members = ceph1-mon
mon_host = 10.2.0.117
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon_allow_pool_delete = true
mon_max_pg_per_osd = 128
osd max pg per osd hard ratio = 10 # < default is 2, try to set at least 5. It will be
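Whether the running daemons actually picked these values up can be checked against the live cluster (a sketch; ceph config show reports a daemon's runtime configuration in Nautilus):

ceph config show osd.0 | grep pg_per_osd
# or via the admin socket on the OSD host itself:
ceph daemon osd.0 config get mon_max_pg_per_osd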
I created the OSDs with these commands:
ceph-deploy --overwrite-conf osd create --data /dev/vdb ceph1-osd
ceph-deploy --overwrite-conf osd create --data /dev/vdb ceph2-osd
ceph-deploy --overwrite-conf osd create --data /dev/vdb ceph3-osd
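Assuming the deploys succeeded, the resulting layout can be sanity-checked before creating any pools:

ceph osd tree
# all three OSDs should appear up/in with a non-zero weight under their hosts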
I created the MDS daemons with these commands:
ceph-deploy mds create ceph1-osd
ceph-deploy mds create ceph2-osd
ceph-deploy mds create ceph3-osd
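A quick way to confirm the MDS daemons registered (one becomes active once a file system exists, the others stay standby):

ceph mds stat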
For the pools and the file system I used these commands:
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 64
ceph fs new cephfs cephfs_metadata cephfs_data
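The arithmetic here is worth noting: 2 pools x 64 PGs = 128 PGs, and with the default replica size of 3 spread over 3 OSDs that is 128 * 3 / 3 = 128 PGs per OSD, exactly the mon_max_pg_per_osd = 128 configured above. The per-pool values can be confirmed with:

ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data pg_num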
What's wrong?
In most cases such peering/unknown PGs are related to connectivity issues. Can the monitor and the OSDs reach each other? Could a firewall or some wrong routing be causing the problem?
Besides that, the OSD and monitor logs are worth a look. Are there any errors in the logs (there surely are)?
Checking all of this should guide you to a solution.
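As a concrete starting point (a sketch assuming CentOS 7 with firewalld and the stock systemd units; the unit instance names below are taken from the hostnames in the question):

# Nautilus monitors listen on 3300 (msgr2) and 6789 (msgr1),
# OSDs on the 6800-7300 range; check what the firewall allows:
sudo firewall-cmd --list-all

# firewalld ships ready-made service definitions for Ceph:
sudo firewall-cmd --add-service=ceph-mon --permanent   # on the monitor
sudo firewall-cmd --add-service=ceph --permanent       # on OSD/MDS nodes
sudo firewall-cmd --reload

# and the daemon logs:
journalctl -u ceph-mon@ceph1-mon --since "1 hour ago"
journalctl -u ceph-osd@0 --since "1 hour ago"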