Replacing a disk while retaining the OSD id

In a Ceph cluster, how can we replace a failed disk while keeping its OSD id?
These are the steps I followed (unsuccessfully):

# 1 destroy the failed osd(s) 
for i in 38 41 44 47; do ceph osd destroy $i --yes-i-really-mean-it; done
# 2 create the new ones that take the previous osd ids
ceph orch apply osd -i replace.yaml
# Scheduled osd.ceph_osd_ssd update...
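
Before re-applying anything, the four ids should still be present in the CRUSH tree and flagged as destroyed, since only ids in that state can be claimed by a new OSD. A quick sanity check (nothing assumed beyond the stock ceph CLI):

# destroyed OSDs keep their id and CRUSH position; the STATUS column should read "destroyed"
ceph osd tree | grep destroyed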

replace.yaml:

service_type: osd
service_id: ceph_osd_ssd  # "ceph_osd_hdd" for hdd
placement:
  hosts:
    - storage01
data_devices:
  paths:
    - /dev/sdz
    - /dev/sdaa
    - /dev/sdab
    - /dev/sdac
osd_id_claims:
  storage01: ['38', '41', '44', '47']
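
One way to see what the orchestrator would actually do with this spec, before it is applied for real, is the spec preview (an assumption here: the cluster runs an Octopus build recent enough to support --dry-run):

# prints an OSDSPEC PREVIEW with the devices that would be used and,
# if osd_id_claims matches, the osd ids that would be re-used
ceph orch apply osd -i replace.yaml --dry-run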

But nothing happens: the OSD ids still show as destroyed and the devices never get an OSD id assigned.

# ceph -s
  cluster:
    id:     db2b7dd0-1e3b-11eb-be3b-40a6b721faf4
    health: HEALTH_WARN
            failed to probe daemons or devices
            5 daemons have recently crashed
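
The "failed to probe daemons or devices" warning suggests cephadm never got a clean inventory back from the host, which would also explain why the spec is silently ignored. A few places to look (a sketch using only stock orchestrator commands):

# is the spec registered, and does the orchestrator consider the devices available?
ceph orch ls osd
ceph orch device ls storage01 --refresh
# the cephadm log usually says why probing or the OSD spec run failed
ceph log last cephadm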

I also tried running this:

ceph orch daemon add osd storage01:/dev/sdaa 

which gives:

Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1177, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 141, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 318, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 103, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 92, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 713, in _daemon_add_osd
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 643, in raise_if_exception
    raise e
RuntimeError: cephadm exited with an error code: 1, stderr:INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph-authtool --gen-print-key
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new daca7735-179b-4443-acef-412bc39865e3
INFO:cephadm:/bin/podman:stderr Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-daca7735-179b-4443-acef-412bc39865e3 ceph-0a533319-def2-4fbe-82f5-e76f971b7f48
INFO:cephadm:/bin/podman:stderr  stderr: Calculated size of logical volume is 0 extents. Needs to be larger.
INFO:cephadm:/bin/podman:stderr --> Was unable to complete a new OSD, will rollback changes
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.38 --yes-i-really-mean-it
INFO:cephadm:/bin/podman:stderr  stderr: purged osd.38
INFO:cephadm:/bin/podman:stderr -->  RuntimeError: command returned non-zero exit status: 5
Traceback (most recent call last):
  File "<stdin>", line 5204, in <module>
  File "<stdin>", line 1116, in _infer_fsid
  File "<stdin>", line 1199, in _infer_image
  File "<stdin>", line 3322, in command_ceph_volume
  File "<stdin>", line 878, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=storage01 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -v /var/run/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/run/ceph:z -v /var/log/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/log/ceph:z -v /var/lib/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmp3vjwl32x:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpclrbifgb:/var/lib/ceph/bootstrap-osd/ceph.keyring:z --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15 lvm prepare --bluestore --data /dev/sdaa --no-systemd
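
The telling line is "Calculated size of logical volume is 0 extents": ceph-volume tries to create a fresh LV, but the old ceph-* volume group still covers the whole device and its previous LV was never removed, so there are no free extents left. This can be confirmed directly on storage01 (a sketch; it assumes the LVM tools are installed on the host, and the VG name is taken from the traceback above):

# is /dev/sdaa still a PV inside the old ceph VG, and how much space is free in it?
pvs -o pv_name,vg_name,vg_free /dev/sdaa
# the leftover LV that is eating all the extents
lvs ceph-0a533319-def2-4fbe-82f5-e76f971b7f48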

Zapping the device also fails:

ceph orch device zap storage01 /dev/sdaa --force
Error EINVAL: Zap failed: INFO:cephadm:/bin/podman:stderr --> Zapping: /dev/sdaa
INFO:cephadm:/bin/podman:stderr --> Zapping lvm member /dev/sdaa. lv_path is /dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae
INFO:cephadm:/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae bs=1M count=10 conv=fsync
INFO:cephadm:/bin/podman:stderr  stderr: dd: fsync failed for '/dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae': Input/output error
INFO:cephadm:/bin/podman:stderr  stderr: 10+0 records in
INFO:cephadm:/bin/podman:stderr 10+0 records out
INFO:cephadm:/bin/podman:stderr 10485760 bytes (10 MB, 10 MiB) copied, 0.00846806 s, 1.2 GB/s
INFO:cephadm:/bin/podman:stderr -->  RuntimeError: command returned non-zero exit status: 1
Traceback (most recent call last):
  File "<stdin>", line 5203, in <module>
  File "<stdin>", line 1115, in _infer_fsid
  File "<stdin>", line 1198, in _infer_image
  File "<stdin>", line 3321, in command_ceph_volume
  File "<stdin>", line 877, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=storage01 -v /var/run/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/run/ceph:z -v /var/log/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4:/var/log/ceph:z -v /var/lib/ceph/db2b7dd0-1e3b-11eb-be3b-40a6b721faf4/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm --entrypoint /usr/sbin/ceph-volume docker.io/ceph/ceph:v15 lvm zap --destroy /dev/sdaa
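
The Input/output error from dd while overwriting the old LV is consistent with either the failed disk still refusing writes or stale device-mapper entries pointing at hardware that is no longer there; kernel logs and SMART data on the host narrow this down (a sketch, assuming smartmontools is installed on storage01):

# recent kernel I/O errors for the device
dmesg | grep -i sdaa | tail
# overall health and error counters of the drive
smartctl -H -A /dev/sdaa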

What finally worked was removing the leftover LVM logical volume and volume group by hand:

lvremove  /dev/ceph-0a533319-def2-4fbe-82f5-e76f971b7f48/osd-data-9a23996c-6b99-4a46-b539-1dfe2e9358ae -y
vgremove  ceph-0a533319-def2-4fbe-82f5-e76f971b7f48
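
To clean up all four slots in one pass, a loop along these lines should work (a sketch; it assumes each device carries at most one leftover ceph-* VG and that the corresponding OSD daemons are already stopped):

for dev in /dev/sdz /dev/sdaa /dev/sdab /dev/sdac; do
  # find the leftover ceph VG sitting on this device, if any
  vg=$(pvs --noheadings -o vg_name "$dev" 2>/dev/null | tr -d ' ')
  # vgremove -y also removes the LVs contained in the VG
  [ -n "$vg" ] && vgremove -y "$vg"
done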

Do this for every leftover volume group, then rerun the zaps:

for i in '/dev/sdz' '/dev/sdaa' '/dev/sdab' '/dev/sdac'; do ceph orch device zap storage01 $i --force; done

And finally:

ceph orch apply osd -i replace.yaml
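
Afterwards it is worth confirming that the replacement OSDs really came back under the old ids (a sketch, only standard status commands):

# osd.38, osd.41, osd.44 and osd.47 should be back up and in
ceph osd tree
# the daemons should be listed as running on storage01
ceph orch ps storage01
ceph -s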