How to run Rook-Ceph on one node?

I have a single-node development Kubernetes cluster running on bare metal (Ubuntu 18.04), and I need to test my application with rook-ceph.

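I followed the rook-ceph instructions (https://rook.io/docs/rook/v1.3/ceph-quickstart.html) to install it on the K8s cluster, as shown below. The only thing I changed was the last step: instead of cluster.yaml I installed cluster-test.yaml, because the documentation says "cluster-test.yaml: Cluster settings for a test environment such as minikube".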

git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster-test.yaml

After the installation, none of the OSD pods start:

kubectl -n rook-ceph get pods
NAME                                            READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-pgpp5                          3/3     Running     0          3d
csi-cephfsplugin-provisioner-75f4cb8c76-9xw4m   5/5     Running     2          3d
csi-cephfsplugin-provisioner-75f4cb8c76-lk4h6   5/5     Running     1          3d
csi-rbdplugin-provisioner-6cfb8565c4-5dstt      6/6     Running     2          3d
csi-rbdplugin-provisioner-6cfb8565c4-k7t5r      6/6     Running     0          3d
csi-rbdplugin-rp2rm                             3/3     Running     0          3d
rook-ceph-mgr-a-5b78844689-k66pv                1/1     Running     0          3d
rook-ceph-mon-a-69f75569d9-prtlv                1/1     Running     0          3d
rook-ceph-operator-5698b8bd78-nrgvt             1/1     Running     1          3d
rook-ceph-osd-prepare-odin-wjvps                0/1     Completed   0          36m
rook-discover-n7x5s                             1/1     Running     0          3d

The reason given is that the volume "rook-binaries" cannot be found:

kubectl -n rook-ceph describe pod rook-ceph-osd-prepare-odin-wjvps
...
...
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       <unknown>              default-scheduler  Successfully assigned rook-ceph/rook-ceph-osd-prepare-odin-wjvps to odin
  Normal   Created         7m26s                  kubelet, odin      Created container copy-bins
  Normal   Started         7m26s                  kubelet, odin      Started container copy-bins
  Normal   Pulled          7m25s                  kubelet, odin      Container image "ceph/ceph:v15" already present on machine
  Normal   Created         7m25s                  kubelet, odin      Created container provision
  Normal   Started         7m25s                  kubelet, odin      Started container provision
  Normal   SandboxChanged  7m22s                  kubelet, odin      Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          7m21s (x2 over 7m26s)  kubelet, odin      Container image "rook/ceph:v1.3.8" already present on machine
  Warning  Failed          7m21s                  kubelet, odin      Error: cannot find volume "rook-binaries" to mount into container "copy-bins"

For some reason, the OSD prepare job skipped my NVMe drive:

kubectl -n rook-ceph logs rook-ceph-osd-prepare-odin-wjvps -f
2020-07-27 15:20:40.463045 I | rookcmd: starting Rook v1.3.8 with arguments '/rook/rook ceph osd provision'
2020-07-27 15:20:40.463108 I | rookcmd: flag values: --cluster-id=9ebb0292-5238-4234-bbee-565c4de14571, --data-device-filter=all, --data-device-path-filter=, --data-devices=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=DEBUG, --metadata-device=, --node-name=odin, --operator-image=, --osd-database-size=0, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2020-07-27 15:20:40.463114 I | op-mon: parsing mon endpoints: a=10.5.98.76:6789
2020-07-27 15:20:40.470018 I | op-osd: CRUSH location=root=default host=odin
2020-07-27 15:20:40.470033 I | cephcmd: crush location of osd: root=default host=odin
2020-07-27 15:20:40.474201 I | cephconfig: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-07-27 15:20:40.474320 I | cephconfig: generated admin config in /var/lib/rook/rook-ceph
2020-07-27 15:20:40.474440 D | cephosd: config file @ /etc/ceph/ceph.conf: [global]
fsid                = 4670f13d-e80a-4c20-a161-0079c69953db
mon initial members = a
mon host            = [v2:10.5.98.76:3300,v1:10.5.98.76:6789]
public addr         = 10.4.91.50
cluster addr        = 10.4.91.50

[client.admin]
keyring = /var/lib/rook/rook-ceph/client.admin.keyring

2020-07-27 15:20:40.474450 I | cephosd: discovering hardware
2020-07-27 15:20:40.474458 D | exec: Running command: lsblk --all --noheadings --list --output KNAME
2020-07-27 15:20:40.478069 D | exec: Running command: lsblk /dev/loop0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.479838 W | inventory: skipping device "loop0" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.479857 D | exec: Running command: lsblk /dev/loop1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.481477 W | inventory: skipping device "loop1" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.481500 D | exec: Running command: lsblk /dev/loop2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.483081 W | inventory: skipping device "loop2" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.483100 D | exec: Running command: lsblk /dev/loop3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.484695 W | inventory: skipping device "loop3" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.484712 D | exec: Running command: lsblk /dev/loop4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.486387 W | inventory: skipping device "loop4" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.486416 D | exec: Running command: lsblk /dev/loop5 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.488108 W | inventory: skipping device "loop5" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.488130 D | exec: Running command: lsblk /dev/loop6 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.490429 W | inventory: skipping device "loop6" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.490460 D | exec: Running command: lsblk /dev/loop7 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.492027 W | inventory: skipping device "loop7" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.492040 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.493631 W | inventory: skipping device "sda" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.493645 D | exec: Running command: lsblk /dev/nbd0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.495394 W | inventory: skipping device "nbd0" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.495411 D | exec: Running command: lsblk /dev/nbd1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.497052 W | inventory: skipping device "nbd1" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.497071 D | exec: Running command: lsblk /dev/nbd2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.498722 W | inventory: skipping device "nbd2" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.498741 D | exec: Running command: lsblk /dev/nbd3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.500671 W | inventory: skipping device "nbd3" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.500735 D | exec: Running command: lsblk /dev/nbd4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.502638 W | inventory: skipping device "nbd4" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.502669 D | exec: Running command: lsblk /dev/nbd5 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.504580 W | inventory: skipping device "nbd5" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.504598 D | exec: Running command: lsblk /dev/nbd6 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.506366 W | inventory: skipping device "nbd6" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.506384 D | exec: Running command: lsblk /dev/nbd7 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.507990 W | inventory: skipping device "nbd7" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.508008 D | exec: Running command: lsblk /dev/nvme0n1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.510929 D | exec: Running command: sgdisk --print /dev/nvme0n1
2020-07-27 15:20:40.513218 D | exec: Running command: udevadm info --query=property /dev/nvme0n1
2020-07-27 15:20:40.520321 D | exec: Running command: lsblk --noheadings --pairs /dev/nvme0n1
2020-07-27 15:20:40.523182 I | inventory: skipping device "nvme0n1" because it has child, considering the child instead.
2020-07-27 15:20:40.523220 D | exec: Running command: lsblk /dev/nvme0n1p1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.526389 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p1
2020-07-27 15:20:40.534070 D | exec: Running command: lsblk /dev/nvme0n1p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.536121 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p2
2020-07-27 15:20:40.541434 D | exec: Running command: lsblk /dev/nvme0n1p3 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.543247 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p3
2020-07-27 15:20:40.547962 D | exec: Running command: lsblk /dev/nvme0n1p4 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.549708 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p4
2020-07-27 15:20:40.554266 D | exec: Running command: lsblk /dev/nbd8 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.555923 W | inventory: skipping device "nbd8" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.555937 D | exec: Running command: lsblk /dev/nbd9 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.557480 W | inventory: skipping device "nbd9" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.557493 D | exec: Running command: lsblk /dev/nbd10 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.559103 W | inventory: skipping device "nbd10" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.559118 D | exec: Running command: lsblk /dev/nbd11 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.560668 W | inventory: skipping device "nbd11" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.560682 D | exec: Running command: lsblk /dev/nbd12 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.562237 W | inventory: skipping device "nbd12" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.562251 D | exec: Running command: lsblk /dev/nbd13 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.563800 W | inventory: skipping device "nbd13" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.563814 D | exec: Running command: lsblk /dev/nbd14 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.565334 W | inventory: skipping device "nbd14" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.565349 D | exec: Running command: lsblk /dev/nbd15 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.566990 W | inventory: skipping device "nbd15" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.567005 D | inventory: discovered disks are [0xc0001eaa20 0xc0001ebb00 0xc0001e9d40 0xc00016e360]
2020-07-27 15:20:40.567009 I | cephosd: creating and starting the osds
2020-07-27 15:20:40.567029 D | cephosd: desiredDevices are [{Name:all OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:true IsDevicePathFilter:false}]
2020-07-27 15:20:40.567033 D | cephosd: context.Devices are [0xc0001eaa20 0xc0001ebb00 0xc0001e9d40 0xc00016e360]
2020-07-27 15:20:40.567037 I | cephosd: skipping device "nvme0n1p1" because it contains a filesystem "vfat"
2020-07-27 15:20:40.567042 D | exec: Running command: udevadm info --query=property /dev/nvme0n1p2
2020-07-27 15:20:40.571765 D | exec: Running command: lsblk /dev/nvme0n1p2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME
2020-07-27 15:20:40.573542 D | exec: Running command: ceph-volume inventory --format json /dev/nvme0n1p2
2020-07-27 15:20:40.997597 I | cephosd: skipping device "nvme0n1p2": ["Insufficient space (<5GB)"].
2020-07-27 15:20:40.997615 I | cephosd: skipping device "nvme0n1p3" because it contains a filesystem "ntfs"
2020-07-27 15:20:40.997620 I | cephosd: skipping device "nvme0n1p4" because it contains a filesystem "ext4"
2020-07-27 15:20:41.001473 I | cephosd: configuring osd devices: {"Entries":{}}
2020-07-27 15:20:41.001492 I | cephosd: no new devices to configure. returning devices already configured with ceph-volume.
2020-07-27 15:20:41.001502 D | exec: Running command: ceph-volume lvm list  --format json
2020-07-27 15:20:41.324600 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2020-07-27 15:20:41.324618 W | cephosd: skipping OSD configuration as no devices matched the storage settings for this node "odin"
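
The skip reasons are easy to miss in a DEBUG log of this length. A minimal Python sketch for pulling them out is shown here, assuming the log has been saved as text (for example from `kubectl logs`). The function name `skipped_devices` and the regular expression are illustrative only and are not part of Rook; the sample lines are copied from the log above.

```python
import re

# Matches both forms seen in the log:
#   skipping device "X" because <reason>
#   skipping device "X": <reason>
SKIP_RE = re.compile(r'skipping device "(?P<dev>[^"]+)"(?: because|:)\s*(?P<reason>.+)')


def skipped_devices(log_text, component="cephosd"):
    """Return {device: reason} for 'skipping device' lines logged by one component."""
    result = {}
    for line in log_text.splitlines():
        # Keep only lines from the requested logger, e.g. "I | cephosd: ..."
        if f"| {component}:" not in line:
            continue
        match = SKIP_RE.search(line)
        if match:
            result[match.group("dev")] = match.group("reason").strip()
    return result


SAMPLE = """\
2020-07-27 15:20:40.479838 W | inventory: skipping device "loop0" because 'lsblk' failed. diskType is empty
2020-07-27 15:20:40.567037 I | cephosd: skipping device "nvme0n1p1" because it contains a filesystem "vfat"
2020-07-27 15:20:40.997597 I | cephosd: skipping device "nvme0n1p2": ["Insufficient space (<5GB)"].
2020-07-27 15:20:40.997615 I | cephosd: skipping device "nvme0n1p3" because it contains a filesystem "ntfs"
2020-07-27 15:20:40.997620 I | cephosd: skipping device "nvme0n1p4" because it contains a filesystem "ext4"
"""

if __name__ == "__main__":
    for dev, reason in skipped_devices(SAMPLE).items():
        print(f"{dev}: {reason}")
```

Filtering on the `cephosd` component hides the loop and nbd noise from the `inventory` logger, leaving only the four NVMe partitions, each of which was rejected for having a filesystem or being too small.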

And here are my disks:

sudo df -h -T
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  7,8G     0  7,8G   0% /dev
tmpfs          tmpfs     1,6G  3,9M  1,6G   1% /run
/dev/nvme0n1p4 ext4      284G  130G  140G  48% /
tmpfs          tmpfs     7,8G  539M  7,3G   7% /dev/shm
tmpfs          tmpfs     5,0M  4,0K  5,0M   1% /run/lock
tmpfs          tmpfs     7,8G     0  7,8G   0% /sys/fs/cgroup
/dev/nvme0n1p1 vfat      508M   31M  478M   6% /boot/efi

How can I fix this? I only need to run it temporarily for development (for various reasons I cannot install my application on minikube/microk8s).

The problem was that Ceph needs an unformatted partition or disk. After I deleted the NTFS partition, it started working.
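
To check ahead of time whether a node has anything Ceph could use, one option is to inspect the output of `lsblk --json --bytes --output NAME,TYPE,FSTYPE,SIZE`. The Python sketch below is only an approximation of the checks visible in the log (no filesystem, no child partitions, at least 5 GB); it is not Rook's actual logic. The function `usable_devices`, the 5 GiB threshold, and the sample layout with made-up sizes are all assumptions for illustration.

```python
import json

# Assumed threshold, modeled on the "Insufficient space (<5GB)" log message.
MIN_BYTES = 5 * 1024**3


def usable_devices(lsblk_json, min_bytes=MIN_BYTES):
    """Return names of disks/partitions with no filesystem, no children, enough space."""
    found = []

    def walk(dev):
        children = dev.get("children") or []
        if children:
            # A device with partitions is skipped; its children are considered instead.
            for child in children:
                walk(child)
            return
        if dev.get("type") not in ("disk", "part"):
            return  # ignore loop devices and similar
        if dev.get("fstype"):
            return  # already formatted
        if int(dev["size"]) < min_bytes:
            return  # too small
        found.append(dev["name"])

    for dev in json.loads(lsblk_json)["blockdevices"]:
        walk(dev)
    return found


def sample(p3_fstype):
    """Hypothetical lsblk output; partition sizes are made up for illustration."""
    return json.dumps({"blockdevices": [
        {"name": "loop0", "type": "loop", "fstype": "squashfs", "size": "1000000"},
        {"name": "nvme0n1", "type": "disk", "fstype": None, "size": "512000000000",
         "children": [
             {"name": "nvme0n1p1", "type": "part", "fstype": "vfat", "size": "536870912"},
             {"name": "nvme0n1p2", "type": "part", "fstype": None, "size": "16777216"},
             {"name": "nvme0n1p3", "type": "part", "fstype": p3_fstype, "size": "200000000000"},
             {"name": "nvme0n1p4", "type": "part", "fstype": "ext4", "size": "310000000000"},
         ]},
    ]})


if __name__ == "__main__":
    print(usable_devices(sample("ntfs")))  # layout resembling the question
    print(usable_devices(sample(None)))    # same layout with an unformatted third partition
```

On a real node, the string passed to `usable_devices` would come from running the `lsblk` command above. An empty result suggests the OSD prepare job will likely find nothing to configure.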

In my case, I found that all you need to do is zero out the disk, which erases everything on it:

dd if=/dev/zero of=/dev/sda bs=1M status=progress

After that, rook-ceph came up without problems using cluster.yaml.