Why can't I find the OSD pods in Kubernetes after deploying rook-ceph?

I tried to install rook-ceph on Kubernetes by following this guide:

https://rook.io/docs/rook/v1.3/ceph-quickstart.html

git clone --single-branch --branch release-1.3 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster.yaml

When I check all the pods:

$ kubectl -n rook-ceph get pod
NAME                                            READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-9c2z9                          3/3     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-s67hq   5/5     Running   0          23m
csi-cephfsplugin-provisioner-7678bcfc46-sfljd   5/5     Running   0          23m
csi-cephfsplugin-smmlf                          3/3     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq       6/6     Running   0          23m
csi-rbdplugin-provisioner-fbd45b7c8-rp85z       6/6     Running   0          23m
csi-rbdplugin-s67lw                             3/3     Running   0          23m
csi-rbdplugin-zq4k5                             3/3     Running   0          23m
rook-ceph-mon-a-canary-954dc5cd9-5q8tk          1/1     Running   0          2m9s
rook-ceph-mon-b-canary-b9d6f5594-mcqwc          1/1     Running   0          2m9s
rook-ceph-mon-c-canary-78b48dbfb7-z2t7d         0/1     Pending   0          2m8s
rook-ceph-operator-757d6db48d-x27lm             1/1     Running   0          25m
rook-ceph-tools-75f575489-znbbz                 1/1     Running   0          7m45s
rook-discover-gq489                             1/1     Running   0          24m
rook-discover-p9zlg                             1/1     Running   0          24m
$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
No resources found in rook-ceph namespace.

I then tried a few other things:

$ kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
$ kubectl -n rook-ceph-system delete pods rook-ceph-operator-757d6db48d-x27lm

Create the filesystem:

$ kubectl create -f filesystem.yaml

Check again:

$ kubectl get pods -n rook-ceph -o wide
NAME                                              READY   STATUS     RESTARTS   AGE    IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-9c2z9                            3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-s67hq     5/5     Running    0          135m   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-sfljd     5/5     Running    0          135m   10.1.2.5       kube3    <none>           <none>
csi-cephfsplugin-smmlf                            3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-dnwsq         6/6     Running    0          135m   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-rp85z         6/6     Running    0          135m   10.1.1.5       kube2    <none>           <none>
csi-rbdplugin-s67lw                               3/3     Running    0          135m   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-zq4k5                               3/3     Running    0          135m   192.168.0.53   kube3    <none>           <none>
rook-ceph-crashcollector-kube2-6d95bb9c-r5w7p     0/1     Init:0/2   0          110m   <none>         kube2    <none>           <none>
rook-ceph-crashcollector-kube3-644c849bdb-9hcvg   0/1     Init:0/2   0          110m   <none>         kube3    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-6ccbh            1/1     Running    0          75s    10.1.2.130     kube3    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-k85w5            1/1     Running    0          74s    10.1.1.74      kube2    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-kfzzx           0/1     Pending    0          73s    <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-nlh84               1/1     Running    0          110m   10.1.2.28      kube3    <none>           <none>
rook-ceph-tools-75f575489-znbbz                   1/1     Running    0          119m   10.1.1.14      kube2    <none>           <none>
rook-discover-gq489                               1/1     Running    0          135m   10.1.1.3       kube2    <none>           <none>
rook-discover-p9zlg                               1/1     Running    0          135m   10.1.2.4       kube3    <none>           <none>

I can't see any pod named rook-ceph-osd-.

Also, the rook-ceph-mon-c-canary-78b48dbfb7-kfzzx pod is always Pending.

If I install the toolbox as described here:

https://rook.io/docs/rook/v1.3/ceph-toolbox.html

$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

Inside the container, I check the Ceph status:

[root@rook-ceph-tools-75f575489-znbbz /]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster

I am running Ubuntu 16.04.6.


Deploying again:

$ kubectl -n rook-ceph get pod -o wide
NAME                                            READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
csi-cephfsplugin-4tww8                          3/3     Running   0          3m38s   192.168.0.52   kube2    <none>           <none>
csi-cephfsplugin-dbbfb                          3/3     Running   0          3m38s   192.168.0.53   kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-8kt96   5/5     Running   0          3m37s   10.1.2.6       kube3    <none>           <none>
csi-cephfsplugin-provisioner-7678bcfc46-kq6vv   5/5     Running   0          3m38s   10.1.1.6       kube2    <none>           <none>
csi-rbdplugin-4qrqn                             3/3     Running   0          3m39s   192.168.0.53   kube3    <none>           <none>
csi-rbdplugin-dqx9z                             3/3     Running   0          3m39s   192.168.0.52   kube2    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-7f57t       6/6     Running   0          3m39s   10.1.2.5       kube3    <none>           <none>
csi-rbdplugin-provisioner-fbd45b7c8-9zwhb       6/6     Running   0          3m39s   10.1.1.5       kube2    <none>           <none>
rook-ceph-mon-a-canary-954dc5cd9-rgqpg          1/1     Running   0          2m40s   10.1.1.7       kube2    <none>           <none>
rook-ceph-mon-b-canary-b9d6f5594-n2pwc          1/1     Running   0          2m35s   10.1.2.8       kube3    <none>           <none>
rook-ceph-mon-c-canary-78b48dbfb7-fv46f         0/1     Pending   0          2m30s   <none>         <none>   <none>           <none>
rook-ceph-operator-757d6db48d-2m25g             1/1     Running   0          6m27s   10.1.2.3       kube3    <none>           <none>
rook-discover-lpsht                             1/1     Running   0          5m15s   10.1.1.3       kube2    <none>           <none>
rook-discover-v4l77                             1/1     Running   0          5m15s   10.1.2.4       kube3    <none>           <none>

Describe the pending pod:

$ kubectl describe pod rook-ceph-mon-c-canary-78b48dbfb7-fv46f -n rook-ceph
Name:           rook-ceph-mon-c-canary-78b48dbfb7-fv46f
Namespace:      rook-ceph
Priority:       0
Node:           <none>
Labels:         app=rook-ceph-mon
                ceph_daemon_id=c
                mon=c
                mon_canary=true
                mon_cluster=rook-ceph
                pod-template-hash=78b48dbfb7
                rook_cluster=rook-ceph
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/rook-ceph-mon-c-canary-78b48dbfb7
Containers:
  mon:
    Image:      rook/ceph:v1.3.4
    Port:       6789/TCP
    Host Port:  0/TCP
    Command:
      /tini
    Args:
      --
      sleep
      3600
    Environment:
      CONTAINER_IMAGE:                ceph/ceph:v14.2.9
      POD_NAME:                       rook-ceph-mon-c-canary-78b48dbfb7-fv46f (v1:metadata.name)
      POD_NAMESPACE:                  rook-ceph (v1:metadata.namespace)
      NODE_NAME:                       (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               node allocatable (limits.memory)
      POD_MEMORY_REQUEST:             0 (requests.memory)
      POD_CPU_LIMIT:                  node allocatable (limits.cpu)
      POD_CPU_REQUEST:                0 (requests.cpu)
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>             Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
      ROOK_POD_IP:                     (v1:status.podIP)
    Mounts:
      /etc/ceph from rook-config-override (ro)
      /etc/ceph/keyring-store/ from rook-ceph-mons-keyring (ro)
      /var/lib/ceph/crash from rook-ceph-crash (rw)
      /var/lib/ceph/mon/ceph-c from ceph-daemon-data (rw)
      /var/log/ceph from rook-ceph-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-65xtn (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  rook-config-override:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-config-override
    Optional:  false
  rook-ceph-mons-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-mons-keyring
    Optional:    false
  rook-ceph-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/log
    HostPathType:  
  rook-ceph-crash:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/rook-ceph/crash
    HostPathType:  
  ceph-daemon-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/rook/mon-c/data
    HostPathType:  
  default-token-65xtn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-65xtn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  22s (x3 over 84s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.

Test mount

Create an nginx.yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
    volumeMounts:
    - name: www
      mountPath: /usr/share/nginx/html
  volumes:
  - name: www
    flexVolume:
      driver: ceph.rook.io/rook
      fsType: ceph
      options:
        fsName: myfs
        clusterNamespace: rook-ceph

Deploy it and describe the pod details:

...
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    9m28s                  default-scheduler  Successfully assigned default/nginx to kube2
  Warning  FailedMount  9m28s                  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www default-token-fnb28], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  6m14s (x2 over 6m38s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[default-token-fnb28 www]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched
  Warning  FailedMount  4m6s (x23 over 9m13s)  kubelet, kube2     Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-fnb28]: failed to get Plugin from volumeSpec for volume "www" err=no volume plugin matched

The rook-ceph-mon-x pods have the following affinity:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: rook-ceph-mon
        topologyKey: kubernetes.io/hostname

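This does not allow two rook-ceph-mon pods to run on the same node. You seem to have 3 nodes: 1 master and 2 workers. So 2 pods get created, one on kube2 and one on kube3. kube1 is the master node and is tainted as unschedulable, so rook-ceph-mon-c cannot be scheduled there.

To solve it you can: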
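To confirm this on your own cluster, something like the following may help. It is only a sketch: the function names (mons_fit, show_node_taints, show_mon_placement) are made up for illustration, and the numbers passed at the end (2 schedulable workers, 3 mons) are read off the pod listings in the question, not queried from the cluster. While the third mon stays unschedulable, the operator most likely never gets as far as creating the OSD pods, which would also explain why none show up.

```shell
#!/usr/bin/env bash

# With required pod anti-affinity on kubernetes.io/hostname, each mon needs
# its own schedulable node, so the mons fit only if nodes >= mons.
mons_fit() {
  local schedulable_nodes=$1 mon_count=$2
  [ "$schedulable_nodes" -ge "$mon_count" ]
}

# Print each node together with the keys of any taints set on it.
show_node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Print the node each mon pod was placed on (Pending pods show <none>).
show_mon_placement() {
  kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
}

# The situation in the question: 2 untainted workers, 3 mons requested.
if mons_fit 2 3; then
  echo "all mons can be scheduled"
else
  echo "not enough schedulable nodes for the requested mons"
fi
```

Running show_node_taints and show_mon_placement against the cluster should show the master carrying the node-role.kubernetes.io/master taint and the two running mons sitting on kube2 and kube3.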

  • Add one more worker node
  • Remove the NoSchedule taint from the master with kubectl taint nodes kube1 node-role.kubernetes.io/master:NoSchedule-
  • Change the mon count to a lower value
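The two options that need no extra hardware could be scripted roughly like this. Treat it as a sketch under assumptions: the helper names are invented for this answer, the taint key is the one reported in the FailedScheduling event above, and set_mon_count assumes the CephCluster resource is named rook-ceph in the rook-ceph namespace and that the mon count lives under spec.mon.count. Check your cluster.yaml before relying on it.

```shell
#!/usr/bin/env bash

# Build the argument for removing a taint: the trailing "-" means "remove".
untaint_arg() {
  printf '%s:%s-' "$1" "$2"
}

# Build a JSON merge patch that sets spec.mon.count on the CephCluster.
mon_count_patch() {
  printf '{"spec":{"mon":{"count":%d}}}' "$1"
}

# Option: allow pods to be scheduled on the master node given as $1.
allow_scheduling_on_master() {
  kubectl taint nodes "$1" "$(untaint_arg node-role.kubernetes.io/master NoSchedule)"
}

# Option: lower the number of mons to $1.
# Assumption: the CephCluster is called rook-ceph in namespace rook-ceph.
set_mon_count() {
  kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p "$(mon_count_patch "$1")"
}
```

For the cluster in the question you would call either allow_scheduling_on_master kube1 or set_mon_count 1. Editing the mon count in cluster.yaml and re-applying it should have the same effect as the patch.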