Zookeeper pod can't access mounted persistent volume claim

I'm running into an annoying issue where my pod cannot access its mounted persistent volume.

Kubeadm: v1.19.2
Docker: 19.03.13
Zookeeper image: library/zookeeper:3.6
Cluster info: locally hosted, no cloud provider

K8s configuration:

apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  selector:
    app: zk
  ports:
    - port: 2888
      targetPort: 2888
      name: server
      protocol: TCP
    - port: 3888
      targetPort: 3888
      name: leader-election
      protocol: TCP
  clusterIP: ""
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  selector:
    app: zk
  ports:
    - name: client
      protocol: TCP
      port: 2181
      targetPort: 2181
  type: LoadBalancer
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      volumes:
        - name: zoo-config
          configMap:
            name: zoo-config
        - name: datadir
          persistentVolumeClaim:
            claimName: zoo-pvc
      containers:
        - name: zookeeper
          imagePullPolicy: Always
          image: "library/zookeeper:3.6"
          resources:
            requests:
              memory: "1Gi"
              cpu: "0.5"
          ports:
            - containerPort: 2181
              name: client
            - containerPort: 2888
              name: server
            - containerPort: 3888
              name: leader-election
          volumeMounts:
            - name: datadir
              mountPath: /var/lib/zookeeper/data
            - name: zoo-config
              mountPath: /conf
      securityContext:
        fsGroup: 2000
        runAsUser: 1000
        runAsNonRoot: true
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.beta.kubernetes.io/storage-class: local-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: zoo-config
  namespace: default
data:
  zoo.cfg: |
    tickTime=10000
    dataDir=/var/lib/zookeeper/data
    clientPort=2181
    initLimit=10
    syncLimit=4
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: zoo-pv
  labels:
    type: local
spec:
  storageClassName: local-storage
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data"
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node-name>

I have tried running the pod as root with the following security context, which I know is a terrible idea and did purely for testing. However, this caused a slew of other issues.

securityContext:
  fsGroup: 0
  runAsUser: 0

Once the pod starts, the logs contain the following:

Zookeeper JMX enabled by default
Using config: /conf/zoo.cfg
<log4j Warnings>
Unable to access datadir, exiting abnormally

Describing the pod gives me the following:

~$ kubectl describe pod/zk-0
Name:         zk-0
Namespace:    default
Priority:     0
Node:         <node>
Start Time:   Sat, 26 Sep 2020 15:48:00 +0200
Labels:       app=zk
              controller-revision-hash=zk-6c68989bd
              statefulset.kubernetes.io/pod-name=zk-0
Annotations:  <none>
Status:       Running
IP:           <IP>
IPs:
  IP:           <IP>
Controlled By:  StatefulSet/zk
Containers:
  zookeeper:
    Container ID:   docker://281e177d677394604785542c231d21b71f1666a22e74c1c10ef88491dad7a522
    Image:          library/zookeeper:3.6
    Image ID:       docker-pullable://zookeeper@sha256:6c051390cfae7958ff427834937c353fc6c34484f6a84b3e4bc8c512b53a16f6
    Ports:          2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    3
      Started:      Sat, 26 Sep 2020 16:04:26 +0200
      Finished:     Sat, 26 Sep 2020 16:04:27 +0200
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:        500m
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /conf from zoo-config (rw)
      /var/lib/zookeeper/data from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-88x56 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-zk-0
    ReadOnly:   false
  zoo-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zoo-config
    Optional:  false
  default-token-88x56:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-88x56
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  17m                   default-scheduler  Successfully assigned default/zk-0 to <node>
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.932381527s
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.960610662s
  Normal   Pulled     17m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.959935633s
  Normal   Created    16m (x4 over 17m)     kubelet            Created container zookeeper
  Normal   Pulled     16m                   kubelet            Successfully pulled image "library/zookeeper:3.6" in 1.92551645s
  Normal   Started    16m (x4 over 17m)     kubelet            Started container zookeeper
  Normal   Pulling    15m (x5 over 17m)     kubelet            Pulling image "library/zookeeper:3.6"
  Warning  BackOff    2m35s (x71 over 17m)  kubelet            Back-off restarting failed container

To me, it looks like the pod has full rw access to the volume, so I'm not sure why it is still being denied access to the directory. Any help would be appreciated!

After some digging, I finally figured out why it wasn't working. The logs had actually told me everything I needed to know: the pod simply did not have the file permissions required to read from and write to the hostPath directory /mnt/data backing the mounted persistentVolumeClaim. The container runs as UID 1000, and Kubernetes does not apply fsGroup-based ownership management to hostPath volumes, so the fsGroup: 2000 setting had no effect on the directory.
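A quick way to confirm this is to compare the ownership of the directory on the node with the user the container runs as (runAsUser: 1000 in the StatefulSet above). The output below is illustrative of what I found; yours may differ:

# On the node that hosts the volume, show owner and group numerically:
ls -ldn /mnt/data
# drwxr-xr-x 2 0 0 4096 Sep 26 15:40 /mnt/data
#              ^ uid 0 / gid 0 (root), mode 755: no write access for UID 1000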

To get around this, in a somewhat hacky way, I granted read/write/execute permissions to everyone:

chmod 777 /mnt/data

An overview can be found here.

This is definitely not the most secure way of fixing the issue, and I would strongly advise against using it in any production environment.

A potentially better approach would be something like the following, adding the user the container runs as (UID 1000) to the group that owns the data directory:

sudo usermod -a -G 1000 1000
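Alternatively, since fsGroup does not fix hostPath permissions, the ownership can be corrected from inside the cluster with an init container that chowns the data directory before ZooKeeper starts. This is only a sketch, not the configuration from my cluster: the busybox image and the 1000:2000 ownership (chosen to match runAsUser: 1000 and fsGroup: 2000 in the StatefulSet above) are assumptions you would adapt. It slots into the StatefulSet's pod spec alongside containers:

      initContainers:
        - name: fix-datadir-permissions
          image: busybox:1.32   # assumed helper image, any image with chown works
          # Hand the directory to the UID/GID the zookeeper container runs with
          command: ["sh", "-c", "chown -R 1000:2000 /var/lib/zookeeper/data"]
          securityContext:
            # Override the pod-level runAsNonRoot: true just for this one-shot step
            runAsUser: 0
            runAsNonRoot: false
          volumeMounts:
            - name: datadir
              mountPath: /var/lib/zookeeper/data

With the ownership corrected once at startup, the restrictive pod-level securityContext can stay in place and chmod 777 on the host is no longer needed.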