GKE | Statefulset gets deleted along with a service when deployed in the kube-system namespace

I have a GKE cluster with 5 nodes in the same zone. I am trying to deploy a 3-node Elasticsearch StatefulSet in the kube-system namespace, but every time I do, the StatefulSet gets deleted and the pods enter the Terminating state right after the second pod is created.

I tried checking the pod logs and `kubectl describe pod` for any clues, but found nothing useful.

I even checked the GKE cluster logs, where I found the deletion request log entry, but with no additional information about who initiated it or why it happened.
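For anyone retracing a deletion like this, the GKE audit trail can be queried in Cloud Logging. A filter along these lines can surface the audit entry that recorded the delete; the exact method and resource name values shown here are assumptions and may differ by GKE version:

```
resource.type="k8s_cluster"
protoPayload.methodName:"delete"
protoPayload.resourceName:"statefulsets/elasticsearch-logging"
```

The matching entry's `protoPayload.authenticationInfo.principalEmail` field identifies who (or which controller) issued the delete.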

When I change the namespace to default, everything works fine and the pods reach the Ready state.

Below is the manifest file I am using for this deployment.

# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "services"
  - "namespaces"
  - "endpoints"
  verbs:
  - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: elasticsearch-logging
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  apiGroup: "rbac.authorization.k8s.io"
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: 7.16.2
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 2
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: 7.16.2
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: 7.16.2
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
      - image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /data
        env:
        # Added by Nour
        - name: discovery.seed_hosts
          value: elasticsearch-master-headless
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      # The pod volume is supplied by volumeClaimTemplates below; a volumes
      # entry with the same name (and no source) would be rejected by the API.
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: alpine:3.6
        command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-logging
    spec:
      storageClassName: "standard"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 30Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    #    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  type: NodePort
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
    nodePort: 31335
  selector:
    k8s-app: elasticsearch-logging
# Added by Nour
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  selector:
    app: elasticsearch-master
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master-headless
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  clusterIP: None
  selector:
    app: elasticsearch-master

Below are the available namespaces:

$ kubectl get ns
NAME              STATUS   AGE
default           Active   4d15h
kube-node-lease   Active   4d15h
kube-public       Active   4d15h
kube-system       Active   4d15h

Am I using any deprecated API versions that could be causing the issue?

Thank you.

To close this out, I think it makes sense to paste the final answer here.


I understand your curiosity. I guess GCP just started preventing people from deploying workloads to the kube-system namespace, as that risks interfering with GKE's own components. I had never tried to deploy anything to the kube-system namespace before, so I'm not sure if it was always like this or if it changed recently.

Overall, I recommend avoiding deploying workloads into the kube-system namespace on GKE.
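In practice, moving the stack to a dedicated namespace only requires rewriting the `namespace:` fields in the manifest. A minimal sketch, assuming the manifest is saved to a file and using a hypothetical `logging` namespace (the snippet below stands in for the full manifest):

```shell
# Stand-in for the full elasticsearch.yaml manifest above.
printf 'metadata:\n  name: elasticsearch-logging\n  namespace: kube-system\n' > /tmp/es-snippet.yaml

# Point every object at the dedicated namespace instead of kube-system.
sed 's/namespace: kube-system/namespace: logging/' /tmp/es-snippet.yaml > /tmp/es-logging.yaml

grep 'namespace:' /tmp/es-logging.yaml
```

With the rewritten manifest, `kubectl create namespace logging` followed by `kubectl apply -f /tmp/es-logging.yaml` deploys the same stack outside kube-system, where GKE will not interfere with it.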