GKE | Statefulset gets deleted along with a service when deployed in the kube-system namespace
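I have a GKE cluster with 5 nodes in a single zone. I'm trying to deploy a 3-node Elasticsearch StatefulSet in the kube-system namespace, but every time I do, the StatefulSet gets deleted and the pods go into the Terminating state right after the second pod is created.
I checked the pod logs and ran describe pod to look for any clues, but found nothing useful.
I even checked the GKE cluster logs, where I found the delete request, but there was no additional information about who initiated it or why it happened.
When I change the namespace to default, everything works fine and the pods reach the Ready state.
Below is the manifest file I used for this deployment.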
# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    # addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    # addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "services"
  - "namespaces"
  - "endpoints"
  verbs:
  - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: kube-system
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    # addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: elasticsearch-logging
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  # roleRef.apiGroup must name the RBAC API group, not the core group ""
  apiGroup: rbac.authorization.k8s.io
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: 7.16.2
    kubernetes.io/cluster-service: "true"
    # addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 2
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: 7.16.2
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: 7.16.2
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
      - image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /data
        env:
        #Added by Nour
        - name: discovery.seed_hosts
          value: elasticsearch-master-headless
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: elasticsearch-logging
        # emptyDir: {}
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: alpine:3.6
        command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-logging
    spec:
      storageClassName: "standard"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 30Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    # addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  type: NodePort
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
    nodePort: 31335
  selector:
    k8s-app: elasticsearch-logging
#Added by Nour
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  selector:
    app: elasticsearch-master
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch-master
  name: elasticsearch-master-headless
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  clusterIP: None
  selector:
    app: elasticsearch-master
Here are the available namespaces:
$ kubectl get ns
NAME              STATUS   AGE
default           Active   4d15h
kube-node-lease   Active   4d15h
kube-public       Active   4d15h
kube-system       Active   4d15h
Am I using any outdated API versions that could be causing this?
Thanks.
To close this out, I think it makes sense to paste the final answer here.
I understand your curiosity. I guess GCP just started preventing people from deploying things to the kube-system namespace, since doing so risks interfering with GKE itself. I had never tried to deploy anything to the kube-system namespace before, so I'm not sure whether it was always like this or whether it was changed recently.
Overall, I recommend avoiding deploying anything into the kube-system namespace in GKE.
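Following that recommendation, here is a minimal sketch of giving the stack its own namespace instead. The namespace name `logging` is a hypothetical choice; any namespace other than kube-system should behave the way `default` did in the question.

```yaml
# Hypothetical dedicated namespace for the Elasticsearch logging stack.
# The name "logging" is an assumption; any non-system name works the same way.
apiVersion: v1
kind: Namespace
metadata:
  name: logging
```

Apply it first (for example `kubectl apply -f namespace.yaml`, where the file name is also just an example). Then change every `namespace: kube-system` field in the manifest above to `namespace: logging`. This includes the `namespace` of the ServiceAccount subject in the ClusterRoleBinding, so the binding keeps pointing at the relocated ServiceAccount. The ClusterRole and ClusterRoleBinding are cluster-scoped, so they need no other changes.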