How to achieve Automatic Rollback in Kubernetes?
Suppose I have a Deployment. For some reason it stops responding after a while. Is there a way to tell Kubernetes to automatically roll back to the previous version on failure?
You mentioned:
I've a deployment. For some reason it's not responding after sometime.
In that case, you can use liveness and readiness probes:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
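For context, here is a minimal sketch of what such probes can look like in a container spec (the endpoint path, port, and timings below are illustrative assumptions, not something from the original question):

containers:
- name: app
  image: nginx
  livenessProbe:
    httpGet:
      path: /healthz   # assumed health endpoint; adjust for your app
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /healthz   # the Pod receives Service traffic only while this succeeds
      port: 80
    periodSeconds: 5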
The above probes may prevent you from deploying a broken version, but liveness and readiness probes cannot roll your Deployment back to a previous version. There is a similar issue on GitHub, but I'm not sure how much progress it will see in the near future.
If you really want to automate the rollback process, below I describe a solution that you may find useful.
This solution requires running kubectl commands from inside a Pod.
In short, you can use a script that continuously monitors your Deployments, and when an error occurs, runs kubectl rollout undo deployment DEPLOYMENT_NAME.
First, you need to decide how to detect a failed Deployment. As an example, I will use the following command to check for Deployments whose update takes longer than 10 seconds to complete:
NOTE: You can use a different command depending on your needs.
kubectl rollout status deployment ${deployment} --timeout=10s
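For a healthy Deployment this command exits with status 0; if the rollout does not complete within the timeout, it exits non-zero, which is what the script below relies on. For example (exact wording may vary between kubectl versions):

$ kubectl rollout status deployment app-1 --timeout=10s
deployment "app-1" successfully rolled out

$ kubectl rollout status deployment app-1 --timeout=10s
error: timed out waiting for the condition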
To continuously monitor all Deployments in the default namespace, we can create a Bash script:
#!/bin/bash
while true; do
  sleep 60
  # List all Deployments in the current namespace, excluding the checker itself
  deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
  echo "====== $(date) ======"
  for deployment in ${deployments}; do
    # "kubectl rollout status" exits non-zero if the rollout does not
    # complete within the timeout
    if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
      echo "Error: ${deployment} - rolling back!"
      kubectl rollout undo deployment ${deployment}
    else
      echo "Ok: ${deployment}"
    fi
  done
done
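You can sanity-check the script from your workstation before moving it into the cluster (this assumes your local kubeconfig points at the target cluster and namespace); its output has the same format as the Pod logs shown at the end:

$ chmod +x checkScript.sh
$ ./checkScript.sh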
We want to run this script from inside a Pod, so I converted it to a ConfigMap, which allows us to mount the script into a volume (see: Using ConfigMaps as files from a Pod):
$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
    #!/bin/bash
    while true; do
      sleep 60
      # List all Deployments in the current namespace, excluding the checker itself
      deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
      echo "====== $(date) ======"
      for deployment in ${deployments}; do
        # "kubectl rollout status" exits non-zero if the rollout does not
        # complete within the timeout
        if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
          echo "Error: ${deployment} - rolling back!"
          kubectl rollout undo deployment ${deployment}
        else
          echo "Ok: ${deployment}"
        fi
      done
    done
$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
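You can verify that the script was stored correctly:

$ kubectl get configmap check-script -o yaml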
I created a dedicated deployment-checker ServiceAccount and bound it to the edit ClusterRole; our Pod will run under this ServiceAccount (see the note further below on narrowing these permissions):
NOTE: I created a Deployment instead of a single Pod.
$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deployment-checker-binding
subjects:
- kind: ServiceAccount
  name: deployment-checker
  namespace: default
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-checker
  name: deployment-checker
spec:
  selector:
    matchLabels:
      app: deployment-checker
  template:
    metadata:
      labels:
        app: deployment-checker
    spec:
      serviceAccountName: deployment-checker
      volumes:
      - name: check-script
        configMap:
          name: check-script
      containers:
      - image: bitnami/kubectl
        name: test
        command: ["bash", "/mnt/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt
After applying the above manifests, the deployment-checker Deployment is created and starts monitoring Deployment resources in the default namespace:
$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created
$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker 1/1 1
pod/deployment-checker-69c8896676-pqg9h 1/1 Running
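A note on permissions: the edit ClusterRole is convenient but grants far more than this checker needs. Here is a minimal sketch of a tighter, namespaced alternative (the Role name is my own, and the verb list is an assumption based on what kubectl rollout status and kubectl rollout undo do; verify it against your cluster before relying on it):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-rollback
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch"]   # "rollout undo" patches the Deployment
- apiGroups: ["apps"]
  resources: ["replicasets"]
  verbs: ["get", "list"]                     # "rollout undo" reads old ReplicaSets

Bind it to the deployment-checker ServiceAccount with a RoleBinding instead of the ClusterRoleBinding above if monitoring the default namespace is enough.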
Finally, we can check how it works. I created three Deployments (app-1, app-2, app-3):
$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created
$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created
$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created
Then I changed the app-1 image to a non-existent one (nnnginx):
$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated
In the deployment-checker logs we can see that app-1 was rolled back to its previous version:
$ kubectl logs -f deployment-checker-69c8896676-pqg9h
...
====== Thu Oct 7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct 7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
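To confirm what the rollback did, you can inspect the Deployment's revision history (revision numbers will differ in your cluster; rolling back creates a new revision that reuses the previous Pod template):

$ kubectl rollout history deployment app-1
deployment.apps/app-1
REVISION  CHANGE-CAUSE
2         <none>
3         <none>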