Kubernetes pod marked as `Completed` despite the exit code `255`
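Situation:
I have a CronJob that fails quite often (this is expected at the moment). Since the container executing the job has a side-car, the dependency between the containers is expressed via a bash script and a shared emptyDir mount at the folder /etc/liveness: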
spec:
  containers:
  - args:
    - -c
    - set -x;
      ...
      ./process; # execute the main process
      rc=$?;
      rm /etc/liveness; # clean-up
      exit $rc;
    command:
    - /bin/bash
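The wrapper logic in the args above can be tried outside Kubernetes. This is only a minimal sketch: the helper name `run_with_cleanup` and the temporary file standing in for the liveness marker are assumptions for illustration, not part of the original manifest. It shows that the saved exit code survives the clean-up step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original manifest): run a command,
# remove a marker file afterwards, and return the command's exit code.
run_with_cleanup() {
  local marker="$1"; shift
  "$@"                 # run the main process
  local rc=$?          # save its exit code before anything else runs
  rm -f "$marker"      # clean-up; -f so a missing marker is not an error
  return "$rc"         # propagate the original exit code
}

marker="$(mktemp)"     # stand-in for the liveness marker
if run_with_cleanup "$marker" bash -c 'exit 255'; then
  echo "main process succeeded"
else
  echo "main process failed with exit code $?"
fi
```

So the container itself does terminate with 255, as the log further down confirms. The question is how that failure is reported.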
Problem:
When the job fails, I see the following in the logs:
+ rc=255
+ rm /etc/liveness
+ exit 255
With restartPolicy set to Never, the failed pod goes into the Completed state, which is misleading:
scheduler-1594015200-wl9xc 0/2 Completed 0 24m
According to the official doc,
A Job creates one or more Pods and ensures that a specified number of
them successfully terminate.
A container enters the terminated state when
it has successfully completed execution or when it has failed for some
reason.
So this is what happens when restartPolicy is set to Never.
A Pod's status field is a PodStatus object, which has a phase field.
Reference: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
Status and phase are not the same thing. So what I learned is that, in the case above, my pods ended up with the status Completed and the phase Failed.
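On a live cluster the phase can be read directly, for example with `kubectl get pod scheduler-1594015200-wl9xc -o jsonpath='{.status.phase}'`. The rule behind that field, as described on the pod-lifecycle page, can be sketched as below. The function name `pod_phase` and the exit codes passed to it are hypothetical, and the sketch assumes every container has already terminated and will not be restarted; it is an illustration, not Kubernetes code.

```shell
# Hypothetical illustration: derive the pod phase from container exit codes,
# assuming all containers have terminated and none will be restarted.
pod_phase() {
  for code in "$@"; do
    if [ "$code" -ne 0 ]; then
      echo "Failed"      # at least one container terminated in failure
      return 0
    fi
  done
  echo "Succeeded"       # every container exited with code 0
}

# Example exit codes (assumed): one container failed with 255, one exited with 0.
pod_phase 255 0
pod_phase 0 0
```

Under these assumptions the first call prints Failed and the second prints Succeeded, which matches the observation that a pod can show Completed in the listing while its phase is Failed.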