Kubernetes pod marked as `Completed` despite the exit code `255`

Question

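Situation: I have a CronJob that fails frequently (this is expected for now). Because the container that runs the task has a side-car, the dependency between the containers is expressed through bash scripts and an emptyDir volume that both containers mount at /etc/liveness: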

        spec:
          containers:
          - args:
            - -c
            - set -x;
              ...
              ./process; # execute the main process
              rc=$?;
              rm /etc/liveness; # clean-up
              exit $rc;
            command:
            - /bin/bash

Problem: When the job fails, I see the following in the logs:

+ rc=255
+ rm /etc/liveness
+ exit 255

With restartPolicy set to Never, the failed pod ends up in the Completed state, which is misleading:

scheduler-1594015200-wl9xc   0/2     Completed     0          24m

Answer 1

According to the official documentation:

A Job creates one or more Pods and ensures that a specified number of them successfully terminate.

A container enters the Terminated state when

it has successfully completed execution or when it has failed for some reason.

So this is what you would expect when restartPolicy is set to Never: the failed container terminates and is not restarted.

Answer 2

A Pod's status field is a PodStatus object, which has a phase field.

Reference: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Status and phase are not the same thing. So what I learned is that, in the situation above, my pods ended up with status Completed and phase Failed.
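One way to see the difference between the two is to query the phase and the per-container termination details directly instead of relying on the STATUS column of `kubectl get pods`. The following is only a sketch: the helper names `pod_phase` and `container_exit_codes` are made up for illustration, and it assumes `kubectl` is already pointed at the right cluster and namespace.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not part of kubectl).

# Print the pod-level phase from .status.phase
# (Pending, Running, Succeeded, Failed or Unknown).
pod_phase() {
  kubectl get pod "$1" -o jsonpath='{.status.phase}'
}

# Print one line per container: name, exit code and termination reason.
# The exit code and reason are empty for containers that have not terminated.
container_exit_codes() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.exitCode}{"\t"}{.state.terminated.reason}{"\n"}{end}'
}

# Example usage with the pod name from the question:
#   pod_phase scheduler-1594015200-wl9xc
#   container_exit_codes scheduler-1594015200-wl9xc
```

If the phase is Failed while the STATUS column says Completed, the per-container output should show which container exited with a non-zero code.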

kubernetes

kubernetes-cronjob