Kubernetes pod marked as `Completed` despite exit code `255`

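Situation: I have a CronJob that fails often (this is expected for now). Because the container that runs the task has a sidecar, the dependency between the containers is expressed through bash scripts and a shared emptyDir mount at /etc/liveness: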

        spec:
          containers:
          - args:
            - -c
            - set -x;
              ...
              ./process; # execute the main process
              rc=$?;
              rm /etc/liveness; # clean-up
              exit $rc;
            command:
            - /bin/bash

Problem: when the job fails, I see the following in the logs:

    + rc=255
    + rm /etc/liveness
    + exit 255

The `restartPolicy` is set to `Never`, and the failed pod ends up in the `Completed` status, which is misleading:

    scheduler-1594015200-wl9xc   0/2     Completed     0          24m

According to the official doc:

A Job creates one or more Pods and ensures that a specified number of them successfully terminate.

A container enters the Terminated state when "it has successfully completed execution or when it has failed for some reason."

So this is what happens when `restartPolicy` is set to `Never`.

A Pod's status field is a PodStatus object, which has a phase field.

Reference: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Status and phase are not the same thing. So, as I understand it, what happened above is that my pods ended up with status `Completed` and phase `Failed`.
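To make the difference between the phase and the per-container states concrete, here is a minimal Python sketch that reads both from a PodStatus-shaped document. The field names (`phase`, `containerStatuses`, `state.terminated.exitCode`, `state.terminated.reason`) are the ones found under `.status` in the output of `kubectl get pod <name> -o json`. The sample values and the container names `main` and `sidecar` are made up for illustration and were not captured from the cluster in the question.

```python
import json

# Hypothetical, trimmed PodStatus. Values are illustrative only:
# one container that exited with 255 and a sidecar that exited with 0.
SAMPLE_STATUS = json.loads("""
{
  "phase": "Failed",
  "containerStatuses": [
    {"name": "main",
     "state": {"terminated": {"exitCode": 255, "reason": "Error"}}},
    {"name": "sidecar",
     "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}
  ]
}
""")


def summarize(status):
    """Return (phase, {container name: (reason, exit code)}) for terminated containers."""
    terminated = {}
    for cs in status.get("containerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term is not None:
            terminated[cs["name"]] = (term.get("reason"), term["exitCode"])
    return status.get("phase"), terminated


phase, containers = summarize(SAMPLE_STATUS)
print("phase:", phase)
for name, (reason, code) in containers.items():
    print(f"{name}: reason={reason} exitCode={code}")
```

Running the same function on the real `.status` of the pod should show whether the pod phase and the terminated reason of each container agree with what the STATUS column displays.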