Kubernetes cronjob never scheduling, no errors

I have a Kubernetes cluster on 1.18:

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

I am following the 1.18 documentation for cronjobs. I saved the following YAML in hello_world.yaml:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
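
For what it's worth, on 1.18 CronJob is still served from the batch/v1beta1 API, so one quick check I would try (assuming kubectl is pointed at this cluster) is confirming that the API server advertises the resource and accepts the schedule field:

kubectl api-resources --api-group=batch
kubectl explain cronjob.spec.schedule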

I created the cronjob with:

kubectl create -f hello_world.yaml
cronjob.batch/hello created

However, although the cronjob is created, a job is never scheduled:

kubectl get cronjobs
NAME                               SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello                              */1 * * * *   False     0        <none>          5m48s



kubectl get jobs
NAME                                       COMPLETIONS   DURATION   AGE
not-my-job-1624413720                      1/1           43m        7d11h
not-my-job-1624500120                      1/1           42m        6d11h
not-my-job-1624586520                      1/1           43m        5d11h
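
To separate "the CronJob controller is not creating jobs" from "jobs cannot run at all", one thing worth trying (hello-manual is just an example name) is creating a Job directly from the CronJob and seeing whether it completes:

kubectl create job hello-manual --from=cronjob/hello
kubectl get job hello-manual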

I notice that the last job to run completed 5 days ago, which is when our certificates expired and developers started getting the following error:

"Unable to connect to the server: x509: certificate has expired or is not yet valid"

We regenerated the certificates using the following procedure from IBM, which seemed to work at the time. These are the main commands; we also made some backups of config files etc. as described in the linked documentation:

kubeadm alpha certs renew all

systemctl daemon-reload && systemctl restart kubelet
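
From what I understand, kubeadm alpha certs renew all only rewrites the certificate files on disk; the control-plane static pods (kube-apiserver, kube-controller-manager, kube-scheduler) keep serving their old certificates until they are restarted, and restarting kubelet by itself does not restart them. A way to force a restart on a docker-based kubeadm master would be something like this (a sketch, assuming the default kubelet-generated container names):

docker ps | grep -E 'k8s_kube-(apiserver|controller-manager|scheduler)' | awk '{print $1}' | xargs docker restart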

I am sure the certificate expiration and renewal caused some problem, but I don't see any conclusive evidence of it.

kubectl describe cronjob hello
Name:                          hello
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */1 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  <none>
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      date; echo Hello from the Kubernetes cluster
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>
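
Since the CronJob controller runs inside kube-controller-manager, and the cronjob shows no events and no last schedule time, checking whether the controller-manager is healthy and able to talk to the API seems relevant (a diagnostic sketch, assuming a kubeadm cluster where the static pods carry the component label):

kubectl -n kube-system get pods -l component=kube-controller-manager
kubectl -n kube-system logs -l component=kube-controller-manager --tail=50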

Any help would be greatly appreciated! Thanks.

Edit: providing some more information:

sudo kubeadm alpha certs check-expiration

[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 30, 2022 13:31 UTC   364d                                    no
apiserver                  Jun 30, 2022 13:31 UTC   364d            ca                      no
apiserver-etcd-client      Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 30, 2022 13:31 UTC   364d            ca                      no
controller-manager.conf    Jun 30, 2022 13:31 UTC   364d                                    no
etcd-healthcheck-client    Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-server                Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 30, 2022 13:31 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 30, 2022 13:31 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 23, 2030 13:21 UTC   8y              no
etcd-ca                 Jun 23, 2030 13:21 UTC   8y              no
front-proxy-ca          Jun 23, 2030 13:21 UTC   8y              no

ls -alt /etc/kubernetes/pki/
total 68
-rw-r--r-- 1 root root 1058 Jun 30 13:31 front-proxy-client.crt
-rw------- 1 root root 1679 Jun 30 13:31 front-proxy-client.key
-rw-r--r-- 1 root root 1099 Jun 30 13:31 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1090 Jun 30 13:31 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1229 Jun 30 13:31 apiserver.crt
-rw------- 1 root root 1679 Jun 30 13:31 apiserver.key
drwxr-xr-x 4 root root 4096 Sep  9  2020 ..
drwxr-xr-x 3 root root 4096 Jun 25  2020 .
-rw------- 1 root root 1675 Jun 25  2020 sa.key
-rw------- 1 root root  451 Jun 25  2020 sa.pub
drwxr-xr-x 2 root root 4096 Jun 25  2020 etcd
-rw-r--r-- 1 root root 1038 Jun 25  2020 front-proxy-ca.crt
-rw------- 1 root root 1675 Jun 25  2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1025 Jun 25  2020 ca.crt
-rw------- 1 root root 1679 Jun 25  2020 ca.key
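
Since the files on disk look renewed, one way to check whether the running API server has actually picked up the new certificate is to compare the expiry of the certificate it serves with the one on disk (run on the master; 127.0.0.1:6443 assumes the default API server address):

echo | openssl s_client -connect 127.0.0.1:6443 2>/dev/null | openssl x509 -noout -enddate
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate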

Found the fix for this after trying a lot of different things; I forgot to update this question at the time. The certificates were renewed after they had already expired, which I believe stopped the certificates from being synced between the different components in the cluster, so nothing could communicate with the API server.

This is a three node cluster. I cordoned the worker nodes, stopped their kubelet service, stopped the docker containers and the docker service, started docker again (bringing up fresh containers), started kubelet, uncordoned the nodes, and then went through the same procedure on the master node. This forced the certificates and keys to be synced across the different components.
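
Roughly, the per-node procedure looked like this (a sketch from memory; <worker-node> is a placeholder and this assumes a docker runtime managed by systemd):

# On the master, for each worker node:
kubectl cordon <worker-node>

# On the worker node itself:
systemctl stop kubelet
docker stop $(docker ps -q)
systemctl restart docker
systemctl start kubelet

# Back on the master, once the node is Ready again:
kubectl uncordon <worker-node>

# Then the same kubelet/docker stop-and-restart on the master node itself.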