Kubernetes cronjob never scheduling, no errors
I have a Kubernetes cluster running 1.18:
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
I'm following the 1.18 documentation on CronJobs. I have the following YAML saved in hello_world.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
I created the CronJob with:
kubectl create -f hello_world.yaml
cronjob.batch/hello created
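One way to separate a problem in the CronJob spec from a problem with scheduling is to instantiate the jobTemplate by hand with `kubectl create job --from=cronjob/...`. A minimal bash sketch; the `-manual` job name and the `KUBECTL` override variable are my own choices for illustration, not anything from the post:

```shell
# Create a one-off Job from a CronJob's jobTemplate, bypassing the schedule.
# KUBECTL may be overridden (e.g. KUBECTL=echo) to print the call instead.
run_cronjob_once() {
  local cronjob="$1"
  local job="${2:-${cronjob}-manual}"   # hypothetical default Job name
  "${KUBECTL:-kubectl}" create job "$job" --from="cronjob/${cronjob}"
}
```

Usage: `run_cronjob_once hello`, then `kubectl get job hello-manual` and `kubectl logs job/hello-manual`. If the Job object is created but never gets a Pod, that points at kube-controller-manager (which hosts both the Job and the CronJob controllers) rather than at the YAML.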
However, even though the CronJob was created, no jobs are ever scheduled:
kubectl get cronjobs
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        <none>          5m48s
kubectl get jobs
NAME                    COMPLETIONS   DURATION   AGE
not-my-job-1624413720   1/1           43m        7d11h
not-my-job-1624500120   1/1           42m        6d11h
not-my-job-1624586520   1/1           43m        5d11h
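The numeric suffix on these Job names can be decoded to see exactly when each one was scheduled. A bash sketch, assuming GNU date and assuming (from my reading of the 1.18 CronJob controller, not something the post confirms) that the suffix is the scheduled time in Unix seconds:

```shell
# Print the scheduled time encoded in a CronJob-created Job name.
# Assumes the suffix after the last '-' is Unix seconds; needs GNU date.
job_scheduled_at() {
  local name="${1#job.batch/}"   # also accept `kubectl get jobs -o name` output
  local ts="${name##*-}"
  date -u -d "@${ts}" '+%Y-%m-%d %H:%M UTC'
}

# Decode every Job in the current namespace.
decode_all_jobs() {
  local j
  for j in $(kubectl get jobs -o name); do
    echo "$j  $(job_scheduled_at "$j")"
  done
}
```

For the three Jobs listed, the suffixes are exactly 86400 s apart and decode to 2021-06-23, 2021-06-24 and 2021-06-25, each at 02:02 UTC: a daily job whose last run was five days before the certificate files were renewed on Jun 30.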
I noticed that the last job ran 5 days ago, which is when our certificates expired and developers started getting the following error:
"Unable to connect to the server: x509: certificate has expired or is not yet valid"
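We regenerated the certificates using the following procedure from IBM, which seemed to work at the time. These are the main commands; we also backed up the configuration files and so on, as described in the linked documentation: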
kubeadm alpha certs renew all
systemctl daemon-reload && systemctl restart kubelet
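As I understand it, the x509 error the developers saw is kubectl rejecting the certificate presented by the API server. After a renewal like this, a bash sketch for checking which certificate the server is actually serving and how many days it has left; the `127.0.0.1:6443` default is an assumption (6443 is kubeadm's usual API port), and `days_left` needs GNU date:

```shell
# Print the notAfter line of the certificate the API server presents.
served_cert_enddate() {
  local addr="${1:-127.0.0.1:6443}"   # assumption: kubeadm default port
  openssl s_client -connect "$addr" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Whole days until an openssl "notAfter=..." date, relative to NOW
# (Unix seconds, default: current time). Needs GNU date.
# 0 or less means expired or less than a day left.
days_left() {
  local enddate="${1#notAfter=}"
  local now="${2:-$(date +%s)}"
  local end
  end="$(date -u -d "$enddate" +%s)"
  echo $(( (end - now) / 86400 ))
}
```

Usage: `days_left "$(served_cert_enddate)"`, compared with `openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt`. If the file on disk is newer than what is being served, the running API server has not picked up the renewed certificate.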
I'm sure the certificate expiry and renewal caused some problem, but I can't find any hard evidence of it.
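One place to look for that evidence: as far as I know, restarting kubelet does not recreate static-pod containers that are already running, and kube-controller-manager (which runs the CronJob controller) loads its kubeconfig when it starts. So if its container started before controller-manager.conf was rewritten, it may still be using the old, expired client certificate. A bash sketch that checks this, assuming the Docker runtime (dockershim names containers `k8s_<container>_<pod>_...`) and GNU date and stat:

```shell
# Exit 0 if a container start time (RFC 3339) is earlier than FILE's mtime.
started_before_file() {
  local started_s file_s
  started_s="$(date -u -d "$1" +%s)"
  file_s="$(stat -c %Y "$2")"
  [ "$started_s" -lt "$file_s" ]
}

# Compare each control-plane container on this node with the file it depends on.
check_control_plane() {
  local c conf id started
  for c in kube-apiserver kube-controller-manager kube-scheduler; do
    case "$c" in
      kube-apiserver)          conf=/etc/kubernetes/pki/apiserver.crt ;;
      kube-controller-manager) conf=/etc/kubernetes/controller-manager.conf ;;
      kube-scheduler)          conf=/etc/kubernetes/scheduler.conf ;;
    esac
    id="$(docker ps -q --filter "name=k8s_${c}_" | head -n 1)"
    if [ -z "$id" ]; then
      echo "$c: no running container found"
      continue
    fi
    started="$(docker inspect -f '{{.State.StartedAt}}' "$id")"
    if started_before_file "$started" "$conf"; then
      echo "$c: started $started, before $conf was rewritten"
    else
      echo "$c: started after $conf was last written"
    fi
  done
}
```

Run `check_control_plane` as root on the control-plane node. A component reported as started before its file was rewritten is a candidate for still holding pre-renewal credentials.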
kubectl describe cronjob hello
Name:                          hello
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */1 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  <none>
  Containers:
   hello:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/sh
      -c
      date; echo Hello from the Kubernetes cluster
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  <unset>
Active Jobs:         <none>
Events:              <none>
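`Events: <none>` is itself a hint: when the CronJob controller creates a Job it normally records an event on the CronJob, so an empty list suggests the controller never acted on it. That controller runs inside kube-controller-manager, so its logs are the next place to look. A bash sketch; `component=kube-controller-manager` is the label kubeadm puts on that static pod, the Docker fallback assumes the Docker runtime, and the grep pattern is only a starting point, not an exhaustive list of messages:

```shell
# Keep only log lines that look like certificate or authentication failures.
cert_errors() {
  grep -E 'x509|certificate has expired|Unauthorized'
}

# Controller-manager logs fetched through the API server.
cm_logs_kubectl() {
  kubectl -n kube-system logs -l component=kube-controller-manager --tail=500
}

# The same logs straight from Docker on the control-plane node,
# for when kubectl itself cannot get through.
cm_logs_docker() {
  local id
  id="$(docker ps -q --filter 'name=k8s_kube-controller-manager_' | head -n 1)"
  [ -n "$id" ] || return 1
  docker logs --tail 500 "$id" 2>&1
}
```

Usage: `cm_logs_kubectl | cert_errors`, or `cm_logs_docker | cert_errors` as root on the control-plane node.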
Any help would be greatly appreciated! Thanks.
Edit: some more information:
sudo kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jun 30, 2022 13:31 UTC   364d                                    no
apiserver                  Jun 30, 2022 13:31 UTC   364d            ca                      no
apiserver-etcd-client      Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Jun 30, 2022 13:31 UTC   364d            ca                      no
controller-manager.conf    Jun 30, 2022 13:31 UTC   364d                                    no
etcd-healthcheck-client    Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-peer                  Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
etcd-server                Jun 30, 2022 13:31 UTC   364d            etcd-ca                 no
front-proxy-client         Jun 30, 2022 13:31 UTC   364d            front-proxy-ca          no
scheduler.conf             Jun 30, 2022 13:31 UTC   364d                                    no
CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jun 23, 2030 13:21 UTC   8y              no
etcd-ca                 Jun 23, 2030 13:21 UTC   8y              no
front-proxy-ca          Jun 23, 2030 13:21 UTC   8y              no
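Everything here now shows 364d. To catch the next expiry before it happens, a bash/awk sketch that reads this output and prints only the entries with less than a given number of days left. It assumes the column layout shown above, and that an already-expired entry shows `<invalid>` in the RESIDUAL TIME column (my understanding of kubeadm's behaviour, not something this output shows):

```shell
# Read `kubeadm alpha certs check-expiration` output on stdin and print
# entries whose residual time is under THRESHOLD days (default 30).
expiring_soon() {
  local threshold="${1:-30}"
  awk -v t="$threshold" '
    # Skip headers, "[check-expiration]" log lines and short/blank lines.
    $1 == "CERTIFICATE" || $1 ~ /^\[/ || NF < 7 { next }
    {
      r = $7                               # RESIDUAL TIME column
      if (r == "<invalid>") { print $1, "EXPIRED"; next }
      unit = substr(r, length(r))          # s, m, h, d or y
      n = substr(r, 1, length(r) - 1) + 0
      if (unit == "d" && n < t) print $1, r
      else if (unit == "h" || unit == "m" || unit == "s") print $1, r
    }'
}
```

Usage: `sudo kubeadm alpha certs check-expiration | expiring_soon 30`, for example from a daily cron job that alerts on any output.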
ls -alt /etc/kubernetes/pki/
total 68
-rw-r--r-- 1 root root 1058 Jun 30 13:31 front-proxy-client.crt
-rw------- 1 root root 1679 Jun 30 13:31 front-proxy-client.key
-rw-r--r-- 1 root root 1099 Jun 30 13:31 apiserver-kubelet-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-kubelet-client.key
-rw-r--r-- 1 root root 1090 Jun 30 13:31 apiserver-etcd-client.crt
-rw------- 1 root root 1675 Jun 30 13:31 apiserver-etcd-client.key
-rw-r--r-- 1 root root 1229 Jun 30 13:31 apiserver.crt
-rw------- 1 root root 1679 Jun 30 13:31 apiserver.key
drwxr-xr-x 4 root root 4096 Sep 9 2020 ..
drwxr-xr-x 3 root root 4096 Jun 25 2020 .
-rw------- 1 root root 1675 Jun 25 2020 sa.key
-rw------- 1 root root 451 Jun 25 2020 sa.pub
drwxr-xr-x 2 root root 4096 Jun 25 2020 etcd
-rw-r--r-- 1 root root 1038 Jun 25 2020 front-proxy-ca.crt
-rw------- 1 root root 1675 Jun 25 2020 front-proxy-ca.key
-rw-r--r-- 1 root root 1025 Jun 25 2020 ca.crt
-rw------- 1 root root 1679 Jun 25 2020 ca.key
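This listing covers only `/etc/kubernetes/pki/`. The kubeconfig files used by kube-controller-manager and kube-scheduler (`/etc/kubernetes/controller-manager.conf`, `/etc/kubernetes/scheduler.conf`) sit one level up and, in a kubeadm setup, carry their client certificate inline as base64 `client-certificate-data`. A bash sketch for checking the expiry of that embedded certificate; it assumes one user entry per file and GNU base64:

```shell
# Decode the first client-certificate-data field of a kubeconfig on stdin.
kubeconfig_client_cert() {
  awk '$1 == "client-certificate-data:" { print $2; exit }' | base64 -d
}

# Print the expiry of the client certificate embedded in a kubeconfig file.
kubeconfig_cert_enddate() {
  kubeconfig_client_cert < "$1" | openssl x509 -noout -enddate
}
```

Usage (as root, since these files are normally readable only by root): `kubeconfig_cert_enddate /etc/kubernetes/controller-manager.conf`. A new date here only proves the file was updated; the running process may still hold the certificate it loaded at startup.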
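After trying many different things I found a solution to this problem, but forgot to update this post at the time. The certificates were renewed after they had already expired, and I believe this stopped the certificates from being synced between the different components in the cluster, so nothing could communicate with the API.
This is a three-node cluster. I cordoned the worker nodes, stopped their kubelet service, stopped the Docker containers and the Docker service, started new Docker containers, started kubelet, uncordoned the nodes, and then carried out the same procedure on the master node. This forced the certificates and keys to be synced across the different components.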
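The per-node procedure above can be sketched in bash roughly as follows. Assumptions not in the post: each node is reachable over SSH under its Kubernetes node name with sudo rights, Docker and kubelet are systemd units named `docker` and `kubelet`, and kubectl runs from a machine with a working admin kubeconfig. By default the sketch only prints the commands; set `DRY_RUN=0` to execute them:

```shell
# Print a command, or run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Cordon a node, restart kubelet and Docker on it so that its containers,
# including static control-plane pods, are recreated, then uncordon it.
restart_node_runtime() {
  local node="$1"
  run kubectl cordon "$node"
  run ssh "$node" sudo systemctl stop kubelet
  run ssh "$node" sudo systemctl stop docker    # stops containers unless live-restore is on
  run ssh "$node" sudo systemctl start docker
  run ssh "$node" sudo systemctl start kubelet  # kubelet creates fresh containers
  run kubectl uncordon "$node"
}
```

Run it for each worker first and the control-plane node last, e.g. `restart_node_runtime worker-1` (node names hypothetical). On the control-plane node the final `kubectl uncordon` may need retrying until the API server is back up.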