What could trigger a SyncLoop DELETE api call in k8s?

I have a replicaset in my cluster running nginx-ingress with two instances. Two days ago, both pods were deleted at the same time (within a few milliseconds of each other) and two new instances were created in the same replicaset. I don't know what triggered the deletion. In the kubelet logs I can see the following:

kubelet[13317]: I0207 22:01:36.843804 13317 kubelet.go:1918] SyncLoop (DELETE, "api"): "nginx-ingress-public-controller-6bf8d59c4c

Later in the log a failed liveness probe is listed:

kubelet[13317]: I0207 22:01:42.596603 13317 prober.go:116] Liveness probe for "nginx-ingress-public-controller-6bf8d59c4c (60c3f9e5-e228-44c8-abd5-b0a4a8507b5c):nginx-ingress-controller" failed (failure): HTTP probe failed with statuscode: 500

In theory this could explain the pod deletion, but I'm confused about the ordering. Did this liveness probe fail because the delete command had already killed the underlying docker container, or is this what triggered the deletion?

Without the full logs it is hard to guess the exact reason why the nginx pods were deleted. Also, as you mentioned it is a client's environment, there could be many causes. As I asked in the comments, it could be HPA, Cluster Autoscaler, preemptible nodes, a temporary network issue, etc.

Regarding the second part about pod deletion and Liveness: the Liveness probe failed because the nginx pod was already in the deletion process.

One of the Kubernetes defaults is a grace period of 30 seconds. In short, the Pod stays in the Terminating state for up to 30 seconds, and once that time has passed it is removed.
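That grace period is exposed on the Pod spec as terminationGracePeriodSeconds. A minimal sketch (the Pod name and image below are placeholders, not taken from your cluster):

apiVersion: v1
kind: Pod
metadata:
  name: nginx-example                  # placeholder name
spec:
  terminationGracePeriodSeconds: 30    # the Kubernetes default; the Pod stays Terminating for up to this long
  containers:
    - name: nginx
      image: nginx:1.21                # placeholder image

You can also override it for a single deletion with kubectl delete pod <name> --grace-period=<seconds>.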

Tests

If you want to verify this yourself, you can run a few tests to confirm it. It requires a kubeadm master and changing the kubelet verbosity. You can do that by editing the /var/lib/kubelet/kubeadm-flags.env file (you need root rights) and adding --v=X, where X is a number from 0 to 9. Details about which level shows which logs can be found here.
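For illustration only, after the change the file could look roughly like the line below; the existing flags differ from cluster to cluster, and the only relevant part is the appended --v=8:

KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --v=8"

After editing the file, restart the kubelet (systemctl restart kubelet) so the new verbosity takes effect.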

  • Set the verbosity to at least level=5; I ran my tests at level=8
  • Deploy the Nginx Ingress Controller
  • Manually delete one of the Nginx Ingress Controller pods
  • Check the logs with $ journalctl -u kubelet; you can use grep to narrow the output and save it to a file ($ journalctl -u kubelet | grep ingress-nginx-controller-s2kfr > nginx.log)

Below is an example from my tests:

# Liveness and Readiness probes work properly:
Feb 24 14:18:35 kubeadm kubelet[11922]: I0224 14:18:35.399156   11922 prober.go:126] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
Feb 24 14:18:40 kubeadm kubelet[11922]: I0224 14:18:40.587129   11922 prober.go:126] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded

# Once the deletion process starts, you can find the DELETE api call and other information

Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.900957   11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901057   11922 kubelet_pods.go:1482] Generating status for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901914   11922 round_trippers.go:422] GET https://10.154.15.225:6443/api/v1/namespaces/ingress-nginx/pods/ingress-nginx-controller-s2kfr
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.909123   11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container controller"

# This entry occurs because the default grace period was kept
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.947193   11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running

# As the Pod was being deleted, the probes failed.
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584208   11922 prober.go:117] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584338   11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: HTTP probe failed with statuscode: 500"
Feb 24 14:18:52 kubeadm kubelet[11922]: I0224 14:18:52.045155   11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
Feb 24 14:18:55 kubeadm kubelet[11922]: I0224 14:18:55.398025   11922 prober.go:117] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500

In the logs, the time between SyncLoop (DELETE, "api") and the failed Liveness probe was 4 seconds. In other test runs it was also only a few seconds (a 4-7 second difference).

If you want to run your own test, you can change the Readiness and Liveness probe checks to 1 second (instead of the default 10 seconds); then the probe failure shows up within the same second as the Delete api call.
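As a sketch, the relevant part of the controller container spec could look like the snippet below; the /healthz path and port 10254 follow the upstream ingress-nginx manifest, but verify them against your own deployment. Only periodSeconds is changed from the default of 10:

livenessProbe:
  httpGet:
    path: /healthz
    port: 10254
  periodSeconds: 1        # default is 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
  periodSeconds: 1        # default is 10

The logs below are from a run with this 1-second setting: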

Feb 24 15:09:40 kubeadm kubelet[11922]: I0224 15:09:40.865718   11922 prober.go:126] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" succeeded
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.488819   11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6)"
...
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.865422   11922 prober.go:117] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" failed (failure): HTTP probe failed with statuscode: 500

You can find a detailed explanation of syncLoop in the Alibaba docs:

As indicated in the comments, the syncLoop function is the major cycle of Kubelet. This function listens on the updates, obtains the latest Pod configurations, and synchronizes the running state and desired state. In this way, all Pods on the local node can run in the expected states. Actually, syncLoop only encapsulates syncLoopIteration, while the synchronization operation is carried out by syncLoopIteration.

Conclusion

Without additional logging that preserves the output of the pods before they were terminated, it is hard to determine the root cause some time after the event occurred.

In the setup you described, the Liveness probe failed because the nginx-ingress pod was already in the termination process. The Liveness probe failure did not trigger the pod deletion; it was a result of the deletion.

In addition, you can also check the Kubelet and Prober source code.