What could trigger a SyncLoop DELETE api call in k8s?
I have a replicaset for nginx-ingress running in my cluster with two instances. Two days ago both containers were deleted at the same time (within a few milliseconds of each other) and two new instances were created in the same replica set. I don't know what triggered the deletion. In the kubelet log I can see the following:
kubelet[13317]: I0207 22:01:36.843804 13317 kubelet.go:1918] SyncLoop (DELETE, "api"): "nginx-ingress-public-controller-6bf8d59c4c
Later in the log a failed liveness probe is listed:
kubelet[13317]: I0207 22:01:42.596603 13317 prober.go:116] Liveness probe for "nginx-ingress-public-controller-6bf8d59c4c (60c3f9e5-e228-44c8-abd5-b0a4a8507b5c):nginx-ingress-controller" failed (failure): HTTP probe failed with statuscode: 500
In theory this could explain the pod deletion, but I am confused about the ordering. Did this liveness probe fail because the delete command had already killed the underlying docker container, or was the probe failure what triggered the deletion?
Without the full logs it is hard to guess the exact reason why the nginx pod was deleted. Also, as you mentioned this is a customer environment, there could be many causes. As I asked in the comments, it could be HPA or CA, preemptible nodes, a temporary network issue, etc.
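If the incident is recent enough, the event history can sometimes narrow this down. A minimal sketch, assuming the events have not been rotated yet (namespace and object names are placeholders):

# Events for the namespace, oldest first (events are typically kept for only about an hour):
$ kubectl -n <namespace> get events --sort-by=.metadata.creationTimestamp

# Scaling or replacement decisions recorded against the ReplicaSet / HPA, if any:
$ kubectl -n <namespace> describe rs <nginx-ingress-replicaset>
$ kubectl -n <namespace> get hpa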
Regarding the second part about pod deletion and Liveness: the Liveness probe failed because the nginx pod was already in the deletion process.
One of the Kubernetes defaults is a grace-period equal to 30 seconds. In short, the Pod will stay in the Terminating state for 30 seconds, and after that time it will be removed.
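As a quick illustration (the pod name is a placeholder), you can read the effective grace period from a running pod's spec, or override it at deletion time:

# Grace period the kubelet will honour for this pod (defaults to 30 when unset):
$ kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'

# Delete with a shorter grace period, so the Terminating phase ends sooner:
$ kubectl delete pod <pod-name> --grace-period=5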
Test
If you want to verify this yourself, you can run a test to confirm it. It requires a kubeadm master and a change of the kubelet verbosity. You can do that by editing the /var/lib/kubelet/kubeadm-flags.env file (you must have root rights) and adding --v=X, where X is a number from 0 to 9. Details on which level shows which logs can be found here.
- Set the verbosity level to at least level=5; I ran my tests at level=8.
- Deploy the Nginx Ingress Controller.
- Manually delete the Nginx Ingress Controller pods.
- Check the logs with $ journalctl -u kubelet; you can use grep to narrow the output and save it to a file ($ journalctl -u kubelet | grep ingress-nginx-controller-s2kfr > nginx.log). A combined sketch of these steps follows this list.
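A rough end-to-end sketch of that test, assuming a kubeadm node where you can edit the kubelet flags (the manifest path and pod name are placeholders for whatever your install uses):

# 1. Raise kubelet verbosity: append --v=8 to KUBELET_KUBEADM_ARGS in
#    /var/lib/kubelet/kubeadm-flags.env (as root), then restart the kubelet:
$ sudo systemctl restart kubelet

# 2. Deploy the Nginx Ingress Controller with your usual install method:
$ kubectl apply -f <ingress-nginx-deploy.yaml>

# 3. Delete one controller pod by hand:
$ kubectl -n ingress-nginx delete pod <controller-pod-name>

# 4. Collect the matching kubelet log entries:
$ journalctl -u kubelet | grep <controller-pod-name> > nginx.log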
Below is an example from my tests:
# Liveness and Readiness probes work properly:
Feb 24 14:18:35 kubeadm kubelet[11922]: I0224 14:18:35.399156 11922 prober.go:126] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
Feb 24 14:18:40 kubeadm kubelet[11922]: I0224 14:18:40.587129 11922 prober.go:126] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" succeeded
# Once the deletion process starts you can find the DELETE api call and other information
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.900957 11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901057 11922 kubelet_pods.go:1482] Generating status for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)"
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.901914 11922 round_trippers.go:422] GET https://10.154.15.225:6443/api/v1/namespaces/ingress-nginx/pods/ingress-nginx-controller-s2kfr
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.909123 11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Normal" reason="Killing" message="Stopping container controller"
# This entry occurs because the default grace period was kept
Feb 24 14:18:46 kubeadm kubelet[11922]: I0224 14:18:46.947193 11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
# As the Pod was being deleted, the probes failed.
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584208 11922 prober.go:117] Liveness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
Feb 24 14:18:50 kubeadm kubelet[11922]: I0224 14:18:50.584338 11922 event.go:291] "Event occurred" object="ingress-nginx/ingress-nginx-controller-s2kfr" kind="Pod" apiVersion="v1" type="Warning" reason="Unhealthy" message="Liveness probe failed: HTTP probe failed with statuscode: 500"
Feb 24 14:18:52 kubeadm kubelet[11922]: I0224 14:18:52.045155 11922 kubelet_pods.go:952] Pod "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21)" is terminated, but some containers are still running
Feb 24 14:18:55 kubeadm kubelet[11922]: I0224 14:18:55.398025 11922 prober.go:117] Readiness probe for "ingress-nginx-controller-s2kfr_ingress-nginx(9046e404-1b9e-44fd-86f3-5a16ebf27c21):controller" failed (failure): HTTP probe failed with statuscode: 500
In the logs, the time between SyncLoop (DELETE, "api") and the failed Liveness probe is 4 seconds. In the other runs of this test it was also a few seconds (a 4-7 second difference).
If you want to run your own test, you can change the Readiness and Liveness probe checks to 1 second (instead of the default 10 seconds), and you will hit the probe failure within the same second as the Delete api call.
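One way to shorten the probe period is a JSON patch on the controller deployment. This is only a sketch; the deployment name, the container index, and the assumption that both probes are already defined in the spec come from a standard ingress-nginx install and may differ in yours:

$ kubectl -n ingress-nginx patch deployment ingress-nginx-controller --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/periodSeconds", "value": 1},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 1}
]'

With that change, the failed probe shows up within the same second as the DELETE: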
Feb 24 15:09:40 kubeadm kubelet[11922]: I0224 15:09:40.865718 11922 prober.go:126] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" succeeded
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.488819 11922 kubelet.go:1931] SyncLoop (DELETE, "api"): "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6)"
...
Feb 24 15:09:41 kubeadm kubelet[11922]: I0224 15:09:41.865422 11922 prober.go:117] Liveness probe for "ingress-nginx-controller-wwrdw_ingress-nginx(427bc9d6-261e-4427-b034-7abe8cbbfea6):controller" failed (failure): HTTP probe failed with statuscode: 500
You can find a detailed explanation of syncLoop in the Alibaba docs:
As indicated in the comments, the syncLoop function is the major cycle of Kubelet. This function listens on the updates, obtains the latest Pod configurations, and synchronizes the running state and desired state. In this way, all Pods on the local node can run in the expected states. Actually, syncLoop only encapsulates syncLoopIteration, while the synchronization operation is carried out by syncLoopIteration.
Conclusion
If you do not have additional logging that preserves the output of the pods before termination, it is hard to determine the root cause some time after the event occurred.
In the setup you described, the Liveness probe failed because the nginx-ingress pod was already in the process of terminating. The Liveness probe failure did not trigger the pod deletion; it was a result of the deletion.