Docker UCP controller down with error: Unhealthy UCP manager: unable to reach manager: connect: connection refused


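I set up a test environment with Docker UCP. After a few days, one of the controllers failed at random, and UCP reported that the host was down and the cluster was unhealthy.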

Logs from the controller container:

{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:10Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:19Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:03:56Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:15Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:04:32Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:07Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:43Z"}
{"level":"warning","msg":"Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Node health error detected during _ping: Kube controller manager health check timed out","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context deadline exceeded","time":"2018-12-18T10:05:51Z"}
{"level":"warning","msg":"Kube controller manager health check error: unable to inspect container: context
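To see the pattern in a log like this at a glance (which warnings repeat, and over what time window), a small script can tally the JSON lines. This is a minimal sketch, assuming the log is one JSON object per line with `level`, `msg` and `time` fields as above. Malformed lines, such as the truncated last one, are skipped rather than guessed at.

```python
import json
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize(lines: Iterable[str]) -> Tuple[Counter, Optional[Tuple[str, str]]]:
    """Count warning messages and report the first/last timestamp seen."""
    counts: Counter = Counter()
    times = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Truncated or otherwise malformed line: skip it.
            continue
        if not isinstance(entry, dict) or entry.get("level") != "warning":
            continue
        counts[entry.get("msg", "")] += 1
        if "time" in entry:
            times.append(entry["time"])
    # RFC 3339 timestamps in the same format sort correctly as strings.
    span = (min(times), max(times)) if times else None
    return counts, span
```

For example, `with open("controller.log") as f: counts, span = summarize(f)` (the file name is hypothetical). On the excerpt above, this shows that the same three health-check warnings recur roughly every 10 to 40 seconds instead of appearing once.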

Could this be a transient network connectivity problem? If so, shouldn't it have recovered on its own?

After checking the Docker daemon on the host, I found that the system had run into this issue:

https://github.com/docker/for-linux/issues/162
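The repeated `unable to inspect container: context deadline exceeded` warnings point at the local daemon being slow to answer, not at the network. One way to check this on the host is to run a `docker inspect` call under a hard timeout. The following is a minimal sketch, not a UCP tool. The helper name `probe` and the container name in the usage note are assumptions: use whatever `docker ps` actually lists on the affected manager.

```python
import subprocess
from typing import Sequence


def probe(cmd: Sequence[str], timeout_s: float = 10.0) -> str:
    """Run cmd with a hard timeout.

    Returns "ok" (exit code 0), "error" (non-zero exit), "timeout"
    (the process was killed after timeout_s seconds), or "missing"
    (the executable was not found).
    """
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "error"
```

For example, `probe(["docker", "inspect", "ucp-kube-controller-manager"])` (hypothetical container name). If this returns `"timeout"` repeatedly while `docker` itself is present, the daemon is hanging, which matches the linked issue rather than a network fault.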