Kubernetes API server drops requests from a pod, causing a dial error

I upgraded Kubernetes to version 1.1.7 and now get this error from one of my pods, which frequently calls the k8s API server to check the liveness of every other pod.

Error #01: Get http://[api-server]:8080/api/v1/namespaces/production/pods?labelSelector=app%3Dworkflow-worker-mandrill-hook-handler: dial tcp [api-server]:8080: connect: cannot assign requested address

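Requests are sent at a rate of about 80 requests/second. While the error was occurring, I could still call the API from my local machine. Restarting the pod fixed the problem, but it came back the next day. Is the API server blocking the pod to protect itself against DoS?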

I am using Docker version 1.7.1, build 2c2c52b-dirty, and CoreOS v773.0.0:

Linux ***** 4.1.5-coreos #2 SMP Thu Aug 13 09:18:45 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz GenuineIntel GNU/Linux

Kubernetes API server error log:

I0306 07:32:13.087599       1 logs.go:40] http: TLS handshake error from ***:60033: EOF
I0306 07:32:14.596398       1 logs.go:40] http: TLS handshake error from ***:57257: EOF
I0306 07:32:15.126962       1 logs.go:40] http: TLS handshake error from ***:60035: EOF
I0306 07:32:15.136445       1 logs.go:40] http: TLS handshake error from ***:60054: EOF
I0306 07:32:15.210656       1 logs.go:40] http: TLS handshake error from ***:45384: EOF
I0306 07:32:15.215155       1 logs.go:40] http: TLS handshake error from ***:45385: EOF
I0306 07:32:15.253877       1 logs.go:40] http: TLS handshake error from ***:37527: EOF
I0306 07:32:15.265899       1 logs.go:40] http: TLS handshake error from ***:57258: EOF
I0306 07:32:15.272564       1 logs.go:40] http: TLS handshake error from ***:57249: EOF
I0306 07:32:15.282808       1 logs.go:40] http: TLS handshake error from ***:59928: EOF

dmesg on the master node:

[Sun Mar  6 07:32:04 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:04 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:04 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:04 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:04 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:15 2016] net_ratelimit: 34 callbacks suppressed
[Sun Mar  6 07:32:15 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:18 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:18 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:18 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:21 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:21 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:21 2016] TCP: too many orphaned sockets
[Sun Mar  6 07:32:29 2016] TCP: too many orphaned sockets

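After 4 hours of investigation, the cause turned out to be my own application, which queries the k8s API server. It is written in Golang and uses the "gorequest" library to make REST calls to the API server.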

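gorequest did not close the request after sending it, even though I explicitly closed it in my code. It was also hard to check the number of open connections, because the application runs in a Docker container. Normally, running ls /proc/PID/fd | wc -l on the host is enough, but this time I had to go inside the container to check. So I tried the standard "http" library instead of gorequest, and that solved the problem!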