Why does the Mesos master disconnect the k8s framework and try to close a nonexistent file descriptor?

I am trying to follow this tutorial to deploy "Kubernetes on Mesos" on my local machines. k8s is built from the latest master branch, and Mesos is version 0.26.

After starting the Mesos master (IP: 15.242.100.56), the Mesos slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I see the following log on the Mesos master:

I1228 21:56:55.591568 27255 hierarchical.cpp:344] Added slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 (pqsfc016.ftc.rdlabs.hpecorp.net) with cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] (allocated: )
I1228 21:56:55.591601 27240 replica.cpp:700] Replica learned TRUNCATE action at position 4
I1228 21:56:55.593670 27233 master.cpp:4269] Received update of slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 at slave(1)@15.242.100.16:5051 (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
I1228 21:56:55.594622 27239 hierarchical.cpp:400] Slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed resources  (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: )
I1228 21:57:11.060005 27256 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40727 with User-Agent='Go-http-client/1.1'
I1228 21:57:12.053403 27244 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40754 with User-Agent='Go-http-client/1.1'
I1228 21:57:12.084724 27256 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40771 with User-Agent='Go-http-client/1.1'
I1228 21:57:13.130113 27251 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40779 with User-Agent='Go-http-client/1.1'
I1228 21:57:13.136896 27249 master.cpp:2176] Received SUBSCRIBE call for framework 'Kubernetes' at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.137248 27249 master.cpp:2247] Subscribing framework Kubernetes with checkpointing enabled and capabilities [  ]
E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
I1228 21:57:13.138389 27255 hierarchical.cpp:195] Added framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
I1228 21:57:13.138842 27249 master.cpp:1122] Framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 disconnected
I1228 21:57:13.138898 27249 master.cpp:2472] Disconnecting framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.138943 27249 master.cpp:2496] Deactivating framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
E1228 21:57:13.138975 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
I1228 21:57:13.139091 27249 master.cpp:1146] Giving framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 7625.14222623576weeks to failover
I1228 21:57:13.139468 27255 hierarchical.cpp:273] Deactivated framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
W1228 21:57:13.139472 27236 master.cpp:4840] Master returning resources offered to framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 because the framework has terminated or is inactive
I1228 21:57:13.140090 27246 hierarchical.cpp:744] Recovered cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: ) on slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 from framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000

My questions are:
(1) Why does the Mesos master disconnect the k8s framework?

I1228 21:57:13.136896 27249 master.cpp:2176] Received SUBSCRIBE call for framework 'Kubernetes' at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.137248 27249 master.cpp:2247] Subscribing framework Kubernetes with checkpointing enabled and capabilities [  ]
I1228 21:57:13.138389 27255 hierarchical.cpp:195] Added framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
I1228 21:57:13.138842 27249 master.cpp:1122] Framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 disconnected
I1228 21:57:13.138898 27249 master.cpp:2472] Disconnecting framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.138943 27249 master.cpp:2496] Deactivating framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163

(2) This is the output of the sudo lsof -p 27219 -P -n command:

lt-mesos- 27219  nan    0u   CHR  136,2       0t0       5 /dev/pts/2
lt-mesos- 27219  nan    1u   CHR  136,2       0t0       5 /dev/pts/2
lt-mesos- 27219  nan    2u   CHR  136,2       0t0       5 /dev/pts/2
lt-mesos- 27219  nan    3u  0000   0,10         0    8938 anon_inode
lt-mesos- 27219  nan    4u  0000   0,10         0    8938 anon_inode
lt-mesos- 27219  nan    5u  IPv4  85594       0t0     TCP 15.242.100.56:5050 (LISTEN)
lt-mesos- 27219  nan    6w   REG  252,3       360 2099579 /var/lib/mesos/replicated_log/LOG
lt-mesos- 27219  nan    7uW  REG  252,3         0 2099580 /var/lib/mesos/replicated_log/LOCK
lt-mesos- 27219  nan    8u  IPv4 107697       0t0     TCP 15.242.100.56:5050->15.242.100.16:53987 (ESTABLISHED)
lt-mesos- 27219  nan    9u   REG  252,3     65536 2099584 /var/lib/mesos/replicated_log/MANIFEST-000002
lt-mesos- 27219  nan   10u   REG  252,3     65536 2099581 /var/lib/mesos/replicated_log/000004.log
lt-mesos- 27219  nan   11u  IPv4  88952       0t0     TCP 15.242.100.56:59746->15.242.100.16:5051 (ESTABLISHED)
lt-mesos- 27219  nan   12u  IPv4 106756       0t0     TCP 15.242.100.56:5050->15.242.100.60:40727 (ESTABLISHED)
lt-mesos- 27219  nan   13u  IPv4 104980       0t0     TCP 15.242.100.56:5050->15.242.100.60:40754 (ESTABLISHED)
lt-mesos- 27219  nan   14u  IPv4 105876       0t0     TCP 15.242.100.56:5050->15.242.100.60:40771 (ESTABLISHED)
lt-mesos- 27219  nan   15u  IPv4 104981       0t0     TCP 15.242.100.56:5050->15.242.100.60:40779 (ESTABLISHED)
lt-mesos- 27219  nan   16u  IPv4  95212       0t0     TCP 15.242.100.56:5050->15.242.100.60:40780 (ESTABLISHED)

I can see that there is no file descriptor 17, so why does the Mesos master try to shut it down?

E1228 21:57:13.138975 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
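
One possible reading of this error, offered as an assumption rather than something the log confirms: fd 17 was a short-lived socket that the master opened toward the scheduler, the connection was never established, and the descriptor was already closed by the time lsof ran. The Python sketch below does not touch Mesos at all. It only reproduces the operating system behavior behind the message: calling shutdown() on a TCP socket that never connected fails with ENOTCONN, which Linux reports as "Transport endpoint is not connected". The function name is made up for illustration.

```python
import errno
import socket


def shutdown_unconnected_socket():
    """Call shutdown() on a TCP socket that was never connected.

    Returns the errno reported by the OS, or None if shutdown succeeded.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No connect() was issued, so there is no peer to shut down.
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return None


if __name__ == "__main__":
    err = shutdown_unconnected_socket()
    # Print the symbolic name next to the number for readability.
    print(err, errno.errorcode.get(err))
```

If this reading is right, the error is a symptom of the failed connection to the scheduler, not a sign that the master is closing the wrong descriptor.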

Found the problem. Flushing all iptables rules on the k8s server fixed it:

iptables -F

After that it worked!
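
A quick way to check for this kind of firewall problem before flushing any rules is to test, from the Mesos master host, whether a TCP connection to the scheduler address shown in the log can be opened. This is a minimal sketch under the assumption that the master has to connect back to the scheduler at scheduler(1)@15.242.100.60:49163; the helper name can_connect is hypothetical, and the port is whatever the current master log shows for the scheduler.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address copied from the log line "scheduler(1)@15.242.100.60:49163".
    # Replace the port with the one in your own master log.
    print(can_connect("15.242.100.60", 49163))
```

Run it on the master host while the scheduler is up. A False result while the scheduler process is listening points toward filtering between the two machines. Note that iptables -F removes every rule in the filter table, including unrelated ones, so once the blocked traffic is identified a narrower allow rule may be preferable.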