Why does the Mesos master disconnect the k8s framework and close the wrong file descriptor?
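I am trying to follow this tutorial to deploy "kubernetes on Mesos" on local machines: k8s is the latest master branch, and Mesos is version 0.26.
After starting the Mesos master (IP: 15.242.100.56), the Mesos slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I can see the following logs from the Mesos master: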
I1228 21:56:55.591568 27255 hierarchical.cpp:344] Added slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 (pqsfc016.ftc.rdlabs.hpecorp.net) with cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] (allocated: )
I1228 21:56:55.591601 27240 replica.cpp:700] Replica learned TRUNCATE action at position 4
I1228 21:56:55.593670 27233 master.cpp:4269] Received update of slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 at slave(1)@15.242.100.16:5051 (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
I1228 21:56:55.594622 27239 hierarchical.cpp:400] Slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed resources (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: )
I1228 21:57:11.060005 27256 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40727 with User-Agent='Go-http-client/1.1'
I1228 21:57:12.053403 27244 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40754 with User-Agent='Go-http-client/1.1'
I1228 21:57:12.084724 27256 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40771 with User-Agent='Go-http-client/1.1'
I1228 21:57:13.130113 27251 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:40779 with User-Agent='Go-http-client/1.1'
I1228 21:57:13.136896 27249 master.cpp:2176] Received SUBSCRIBE call for framework 'Kubernetes' at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.137248 27249 master.cpp:2247] Subscribing framework Kubernetes with checkpointing enabled and capabilities [ ]
E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
I1228 21:57:13.138389 27255 hierarchical.cpp:195] Added framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
I1228 21:57:13.138842 27249 master.cpp:1122] Framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 disconnected
I1228 21:57:13.138898 27249 master.cpp:2472] Disconnecting framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.138943 27249 master.cpp:2496] Deactivating framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
E1228 21:57:13.138975 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
I1228 21:57:13.139091 27249 master.cpp:1146] Giving framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 7625.14222623576weeks to failover
I1228 21:57:13.139468 27255 hierarchical.cpp:273] Deactivated framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
W1228 21:57:13.139472 27236 master.cpp:4840] Master returning resources offered to framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 because the framework has terminated or is inactive
I1228 21:57:13.140090 27246 hierarchical.cpp:744] Recovered cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: ) on slave 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-S0 from framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
My questions are:
(1) Why does the Mesos master disconnect the k8s framework:
I1228 21:57:13.136896 27249 master.cpp:2176] Received SUBSCRIBE call for framework 'Kubernetes' at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.137248 27249 master.cpp:2247] Subscribing framework Kubernetes with checkpointing enabled and capabilities [ ]
I1228 21:57:13.138389 27255 hierarchical.cpp:195] Added framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000
I1228 21:57:13.138842 27249 master.cpp:1122] Framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163 disconnected
I1228 21:57:13.138898 27249 master.cpp:2472] Disconnecting framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
I1228 21:57:13.138943 27249 master.cpp:2496] Deactivating framework 5de231c9-993c-4ac7-8ffb-c3fbff2c61cd-0000 (Kubernetes) at scheduler(1)@15.242.100.60:49163
(2) Here is the output of the sudo lsof -p 27219 -P -n command:
lt-mesos- 27219 nan 0u CHR 136,2 0t0 5 /dev/pts/2
lt-mesos- 27219 nan 1u CHR 136,2 0t0 5 /dev/pts/2
lt-mesos- 27219 nan 2u CHR 136,2 0t0 5 /dev/pts/2
lt-mesos- 27219 nan 3u 0000 0,10 0 8938 anon_inode
lt-mesos- 27219 nan 4u 0000 0,10 0 8938 anon_inode
lt-mesos- 27219 nan 5u IPv4 85594 0t0 TCP 15.242.100.56:5050 (LISTEN)
lt-mesos- 27219 nan 6w REG 252,3 360 2099579 /var/lib/mesos/replicated_log/LOG
lt-mesos- 27219 nan 7uW REG 252,3 0 2099580 /var/lib/mesos/replicated_log/LOCK
lt-mesos- 27219 nan 8u IPv4 107697 0t0 TCP 15.242.100.56:5050->15.242.100.16:53987 (ESTABLISHED)
lt-mesos- 27219 nan 9u REG 252,3 65536 2099584 /var/lib/mesos/replicated_log/MANIFEST-000002
lt-mesos- 27219 nan 10u REG 252,3 65536 2099581 /var/lib/mesos/replicated_log/000004.log
lt-mesos- 27219 nan 11u IPv4 88952 0t0 TCP 15.242.100.56:59746->15.242.100.16:5051 (ESTABLISHED)
lt-mesos- 27219 nan 12u IPv4 106756 0t0 TCP 15.242.100.56:5050->15.242.100.60:40727 (ESTABLISHED)
lt-mesos- 27219 nan 13u IPv4 104980 0t0 TCP 15.242.100.56:5050->15.242.100.60:40754 (ESTABLISHED)
lt-mesos- 27219 nan 14u IPv4 105876 0t0 TCP 15.242.100.56:5050->15.242.100.60:40771 (ESTABLISHED)
lt-mesos- 27219 nan 15u IPv4 104981 0t0 TCP 15.242.100.56:5050->15.242.100.60:40779 (ESTABLISHED)
lt-mesos- 27219 nan 16u IPv4 95212 0t0 TCP 15.242.100.56:5050->15.242.100.60:40780 (ESTABLISHED)
I can see that there is no file descriptor 17, so why does the Mesos master try to close it:
E1228 21:57:13.138975 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
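A note on the error text itself: "Transport endpoint is not connected" is the message for ENOTCONN, which shutdown() reports when the socket has no connected peer. One plausible reading, not confirmed against the Mesos source here, is that fd 17 was a socket the master opened toward scheduler(1)@15.242.100.60:49163, the connection never completed, and the descriptor was closed right after the failed shutdown, so a later lsof no longer lists it. The Python sketch below only reproduces the errno on a socket that was never connected; the function name is made up for illustration.

```python
import errno
import socket


def shutdown_unconnected_socket():
    """Call shutdown() on a TCP socket that was never connected.

    Returns the errno reported by the operating system, or 0 if
    shutdown() unexpectedly succeeds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        # The descriptor is released here, so tools such as lsof
        # would not show it afterwards.
        sock.close()
    return 0


if __name__ == "__main__":
    code = shutdown_unconnected_socket()
    print(errno.errorcode.get(code), code == errno.ENOTCONN)
```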
Found the problem: flush all iptables rules on the k8s server:
iptables -F
After that, it worked!
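To check whether a firewall on the k8s host is what blocks the master, it can help to test from the master machine whether the scheduler's address is reachable before flushing anything. A minimal Python sketch, assuming the address printed in the master log above (scheduler(1)@15.242.100.60:49163; the port is likely to differ on each scheduler start, so read it from your own log) and a hypothetical helper named can_connect:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Address copied from the log line "scheduler(1)@15.242.100.60:49163".
    # Replace it with the scheduler address from your own master log.
    print(can_connect("15.242.100.60", 49163))
```

If this prints False on the master while the scheduler is running, a filtering rule on 15.242.100.60 is a likely suspect. Keep in mind that iptables -F flushes every rule in the filter table, so on a shared machine a narrower rule may be preferable.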