Elasticsearch master node constantly connecting and disconnecting

I keep getting these error messages in my logs:

[2015-11-10 13:52:03,037][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [11] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.run(UnicastZenPing.java:376)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:03,038][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [12] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.run(UnicastZenPing.java:376)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:03,038][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [12] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.run(UnicastZenPing.java:376)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:11,378][INFO ][transport                ] [ClusterUK Node 1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.24.32.10:9300]}
[2015-11-10 13:52:11,394][INFO ][discovery                ] [ClusterUK Node 1] ClusterUK/FTiLxRmZQLyFtyap8JTj2w
[2015-11-10 13:52:14,498][INFO ][cluster.service          ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:52:14,749][INFO ][http                     ] [ClusterUK Node 1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.24.32.10:9200]}
[2015-11-10 13:52:14,750][INFO ][node                     ] [ClusterUK Node 1] started
[2015-11-10 13:52:44,994][INFO ][discovery.zen            ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [do not exists on master, act as master failure]
[2015-11-10 13:52:44,996][WARN ][discovery.zen            ] [ClusterUK Node 1] master left (reason = do not exists on master, act as master failure), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:52:44,996][INFO ][cluster.service          ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:52:48,047][INFO ][cluster.service          ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:10,689][INFO ][cluster.service          ] [ClusterUK Node 1] removed {[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:13,199][INFO ][cluster.service          ] [ClusterUK Node 1] added {[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:35,963][INFO ][discovery.zen            ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:53:35,964][WARN ][discovery.zen            ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:53:35,965][INFO ][cluster.service          ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:53:39,018][INFO ][cluster.service          ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:54:03,581][INFO ][discovery.zen            ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:54:03,581][WARN ][discovery.zen            ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:54:03,581][INFO ][cluster.service          ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:54:06,603][INFO ][cluster.service          ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:54:39,790][INFO ][discovery.zen            ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:54:39,792][WARN ][discovery.zen            ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:54:39,792][INFO ][cluster.service          ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:54:42,366][ERROR][marvel.agent.exporter    ] [ClusterUK Node 1] remote target didn't respond with 200 OK response code [503 Service Unavailable]. content: [:)
��error�ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]��status$��]

This is my elasticsearch.yml file:

action.disable_delete_all_indices: true

cluster.name: ClusterUK

network.publish_host: "172.24.32.10"

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.24.32.10", "172.24.32.5", "172.24.32.8"]

indices.fielddata.cache.size: 25%
indices.cluster.send_refresh_mapping: false

node.name: "ClusterUK Node 1" 
node.master: true
node.data: true

bootstrap.mlockall: true

In some cases, this leaves Elasticsearch not running as a service (for a few seconds).

The cluster currently runs in Rackspace, and I think a network problem may be involved (even though I bind to a specific IP address and use unicast).

There are 4 nodes running there (3 with master=true and data=true, plus one client node).
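As a sanity check on the split-brain guard: discovery.zen.minimum_master_nodes is normally sized as a majority of the master-eligible nodes, that is floor(n / 2) + 1. The sketch below only does that arithmetic; the function name is made up for illustration and is not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum for master election: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes, as in this cluster.
# The client node (master=false) does not count.
print(minimum_master_nodes(3))  # 2
```

With three master-eligible nodes the result is 2, which matches the value already set in the elasticsearch.yml above.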

Can anyone tell me what exactly is happening there? This is version 1.7.3 on Windows Server (the client node is 1.7.1).

I suspect the problem comes from master left (reason = transport disconnected) and that it is a split brain, but how do I fix it?
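To see which reason dominates, the "master left" warnings can be tallied from the log. This is a minimal sketch that assumes the WARN line format shown in the log above; the sample lines are shortened copies of those entries, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the WARN form: "master left (reason = ...), current nodes: {...}"
MASTER_LEFT = re.compile(r"master left \(reason = ([^)]*)\)")


def tally_master_left(lines):
    """Count 'master left' warnings, grouped by the stated reason."""
    counts = Counter()
    for line in lines:
        match = MASTER_LEFT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened copies of WARN lines from the log above (node lists trimmed).
sample = [
    "[2015-11-10 13:52:44,996][WARN ][discovery.zen] [ClusterUK Node 1] "
    "master left (reason = do not exists on master, act as master failure), current nodes: {}",
    "[2015-11-10 13:53:35,964][WARN ][discovery.zen] [ClusterUK Node 1] "
    "master left (reason = transport disconnected), current nodes: {}",
    "[2015-11-10 13:54:03,581][WARN ][discovery.zen] [ClusterUK Node 1] "
    "master left (reason = transport disconnected), current nodes: {}",
]

for reason, count in tally_master_left(sample).most_common():
    print(count, reason)
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.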

I found the problem. Elasticsearch does not tolerate TCP offloading.

TCP offload engine is a function used in network interface cards (NIC) to offload processing of the entire TCP/IP stack to the network controller. By moving some or all of the processing to dedicated hardware, a TCP offload engine frees the system's main CPU for other tasks. However, TCP offloading has been known to cause some issues, and disabling it can help avoid these issues.

Disable TCP offloading

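  1. On the Windows server, open Control Panel and select Network Settings > Change Adapter Settings.

  2. Right-click each adapter (private and public), select Configure from the Networking menu, and then click the Advanced tab. The TCP offload settings are listed for the Citrix adapter.

  3. Disable each of the following TCP offload options, and then click OK:
    • IPv4 Checksum Offload
    • Large Receive Offload
    • Large Send Offload
    • TCP Checksum Offload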
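The same steps can probably be scripted. The sketch below is untested against the Citrix adapter and rests on several assumptions: that the PowerShell NetAdapter module (Windows Server 2012 or later) is available, that the driver exposes the options under exactly the display names listed above, and that "Disabled" is the accepted display value. The adapter name "Ethernet" is a placeholder. Running Get-NetAdapterAdvancedProperty first shows the real names and values for a given adapter.

```python
import subprocess

# Display names as listed in the steps above; the actual driver may differ.
OFFLOAD_OPTIONS = [
    "IPv4 Checksum Offload",
    "Large Receive Offload",
    "Large Send Offload",
    "TCP Checksum Offload",
]


def build_commands(adapter, options=OFFLOAD_OPTIONS):
    """Return one PowerShell command per offload option for the adapter."""
    return [
        f"Set-NetAdapterAdvancedProperty -Name '{adapter}' "
        f"-DisplayName '{option}' -DisplayValue 'Disabled'"
        for option in options
    ]


def disable_offload(adapter, dry_run=True):
    """Print the commands, or run them through PowerShell when dry_run is False."""
    for command in build_commands(adapter):
        if dry_run:
            print(command)
        else:
            subprocess.run(
                ["powershell", "-NoProfile", "-Command", command], check=True
            )


# "Ethernet" is a placeholder adapter name; dry run only prints the commands.
disable_offload("Ethernet")
```

The dry run is the default so the commands can be reviewed before anything changes; calling `disable_offload(name, dry_run=False)` from an elevated prompt would apply them, once per adapter (private and public).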

That solved my problem.