elasticsearch server status 503, discovery fails

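I am building a cluster of nodes. Two work fine (they have joined a cluster), and I am trying to add a third one (called eu5). When it starts, it does not join the cluster: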

[root@eu5:/etc/elasticsearch]# curl eu5:9200
{
  "status" : 503,
  "name" : "eu5",
  "cluster_name" : "security",
  "version" : {
    "number" : "1.4.2",
    "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp" : "2014-12-16T14:11:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}

The logs mention a discovery problem:

[2015-01-09 15:35:23,399][INFO ][node                     ] [eu5] starting ...
[2015-01-09 15:35:23,468][INFO ][transport                ] [eu5] bound_address {inet[/10.81.147.186:9300]}, publish_address {inet[/10.81.147.186:9300]}
[2015-01-09 15:35:23,475][INFO ][discovery                ] [eu5] security/FdjfWCWgT-mQtipLdi9BFA
[2015-01-09 15:35:53,476][WARN ][discovery                ] [eu5] waited for 30s and no initial state was set by the discovery
[2015-01-09 15:35:53,493][INFO ][http                     ] [eu5] bound_address {inet[/10.81.147.186:9200]}, publish_address {inet[/10.81.147.186:9200]}
[2015-01-09 15:35:53,494][INFO ][node                     ] [eu5] started

The configuration forces unicast:

cluster.name: security
node.name: eu5
network.host: 10.81.147.186
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast: ["elk.example.com"]

The server given as the unicast hint, which I want to join, is available:

[root@eu5:/etc/elasticsearch]# curl elk.example.com:9200
{
  "status" : 200,
  "name" : "eu4",
  "cluster_name" : "security",
  "version" : {
    "number" : "1.4.2",
    "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
    "build_timestamp" : "2014-12-16T14:11:12Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },
  "tagline" : "You Know, for Search"
}

Ports 9200 and 9300 are both open when probed from the server I want to add:

[root@eu5:/etc/elasticsearch]# nmap -p9200,9300 elk.example.com
(...)
PORT     STATE SERVICE
9200/tcp open  wap-wsp
9300/tcp open  vrace

and from the master server to that server:

[root@eu4:/etc/elasticsearch]#  nmap -p9200,9300 eu5.example.com
(...)
PORT     STATE SERVICE
9200/tcp open  wap-wsp
9300/tcp open  vrace

Is there anything else I should check?

UPDATE: Following Andrei Stefan's comment, I switched logging to DEBUG. I get lines such as

[2015-01-12 11:14:41,609][DEBUG][discovery.zen            ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-01-12 11:14:44,615][DEBUG][discovery.zen            ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}

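during the discovery phase (the 30 seconds before it times out). A quick look at the code (though I do not know Java) seems to indicate that {none} means the pings failed.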

The tests I ran above show that, from the OS point of view, connectivity is fine.

UPDATE 2: Below is the tcpdump corresponding to the events above (eu5, the machine to join is 10.81.144.186)

Full image: http://i.stack.imgur.com/vLi7r.png

UPDATE 3: I filed a bug report.

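The configuration is wrong. The setting name should be

discovery.zen.ping.unicast.hosts

The trailing .hosts was missing, so the node was never given a list of unicast hosts to ping.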
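
Applied to the configuration from the question, the corrected file would look roughly like the sketch below. All values are copied from the question; the only change is the key on the last line, and the comments are added here for illustration.

```yaml
cluster.name: security
node.name: eu5
network.host: 10.81.147.186
discovery.zen.minimum_master_nodes: 2
# Multicast stays disabled, so the node relies only on the unicast host list
discovery.zen.ping.multicast.enabled: false
# The key must end in ".hosts"; the question used "discovery.zen.ping.unicast"
discovery.zen.ping.unicast.hosts: ["elk.example.com"]
```

After restarting eu5 with this change, repeating the curl eu5:9200 check from the question should report status 200 instead of 503 if the node has joined the cluster.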