Elasticsearch server status 503, discovery fails
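I am building a cluster of nodes. Two of them work fine (they have joined into one cluster). I am trying to add a third one (called eu5), and when it starts it does not join the cluster: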
[root@eu5:/etc/elasticsearch]# curl eu5:9200
{
"status" : 503,
"name" : "eu5",
"cluster_name" : "security",
"version" : {
"number" : "1.4.2",
"build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
"build_timestamp" : "2014-12-16T14:11:12Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
The logs mention a discovery problem:
[2015-01-09 15:35:23,399][INFO ][node ] [eu5] starting ...
[2015-01-09 15:35:23,468][INFO ][transport ] [eu5] bound_address {inet[/10.81.147.186:9300]}, publish_address {inet[/10.81.147.186:9300]}
[2015-01-09 15:35:23,475][INFO ][discovery ] [eu5] security/FdjfWCWgT-mQtipLdi9BFA
[2015-01-09 15:35:53,476][WARN ][discovery ] [eu5] waited for 30s and no initial state was set by the discovery
[2015-01-09 15:35:53,493][INFO ][http ] [eu5] bound_address {inet[/10.81.147.186:9200]}, publish_address {inet[/10.81.147.186:9200]}
[2015-01-09 15:35:53,494][INFO ][node ] [eu5] started
The configuration forces unicast:
cluster.name: security
node.name: eu5
network.host: 10.81.147.186
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast: ["elk.example.com"]
The hint server I want to join is available:
[root@eu5:/etc/elasticsearch]# curl elk.example.com:9200
{
"status" : 200,
"name" : "eu4",
"cluster_name" : "security",
"version" : {
"number" : "1.4.2",
"build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
"build_timestamp" : "2014-12-16T14:11:12Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
Ports 9200 and 9300 are both reachable from the server I am trying to add (eu5):
[root@eu5:/etc/elasticsearch]# nmap -p9200,9300 elk.example.com
(...)
PORT STATE SERVICE
9200/tcp open wap-wsp
9300/tcp open vrace
and from the master back to that server:
[root@eu4:/etc/elasticsearch]# nmap -p9200,9300 eu5.example.com
(...)
PORT STATE SERVICE
9200/tcp open wap-wsp
9300/tcp open vrace
Is there anything else I should check?
UPDATE: following Andrei Stefan's comment, I switched logging to DEBUG. I get lines such as
[2015-01-12 11:14:41,609][DEBUG][discovery.zen ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2015-01-12 11:14:44,615][DEBUG][discovery.zen ] [eu5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
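during the discovery phase (the 30 seconds before it times out). A quick look at the code (though I do not know Java) seems to indicate that {none} means the pings failed.
The tests I ran above show that, from the OS point of view, connectivity is fine.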
UPDATE 2: below is the tcpdump corresponding to the events above (eu5; the machine to join is 10.81.144.186).
Full image: http://i.stack.imgur.com/vLi7r.png
UPDATE 3: I filed a bug report.
The configuration was wrong. The setting should be
discovery.zen.ping.unicast.hosts
The trailing hosts was missing.
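As a rough illustration of how this kind of typo could be caught before starting a node, here is a minimal Python sketch. It assumes the flat "key: value" style used in the question's elasticsearch.yml (nested YAML is not handled). The helper names parse_flat_settings and check_unicast are hypothetical and not part of Elasticsearch.

```python
# Hypothetical pre-flight check: flag a unicast discovery setting that is
# missing its ".hosts" suffix. Only flat "key: value" lines are understood.
EXPECTED_KEY = "discovery.zen.ping.unicast.hosts"
SUSPECT_KEY = "discovery.zen.ping.unicast"


def parse_flat_settings(text):
    """Parse flat 'key: value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def check_unicast(text):
    """Return a list of human-readable problems found in the config text."""
    settings = parse_flat_settings(text)
    problems = []
    if SUSPECT_KEY in settings:
        problems.append(
            f"'{SUSPECT_KEY}' is not the host list setting; use '{EXPECTED_KEY}'"
        )
    if EXPECTED_KEY not in settings:
        problems.append(f"'{EXPECTED_KEY}' is not set")
    return problems


# The configuration exactly as posted in the question.
broken = """\
cluster.name: security
node.name: eu5
network.host: 10.81.147.186
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast: ["elk.example.com"]
"""

# The same configuration with the key renamed as the answer suggests.
fixed = broken.replace(SUSPECT_KEY + ":", EXPECTED_KEY + ":")

for problem in check_unicast(broken):
    print(problem)
print(check_unicast(fixed))
```

Run against the configuration from the question, the check reports both the misnamed key and the missing host list. After the rename it returns an empty list.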