添加节点到运行集群elasticsearch导致master not discovered异常
Add node to running cluster elasticsearch causes master not discovered exception
问题
我有一个 运行ning 集群,我想在其中添加一个数据节点。 运行ning 集群是
x.x.x.246
并且数据节点是
x.x.x.99
每台服务器都能ping通。
机器OS:美分OS7
弹性搜索:7.61
配置:
这里是 elasticsearch.yml 的 x.x.x.246:
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
这里是 elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
正在机器上测试 运行ning elasticsearch
当我在每台机器上运行systemctl start elasticsearch
时,效果很好。
在 x.x.x.246
上测试 运行
curl -X GET "X.X.X.246:9200/_cluster/health?pretty"
显示:节点号不变
curl -X GET "X.X.X.99:9200/_cluster/health?pretty
显示:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
已编辑
这里是 elasticsearch.yml 的 x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
这里是 elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
登录 x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler.lambda$handleException(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
对于节点 x.x.x.99
,种子主机的条目是错误的。它应该如下所示:
discovery.seed_hosts: ["x.x.x.246:9300"]
discovery.seed_hosts
列表用于检测master节点,因为这个列表包含了master合格节点的地址,同时也保存了当前master节点的信息,因为它指向x.x.x.245
而不是x.x.x.99
的配置中的x.x.x.246
,节点x.x.x.99
无法检测到master。
Post评论讨论正确的配置应该是:
主节点:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246]
cluster.initial_master_nodes: ["master"]
请注意,如果您希望上述节点仅作为主节点而不保存数据,则设置
node.data: false
数据节点:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
此外,由于节点 x.x.x.99
无法加入集群,因此集群状态已过时。因此删除 x.x.x.99
上的 data
文件夹并重新启动此节点。
之所以无法选出主节点,是因为提到了 discovery.seed_hosts: ["x.x.x.245:9300"]
,它不是当前主节点配置的一部分,也不是主节点配置的一部分。正如 this 官方 ES 文档中所述,它用于选举主节点。
您应该详细阅读与 master 选择相关的 2 个重要配置:
您可以打开 Discovery
模块上的 DEBUG 日志记录以更好地理解它,方法是在 elasticsearch.yml
行下方添加
logger.org.elasticsearch.discovery: DEBUG
您可以在 elasticsearch.yml
.
中做一些修改
node.name
在两个节点中具有相同的名称 elasticsearch.yml
。
- 最好只提ip不带端口
9200
.
- 最好给
network.host: 0.0.0.0
值,而不是 elasticsearch.yml
. 中的节点 ip
node.data: true
是默认的,不用多说了。
更好更简洁的版本如下所示:
主节点elasticsearch.yml
cluster.name: elasticsearch
node.name: master
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] -->note this
cluster.initial_master_nodes: ["x.x.x.246"] :- note this
另一个数据节点elasticsearch.yml
cluster.name: elasticsearch
node.name: data
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] --> you need to change this and include both nodes
cluster.initial_master_nodes: ["x.x.x.246"]
验证主节点
您可以点击 <your-any-node-ip>:9200/_cat/master
,这应该 return 选出的主节点,在您的案例中节点名称为 master
。有关 this.
的更多信息
我也有同样的问题,当我试图从外部 AWS windows 服务器访问弹性搜索时,我无法访问它,之后我添加了
network.host : aws_private_ip
之后我们需要重新启动弹性服务,但在重新启动时抛出错误,最后在添加下面一行时,它对我有用,
cluster.initial_master_nodes: node-1
问题
我有一个 运行ning 集群,我想在其中添加一个数据节点。 运行ning 集群是
x.x.x.246
并且数据节点是
x.x.x.99
每台服务器都能ping通。 机器OS:美分OS7 弹性搜索:7.61
配置:
这里是 elasticsearch.yml 的 x.x.x.246:
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
这里是 elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
正在机器上测试 运行ning elasticsearch
当我在每台机器上运行systemctl start elasticsearch
时,效果很好。
在 x.x.x.246
上测试 运行curl -X GET "X.X.X.246:9200/_cluster/health?pretty"
显示:节点号不变
curl -X GET "X.X.X.99:9200/_cluster/health?pretty
显示:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
已编辑
这里是 elasticsearch.yml 的 x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
这里是 elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
登录 x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler.lambda$handleException(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
对于节点 x.x.x.99
,种子主机的条目是错误的。它应该如下所示:
discovery.seed_hosts: ["x.x.x.246:9300"]
discovery.seed_hosts
列表用于检测master节点,因为这个列表包含了master合格节点的地址,同时也保存了当前master节点的信息,因为它指向x.x.x.245
而不是x.x.x.99
的配置中的x.x.x.246
,节点x.x.x.99
无法检测到master。
Post评论讨论正确的配置应该是:
主节点:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246]
cluster.initial_master_nodes: ["master"]
请注意,如果您希望上述节点仅作为主节点而不保存数据,则设置
node.data: false
数据节点:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
此外,由于节点 x.x.x.99
无法加入集群,因此集群状态已过时。因此删除 x.x.x.99
上的 data
文件夹并重新启动此节点。
之所以无法选出主节点,是因为提到了 discovery.seed_hosts: ["x.x.x.245:9300"]
,它不是当前主节点配置的一部分,也不是主节点配置的一部分。正如 this 官方 ES 文档中所述,它用于选举主节点。
您应该详细阅读与 master 选择相关的 2 个重要配置:
您可以打开 Discovery
模块上的 DEBUG 日志记录以更好地理解它,方法是在 elasticsearch.yml
logger.org.elasticsearch.discovery: DEBUG
您可以在 elasticsearch.yml
.
node.name
在两个节点中具有相同的名称elasticsearch.yml
。- 最好只提ip不带端口
9200
. - 最好给
network.host: 0.0.0.0
值,而不是elasticsearch.yml
. 中的节点 ip
node.data: true
是默认的,不用多说了。
更好更简洁的版本如下所示:
主节点elasticsearch.yml
cluster.name: elasticsearch
node.name: master
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] -->note this
cluster.initial_master_nodes: ["x.x.x.246"] :- note this
另一个数据节点elasticsearch.yml
cluster.name: elasticsearch
node.name: data
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99", "x.x.x.246"] --> you need to change this and include both nodes
cluster.initial_master_nodes: ["x.x.x.246"]
验证主节点
您可以点击 <your-any-node-ip>:9200/_cat/master
,这应该 return 选出的主节点,在您的案例中节点名称为 master
。有关 this.
我也有同样的问题,当我试图从外部 AWS windows 服务器访问弹性搜索时,我无法访问它,之后我添加了
network.host : aws_private_ip
之后我们需要重新启动弹性服务,但在重新启动时抛出错误,最后在添加下面一行时,它对我有用,
cluster.initial_master_nodes: node-1