Elasticsearch 7.2 cluster encountering unassigned shards
I wanted to set up a three-node Elasticsearch cluster on version 7.2, but ran into something unexpected.
I have three virtual machines, 192.168.7.2, 192.168.7.3, and 192.168.7.4; their main settings in config/elasticsearch.yml are:
- 192.168.7.2:
    cluster.name: ucas
    node.name: node-2
    network.host: 192.168.7.2
    http.port: 9200
    discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
    cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
    http.cors.enabled: true
    http.cors.allow-origin: "*"
- 192.168.7.3:
    cluster.name: ucas
    node.name: node-3
    network.host: 192.168.7.3
    http.port: 9200
    discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
    cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
- 192.168.7.4:
    cluster.name: ucas
    node.name: node-4
    network.host: 192.168.7.4
    http.port: 9200
    discovery.seed_hosts: ["192.168.7.2", "192.168.7.3", "192.168.7.4"]
    cluster.initial_master_nodes: ["node-2", "node-3", "node-4"]
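Before creating any index, it is worth confirming that the three nodes actually discovered each other and formed one cluster. This quick check is my addition rather than part of the original steps; run it against any node:
GET _cat/nodes?v
The ?v flag adds column headers; the output should list node-2, node-3 and node-4, with an asterisk in the master column marking the elected master.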
After starting all the nodes, I created an index named moive with 3 shards and 0 replicas, wrote a few documents into it, and the cluster looked healthy:
PUT moive
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}
PUT moive/_doc/3
{
  "title": "title 3"
}
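At this stage, with 3 primaries and 0 replicas, the cluster should report green. A minimal sanity check (my addition, not in the original post):
GET _cluster/health
The response should show "status" : "green" and "number_of_nodes" : 3.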
Then I set the number of replicas for moive to 1:
PUT moive/_settings
{
  "number_of_replicas": 1
}
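To see where the primaries and the new replicas landed, the cat shards API can be used (again my addition); with 3 primaries and 1 replica, each of the three nodes should end up holding two shard copies of moive:
GET _cat/shards/moive?v
The prirep column marks each row as a primary (p) or a replica (r), and the node column shows which node holds that copy.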
Everything went smoothly, but when I set the number of replicas for moive to 2:
PUT moive/_settings
{
  "number_of_replicas": 2
}
the new replica shards could not be allocated to node-2.
I wasn't sure which step had gone wrong, so any help would be appreciated.
First, use the cluster allocation explain API to find out why the shard could not be allocated:
GET _cluster/allocation/explain?pretty
{
  "index" : "moive",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-07-19T06:47:29.704Z",
    "details" : "node_left [tIm8GrisRya8jl_n9lc3MQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "kQ0Noq8LSpyEcVDF1POfJw",
      "node_name" : "node-3",
      "transport_address" : "192.168.7.3:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[kQ0Noq8LSpyEcVDF1POfJw], [R], s[STARTED], a[id=Ul73SPyaTSyGah7Yl3k2zA]]"
        }
      ]
    },
    {
      "node_id" : "mNpqD9WPRrKsyntk2GKHMQ",
      "node_name" : "node-4",
      "transport_address" : "192.168.7.4:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_sync_id" : true
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[moive][2], node[mNpqD9WPRrKsyntk2GKHMQ], [P], s[STARTED], a[id=yQo1HUqoSdecD-SZyYMYfg]]"
        }
      ]
    },
    {
      "node_id" : "tIm8GrisRya8jl_n9lc3MQ",
      "node_name" : "node-2",
      "transport_address" : "192.168.7.2:9300",
      "node_attributes" : {
        "ml.machine_memory" : "5033172992",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [2.2790256709451573E-4%]"
        }
      ]
    }
  ]
}
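As a side note, when several shards are unassigned, the explain API can also be pointed at one specific shard copy by sending a request body; a minimal sketch using the shard reported above:
GET _cluster/allocation/explain
{
  "index": "moive",
  "shard": 2,
  "primary": false
}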
Either way, the disk_threshold decider shows that the disk on node-2 is essentially full, which df -h confirms:
[vagrant@node2 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 8.4G 8.0G 480M 95% /
devtmpfs 2.4G 0 2.4G 0% /dev
tmpfs 2.4G 0 2.4G 0% /dev/shm
tmpfs 2.4G 8.4M 2.4G 1% /run
tmpfs 2.4G 0 2.4G 0% /sys/fs/cgroup
/dev/sda1 497M 118M 379M 24% /boot
none 234G 149G 86G 64% /vagrant
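The same picture is visible from inside Elasticsearch: the cat allocation API (not used in the original post) reports per-node disk usage as the allocator sees it:
GET _cat/allocation?v
Here the disk.percent value for node-2 should be above the 85% low watermark, matching the df output above.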
After cleaning up some disk space on node-2, everything went back to normal.
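After freeing space, the allocator re-reads disk usage periodically (by default roughly every 30 seconds, via cluster.info.update.interval) and assigns the remaining replica on its own. If it had given up after repeated failures, the following sketch, assuming default settings, would force a retry and then wait for the index to turn green:
POST _cluster/reroute?retry_failed=true
GET _cluster/health/moive?wait_for_status=green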