Server error while performing Elasticsearch Reindex in place operation

Question

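I am using the AWS Elasticsearch service (version 6.3). I want to change the mapping while reindexing data from current_index to new_index. I am not trying to upgrade from an old Elasticsearch cluster to a new one: current_index and new_index are both on the same Elasticsearch 6.3 cluster.
I am trying to perform the reindex in place operation by following the Elastic documentation.
My index contains about 250k searchable documents. When I POST _reindex with this curl request,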

curl -X POST "aws_elasticsearch_endpoint/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "current_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

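Elasticsearch starts the reindexing process (I verified this by running GET /_cat/indices?v), but I eventually get a curl: (56) Unexpected EOF error. The reindex operation itself actually works fine: after about 2 hours, docs.count in new_index matches docs.count in current_index, and the status turns green.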

If I POST _reindex from Java, I get this error:

java.net.SocketException: Unexpected end of file from server

The Reindex API only returns a successful response, as documented, when my index is small (I tried with 1k searchable documents).

Answer 1

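This happens because the response takes a long time to return and curl times out. On a small data set the response comes back before the timeout, which is why you do get a response in that case.

When curl times out, the reindex is still running, and you can still see how it is progressing with the following command: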

GET _tasks?actions=*reindex&detailed=true

You can also add ...?wait_for_completion=false to your curl command. ES will then create a background task for your reindex operation. The curl command will return early with a task ID that you can use to check the state of the reindex at regular intervals using the Task API:

GET .tasks/task/<taskId>
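
As a rough illustration of the wait_for_completion=false approach from Java, here is a minimal sketch. It assumes the endpoint accepts unsigned requests, as the curl command in the question does, and that the response has the form {"task":"<nodeId>:<number>"}. The class and helper names (AsyncReindex, reindexUrl, reindexBody, extractTaskId, startReindex) are made up for this example, and the regex stands in for a real JSON parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsyncReindex {

    // Matches the "task" field of a response such as {"task":"nodeId:12345"}.
    // Assumption: a regex is good enough here; a JSON library would be more robust.
    private static final Pattern TASK_ID = Pattern.compile("\"task\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds the _reindex URL with wait_for_completion=false appended. */
    static String reindexUrl(String endpoint) {
        String base = endpoint.endsWith("/")
                ? endpoint.substring(0, endpoint.length() - 1)
                : endpoint;
        return base + "/_reindex?wait_for_completion=false";
    }

    /** Builds the same request body as the curl command in the question. */
    static String reindexBody(String sourceIndex, String destIndex) {
        return "{\"source\":{\"index\":\"" + sourceIndex + "\"},"
                + "\"dest\":{\"index\":\"" + destIndex + "\"}}";
    }

    /** Extracts the task ID from the response body, or returns null if there is none. */
    static String extractTaskId(String responseJson) {
        Matcher m = TASK_ID.matcher(responseJson);
        return m.find() ? m.group(1) : null;
    }

    /** Sends the request; it should return quickly because the reindex runs in the background. */
    static String startReindex(String endpoint, String sourceIndex, String destIndex)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(reindexUrl(endpoint)).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(reindexBody(sourceIndex, destIndex).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return extractTaskId(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the cluster endpoint (placeholder, supplied by the caller).
        System.out.println(startReindex(args[0], "current_index", "new_index"));
    }
}
```

The printed task ID is what goes in place of <taskId> in the Task API calls.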

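Also note that in this case, once the task has completed, you need to delete the task document from the .tasks index yourself; ES will not do it for you.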
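
The check-then-clean-up cycle could look roughly like the sketch below. It polls GET _tasks/<taskId> until the response contains "completed": true, then deletes the document at .tasks/task/<taskId>. The names (ReindexTaskWatcher, isCompleted, waitAndCleanUp and so on) are hypothetical, the endpoint is assumed to accept unsigned requests, and the regex check is a shortcut for proper JSON parsing. Whether your domain's access policy allows deleting from .tasks is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class ReindexTaskWatcher {

    // The Task API reports "completed": true once the task has finished.
    private static final Pattern COMPLETED = Pattern.compile("\"completed\"\\s*:\\s*true");

    /** Returns true if the task status JSON says the task has completed. */
    static boolean isCompleted(String taskJson) {
        return COMPLETED.matcher(taskJson).find();
    }

    /** URL of the Task API entry for a given task ID. */
    static String taskStatusUrl(String endpoint, String taskId) {
        return endpoint + "/_tasks/" + taskId;
    }

    /** URL of the task document that ES stores in the .tasks index. */
    static String taskDocumentUrl(String endpoint, String taskId) {
        return endpoint + "/.tasks/task/" + taskId;
    }

    /** Sends a body-less request and returns the response body as a string. */
    static String send(String method, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(30_000); // each poll is a short request
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Polls until the task has completed, then removes its document from .tasks. */
    static void waitAndCleanUp(String endpoint, String taskId, long pollMillis)
            throws IOException, InterruptedException {
        while (!isCompleted(send("GET", taskStatusUrl(endpoint, taskId)))) {
            Thread.sleep(pollMillis);
        }
        send("DELETE", taskDocumentUrl(endpoint, taskId));
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the cluster endpoint, args[1] the task ID returned by _reindex.
        waitAndCleanUp(args[0], args[1], 30_000L);
        System.out.println("Reindex task finished and task document deleted.");
    }
}
```

Each request here is short-lived, so no single call has to stay open for the full duration of the reindex.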

Answer 2

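The AWS Elasticsearch ELB (Elastic Load Balancer) timeout is 60 seconds. It is currently not configurable, and making it configurable is a long-standing feature request.
You can find more details in the AWS forum thread.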

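As a result, any operation that takes longer than 60 seconds, which in this particular case is the reindex, ends in a gateway timeout.
It is therefore not possible to keep a long-running reindex request from failing by increasing the client timeout.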

For the Reindex API, the workaround is the one suggested by @Val above: use the wait_for_completion=false flag and follow the steps described in the Reindex API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_url_parameters_3

java

amazon-web-services

elasticsearch

reindex