ElasticSearch BulkShardRequest failed due to org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor

Question

I am storing logs from my reactive Spring application in Elasticsearch. I am getting the following error from Elasticsearch:

Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of processing of [129010665][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logs-dev-2020.11.05][1]] containing [index {[logs-dev-2020.11.05][_doc][0d1478f0-6367-4228-9553-7d16d2993bc2], source[n/a, actual length: [4.1kb], max length: 2kb]}] and a refresh, target allocation id: WwkZtUbPSAapC3C-Jg2z2g, primary term: 1 on EsThreadPoolExecutor[name = 10-110-23-125-common-elasticsearch-apps-dev-v1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6599247a[Running, pool size = 2, active threads = 2, queued tasks = 221, completed tasks = 689547]]]

My index settings:

{
    "logs-dev-2020.11.05": {
        "settings": {
            "index": {
                "highlight": {
                    "max_analyzed_offset": "5000000"
                },
                "number_of_shards": "3",
                "provided_name": "logs-dev-2020.11.05",
                "creation_date": "1604558592095",
                "number_of_replicas": "2",
                "uuid": "wjIOSfZOSLyBFTt1cT-whQ",
                "version": {
                    "created": "7020199"
                }
            }
        }
    }
}

I have gone through this page:

https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster

I thought changing the "write" size in the thread pool would solve this, but the page advises against it:

Adjusting the queue sizes is therefore strongly discouraged, as it is like putting a temporary band-aid on the problem rather than actually fixing the underlying issue.

So what else can we do to improve the situation?

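Additional information:

Elasticsearch version 7.2.1
Cluster health is good; the cluster has 3 nodes
An index is created every day, and each index has 3 shards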

Answer 1

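You are right that increasing the queue size is not a permanent fix. Still, you may be glad to know that Elasticsearch itself raised the default queue size of the write thread pool (the one your bulk requests use) from 200 to 10k in a single minor version upgrade. Compare the size of 200 in ES 7.8 with 10k in ES 7.9.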

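If you are on an ES 7.x version, you can also raise the queue size (the thread_pool.write.queue_size setting), if not to 10k then at least to 1k, to avoid rejected requests.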

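If you want a proper fix, you need to do the following:

Find out whether the rejections are sustained or just short bursts of write requests that clear up after a while.
If they are sustained, check whether all the write optimizations are in place; please refer to my short tips to improve index speed.
Check whether your data nodes have reached full capacity; if so, scale out the cluster to handle the increased, legitimate load.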
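
One way to tell sustained rejections from short bursts is to sample the write thread pool counters twice and compare them. The following is only a minimal sketch: it assumes the cluster is reachable without authentication at http://localhost:9200 (a placeholder address), and the helper names fetch_write_pool_stats, rejection_deltas and watch are made up for illustration. It reads the _cat/thread_pool API, whose rejected column is a cumulative per-node counter.

```python
import json
import time
import urllib.request

# Assumed cluster address; replace with your own node or load balancer.
ES_URL = "http://localhost:9200"


def fetch_write_pool_stats(base_url=ES_URL):
    """Return one dict per node describing the write thread pool."""
    url = (base_url + "/_cat/thread_pool/write"
           "?format=json&h=node_name,name,active,queue,rejected,completed")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def rejection_deltas(before, after):
    """Return {node_name: rejections added between the two samples}.

    With format=json the _cat API returns numbers as strings, hence int().
    A node missing from the first sample is treated as starting at 0.
    """
    earlier = {row["node_name"]: int(row["rejected"]) for row in before}
    return {
        row["node_name"]: int(row["rejected"]) - earlier.get(row["node_name"], 0)
        for row in after
    }


def watch(base_url=ES_URL, interval_s=60):
    """Take two samples interval_s seconds apart and print the difference."""
    first = fetch_write_pool_stats(base_url)
    time.sleep(interval_s)
    second = fetch_write_pool_stats(base_url)
    for node, delta in rejection_deltas(first, second).items():
        print(node, "new write rejections:", delta)
```

Calling watch() repeatedly over a busy period gives a rough picture: deltas that are zero most of the time with occasional spikes point to bursts, while deltas that keep growing on every sample point to sustained overload. The counter restarts from zero when a node restarts, so a negative delta only means the node was restarted between samples.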
