Elasticsearch 5 stops during long indexing

I'm using ES 5.4.

I'm indexing 2B small documents on a single node.

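The documents are organized into ~3K indices, 2TB in total. Index sizes range from a few KB to hundreds of GB, and each index is sharded to keep every shard under 5GB.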

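I'm indexing with 14 threads concurrently on different indices, with 2K documents per bulk request. The server has 16 CPUs and 32GB of RAM, and no reads are performed during this indexing process.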
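For reference, a bulk request of this shape can be built with only the Python standard library. This is a minimal sketch, assuming the ES 5.x `_bulk` NDJSON format (one action line followed by one source line per document, with `_type` still required in 5.x). The index name `my-index`, the type `doc`, and the helpers `chunked` and `bulk_body` are hypothetical; sending the body (HTTP client, the 14 worker threads) is left out.

```python
import json
from typing import Iterable, Iterator, List

BATCH_SIZE = 2000  # documents per bulk request, as in the question


def chunked(docs: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents (hypothetical helper)."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the last, possibly shorter, batch
        yield batch


def bulk_body(index: str, doc_type: str, docs: List[dict]) -> str:
    """Build an NDJSON body for the _bulk endpoint (hypothetical helper)."""
    lines = []
    for doc in docs:
        # action/metadata line, then the document source on its own line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    docs = ({"n": i} for i in range(4500))
    for batch in chunked(docs):
        body = bulk_body("my-index", "doc", batch)
        print(len(batch), len(body.splitlines()))
```

Each document costs two lines in the body, so a 2K-document request carries 4K lines.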

The relevant ES configuration is:

After 50~100M documents have been indexed, ES starts slowing down dramatically.

In the logs I see:

[2017-05-11T16:44:24,751][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2141] overhead, spent [13.5s] collecting in the last [14.3s]
[2017-05-11T16:44:40,323][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2143][337] duration [14s], collections [1]/[14.5s], total [14s]/[54.2m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[533.8mb]/[865.3mb]}{[survivor] [75mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:44:40,323][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2143] overhead, spent [14s] collecting in the last [14.5s]
[2017-05-11T16:44:53,004][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2145][338] duration [10.8s], collections [1]/[11.6s], total [10.8s]/[54.4m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[538.9mb]/[865.3mb]}{[survivor] [44.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:44:53,004][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2145] overhead, spent [10.8s] collecting in the last [11.6s]
[2017-05-11T16:45:05,141][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2147][339] duration [11s], collections [1]/[11.1s], total [11s]/[54.6m], memory [11.8gb]->[11.4gb]/[11.8gb], all_pools {[young] [865.3mb]->[558.3mb]/[865.3mb]}{[survivor] [103.1mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:05,142][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2147] overhead, spent [11s] collecting in the last [11.1s]
[2017-05-11T16:45:19,928][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2149][340] duration [13s], collections [1]/[13.7s], total [13s]/[54.8m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[570.1mb]/[865.3mb]}{[survivor] [48.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:19,928][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2149] overhead, spent [13s] collecting in the last [13.7s]
[2017-05-11T16:45:35,926][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2152][341] duration [13.7s], collections [1]/[13.8s], total [13.7s]/[55.1m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[575mb]/[865.3mb]}{[survivor] [104.6mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:35,931][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2152] overhead, spent [13.7s] collecting in the last [13.8s]
[2017-05-11T16:45:49,919][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2154][342] duration [12.8s], collections [1]/[12.9s], total [12.8s]/[55.3m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[577.1mb]/[865.3mb]}{[survivor] [102.3mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:45:49,919][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2154] overhead, spent [12.8s] collecting in the last [12.9s]
[2017-05-11T16:46:03,976][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][old][2156][343] duration [12.1s], collections [1]/[13s], total [12.1s]/[55.5m], memory [11.8gb]->[11.5gb]/[11.8gb], all_pools {[young] [865.3mb]->[601mb]/[865.3mb]}{[survivor] [50.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2017-05-11T16:46:03,976][WARN ][o.e.m.j.JvmGcMonitorService] [VT0Xr1c] [gc][2156] overhead, spent [12.1s] collecting in the last [13s]

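Looking at the timestamps, each GC ends exactly when the next one starts. Each of these GCs lasts more than 10 seconds, and it looks like they are blocking ES to keep Java from running out of memory. While ES is stopped for GC, the bulk requests time out (after 30 seconds or more) and the indexing process fails.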
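The log lines above can be checked programmatically. The sketch below assumes the ES 5.4 `JvmGcMonitorService` message format exactly as it appears in this log; the helper names are hypothetical. It extracts the fraction of wall time each overhead warning reports and the old-generation occupancy after each old collection. For the lines above, every overhead fraction is above 0.9, and the old generation is still at its maximum (10.9gb of 10.9gb) after every collection, so these full GCs reclaim almost nothing from it.

```python
import re
from typing import List, Optional, Tuple

# "[gc][2141] overhead, spent [13.5s] collecting in the last [14.3s]"
OVERHEAD_RE = re.compile(
    r"\[gc\]\[(\d+)\] overhead, spent \[([\d.]+)(ms|s|m|h)\] "
    r"collecting in the last \[([\d.]+)(ms|s|m|h)\]"
)
# "{[old] [10.9gb]->[10.9gb]/[10.9gb]}" : before -> after / max
OLD_GEN_RE = re.compile(
    r"\{\[old\] \[([\d.]+)(b|kb|mb|gb|tb)\]->\[([\d.]+)(b|kb|mb|gb|tb)\]"
    r"/\[([\d.]+)(b|kb|mb|gb|tb)\]\}"
)
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def gc_overhead(lines) -> List[Tuple[int, float]]:
    """(GC sequence number, fraction of the window spent collecting) per overhead warning."""
    result = []
    for line in lines:
        m = OVERHEAD_RE.search(line)
        if m:
            spent = float(m.group(2)) * UNIT_SECONDS[m.group(3)]
            window = float(m.group(4)) * UNIT_SECONDS[m.group(5)]
            result.append((int(m.group(1)), spent / window))
    return result


def old_gen_after_gc(line: str) -> Optional[float]:
    """Old-generation occupancy after the collection as a fraction of its max, or None."""
    m = OLD_GEN_RE.search(line)
    if not m:
        return None
    after = float(m.group(3)) * UNIT_BYTES[m.group(4)]
    maximum = float(m.group(5)) * UNIT_BYTES[m.group(6)]
    return after / maximum
```

Running `gc_overhead` over the whole log gives a quick way to see when the node entered this state, rather than reading each warning by eye.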

I didn't have this problem with ES 2.4. I'm using ES 5.4 because I read that it now splits the heap between indices more effectively.

Am I doing something wrong? What should I change to keep performance high throughout the whole indexing process?

First of all, you have too many indices, and your shards can be larger: 30-50GB or even more.
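To see why the number of indices matters as much as the shard size, the primary shard count can be estimated as one shard per index, plus extra shards for any index larger than the target shard size. This is a minimal sketch; the function name `primary_shards` and the example sizes are hypothetical, not figures from the question.

```python
import math
from typing import Iterable

GB = 1024**3


def primary_shards(index_sizes_bytes: Iterable[int], max_shard_bytes: int) -> int:
    """Primaries needed if no shard may exceed max_shard_bytes (at least one per index)."""
    return sum(max(1, math.ceil(size / max_shard_bytes)) for size in index_sizes_bytes)


if __name__ == "__main__":
    # hypothetical 300GB index: 60 primaries at a 5GB cap, 10 at a 30GB cap
    print(primary_shards([300 * GB], 5 * GB), primary_shards([300 * GB], 30 * GB))
    # 3000 tiny indices still need 3000 primaries, whatever the cap
    print(primary_shards([10 * 1024] * 3000, 30 * GB))
```

With ~3K indices the node can never hold fewer than ~3K primary shards, whatever the target size. Raising the target from 5GB to 30GB only shrinks the shard count of the large indices; reducing the number of indices is what lowers that floor.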

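If 95% of the heap goes to the indexing process (indices.memory.index_buffer_size: 95%), how do you expect ES to store terms, inverted indices and all the other data structures? In what memory? You need to give it some room. For 2TB of data I would not go above index_buffer_size: 50%.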
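`indices.memory.index_buffer_size` accepts either a percentage of the heap or an absolute byte size, and the buffer it defines is shared by all shards indexing on the node. The arithmetic can be sketched as below. `resolve_buffer` is a hypothetical helper that handles only the `%`, `gb`, `mb`, `kb` and `b` forms. Taking the 11.8gb heap reported in the log above, 95% leaves roughly 0.6 GB of heap for everything else, while 50% leaves about 5.9 GB.

```python
GB = 1024**3


def resolve_buffer(setting: str, heap_bytes: int) -> int:
    """Resolve an index_buffer_size-style value such as "95%" or "4gb" to bytes."""
    setting = setting.strip().lower()
    if setting.endswith("%"):
        return int(heap_bytes * float(setting[:-1]) / 100)
    # check multi-letter units before the bare "b" suffix
    for unit, factor in (("gb", 1024**3), ("mb", 1024**2), ("kb", 1024), ("b", 1)):
        if setting.endswith(unit):
            return int(float(setting[: -len(unit)]) * factor)
    raise ValueError(f"unrecognised size: {setting!r}")


if __name__ == "__main__":
    heap = int(11.8 * GB)  # max heap as reported in the GC log above
    for value in ("95%", "50%"):
        left = heap - resolve_buffer(value, heap)
        print(value, f"leaves {left / GB:.1f} GB for everything else")
```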