Possible to index 1M docs/sec in ElasticSearch?

I'm trying to optimize indexing speed in ElasticSearch. We continuously reindex our indices every hour, so the faster we can reindex the data, the lower our latency.

I came across this article about reaching a reindexing throughput of 100K: https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8#.4w3kl9ebf, and this Stack Overflow question, which achieves higher: ElasticSearch - high indexing throughput.

My question is whether it is possible to achieve a sustained indexing throughput of 1 million documents per second and, if so, how?

That depends on several factors, but why wouldn't it be possible? Here are some key factors that can speed up the indexing process:

As an example, with small documents and an eight-core machine, I was able to index at about 70k-120k docs/s. Throw more cores or machines at it, and you can approach 1M docs/s.
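To put rough numbers on "more cores or machines", here is a back-of-the-envelope sketch. It assumes throughput scales perfectly linearly with the number of eight-core machines, which is optimistic; real clusters lose some of that to coordination, replication, and I/O limits. The function name `machines_needed` is made up for this illustration.

```python
import math

def machines_needed(target_rate, per_machine_rate):
    """Machines required to reach target_rate (docs/s).

    ASSUMPTION: perfectly linear scaling across machines.
    """
    return math.ceil(target_rate / per_machine_rate)

TARGET = 1_000_000  # docs/s, the goal from the question

# Low and high end of the single-machine range measured above.
for rate in (70_000, 120_000):
    print(f"{rate} docs/s per machine -> {machines_needed(TARGET, rate)} machines")
```

Under that assumption, the measured range works out to somewhere between 9 and 15 such machines.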

Update: Using Elasticsearch 6.1.0 on a single 32-core E5 with a 64G JVM heap, another test run of esbulk indexed about 330,000 docs/s, using 10M small documents of 20-40 bytes each.
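For readers unfamiliar with bulk indexing, here is a minimal sketch of how documents can be grouped into request bodies for Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. This is not how esbulk is implemented (it is only an illustration); the helper names `batches` and `bulk_body` are hypothetical, and the sketch assumes the target index (and, on older versions, the type) is given in the request URL, so the action line can stay empty. Sending the bodies, ideally from several workers in parallel, is left out.

```python
import json
from itertools import islice

def batches(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def bulk_body(docs):
    """Build a newline-delimited JSON body for a _bulk request.

    ASSUMPTION: the index is specified in the URL, so each action
    line is just {"index":{}}.
    """
    lines = []
    for doc in docs:
        lines.append('{"index":{}}')                           # action line
        lines.append(json.dumps(doc, separators=(",", ":")))   # source line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Example: five tiny documents, two per bulk request.
docs = ({"id": i} for i in range(5))
bodies = [bulk_body(batch) for batch in batches(docs, 2)]
print(len(bodies), "request bodies")
```

Larger batches mean fewer round trips per document; the right batch size and worker count depend on document size and hardware, so they are worth measuring rather than guessing.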

Disclaimer: I wrote esbulk. The README contains a few measurements; the current maximum is about 300k docs/s.