Apache Druid batch ingestion - low performance of index task

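I followed this tutorial: http://druid.io/docs/latest/tutorials/tutorial-loading-batch-data.html and used index_task to load data into Druid. I also issued a time boundary query, and everything worked fine.

However, when I try to insert a larger amount of data (about 2,000,000 records), it takes too much time.

Is it possible to improve the performance of index_task, and if so, how?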

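Would any of the following help:

Changing the segment granularity?
Replacing index_task with HadoopDruidIndexer?
Splitting the data into smaller parts and inserting them concurrently?
Increasing the number of nodes or the memory per node?
Anything else?

Please help.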

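We have the same problem: the index task is not well optimized for ingesting large amounts of data. The documentation says: "They are however, slow when the data volume exceeds 1G." It is better to use real-time ingestion (Tranquility) or the Index Hadoop Task. If you need to batch-ingest a large amount of data, the Index Hadoop Task is the best solution. It scales well and is noticeably faster.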
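To make the Index Hadoop Task suggestion more concrete, here is a minimal sketch that builds such a task spec as a Python dict and prints it as JSON. The field layout follows the `index_hadoop` spec as I remember it from the Druid docs of that period (with a `parser` section), so check it against the docs for your version. The helper name `build_hadoop_index_task`, the data source `events`, the input path, the `timestamp` column, and the dimension names are all made up for illustration.

```python
import json


def build_hadoop_index_task(data_source, input_path, interval,
                            dimensions, segment_granularity="DAY"):
    """Return an index_hadoop task spec as a plain dict (hypothetical helper)."""
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                "parser": {
                    "type": "hadoopyString",
                    "parseSpec": {
                        "format": "json",
                        # Assumes each record has a "timestamp" field.
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": list(dimensions)},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": "NONE",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {"type": "static", "paths": input_path},
            },
            "tuningConfig": {"type": "hadoop"},
        },
    }


if __name__ == "__main__":
    # All argument values below are placeholders.
    task = build_hadoop_index_task(
        data_source="events",
        input_path="/data/events.json",
        interval="2016-01-01/2016-02-01",
        dimensions=["user", "country"],
    )
    print(json.dumps(task, indent=2))
```

The printed JSON can be saved to a file and submitted to the overlord the same way the tutorial submits its task file (a POST to `/druid/indexer/v1/task`). The `segmentGranularity` argument is also where you would experiment with the segment granularity mentioned in the question.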

Recent work on Druid has significantly improved the index task. The Index Hadoop task and the index task do the same thing: both perform batch ingestion.
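If you stay with the plain index task, the idea from the question of splitting the data into smaller parts and inserting them concurrently could look roughly like this. This is only a sketch of the splitting step: `split_interval` is a hypothetical helper, and it assumes your data can be partitioned by time so that each chunk goes into its own task with a non-overlapping interval. Whether the tasks actually run in parallel depends on how much worker capacity your cluster has.

```python
from datetime import datetime, timedelta


def split_interval(start, end, step):
    """Split [start, end) into consecutive ISO-8601 intervals of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    intervals = []
    cursor = start
    while cursor < end:
        # The last chunk is clipped so it never runs past `end`.
        upper = min(cursor + step, end)
        intervals.append(f"{cursor.isoformat()}/{upper.isoformat()}")
        cursor = upper
    return intervals


if __name__ == "__main__":
    # Placeholder dates: one month of data, one task per week.
    for interval in split_interval(datetime(2016, 1, 1),
                                   datetime(2016, 2, 1),
                                   timedelta(days=7)):
        print(interval)
```

Each resulting interval string would go into the `intervals` field of a separate task spec, together with the matching slice of the input data.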

Tags: performance, batch-processing, druid