Storm-crawler crawl and indexing

I have used Nutch 1.x to crawl websites and Elasticsearch to index the data. I recently came across Storm-crawler and like it, especially its streaming nature.

Do I have to initialize and create the mappings on the ES server that Storm-crawler sends its data to?

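With Nutch, as long as the ES index was up and running, the mappings took care of themselves, apart from some fine-tuning. Is it the same with Storm-crawler, or do I have to initialize the index and mappings first?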

Glad to hear you like StormCrawler.

As explained in the README and the video tutorial (which is based on ES 2.x), you should use the ES_IndexInit script to set the mappings explicitly. It might work without it, but the result would not be optimal.
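To illustrate what "setting the mappings explicitly" means in general, here is a minimal Python sketch that builds a create-index request with a fixed mapping. It is not the content of the ES_IndexInit script: the index name `content`, the field names (`url`, `host`, `title`, `content`), the shard settings and the helper `build_index_request` are all assumptions made for illustration. It also assumes the typeless mapping format of recent Elasticsearch versions; older releases such as ES 2.x expect a type name under `mappings` and use different field types.

```python
import json
import urllib.request


def build_index_request(es_url, index_name):
    """Build (but do not send) a PUT request that creates an index
    with explicit mappings. All field names here are hypothetical."""
    body = {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "properties": {
                # Exact-match fields: stored as keywords, not analyzed.
                "url": {"type": "keyword"},
                "host": {"type": "keyword"},
                # Full-text fields: analyzed for search.
                "title": {"type": "text"},
                "content": {"type": "text"},
            }
        },
    }
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed local ES endpoint; sending requires a running server.
    request = build_index_request("http://localhost:9200", "content")
    print(request.get_method(), request.full_url)
    # To actually create the index:
    # urllib.request.urlopen(request)
```

Without a step like this, Elasticsearch guesses field types from the first documents it sees, which is why indexing can work without explicit mappings but end up with less suitable types.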