What database does Apache Nutch use for storing URLs?

nutch

I tried looking at its dependencies (see here), but I cannot figure out what it uses for storing URLs and tracking the progress of the crawl. Judging by the tutorial requirements (see here), it does not need any third-party system such as an SQL database.

So what does it use?

Any suggestions are appreciated!

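Nutch 1.x stores its data in Hadoop MapFiles and SequenceFiles. Apache Nutch is a batch-based crawler, so its data is either

- write-once/read-many, for the segments created and filled in every fetch cycle, or
- rewritten whenever new data is added: the "CrawlDb", which holds the URLs and their status information (fetch status and date, signature/checksum, score, metadata).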
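To make the two storage patterns above more concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Nutch or Hadoop code: `CrawlDatum` is borrowed as a name for the per-URL record, while `generate`, `fetch`, `updatedb`, the field names, and the `pages` dictionary standing in for the web are all assumptions made for illustration. Real Nutch keeps these structures as MapFiles/SequenceFiles on disk and runs each step as a batch job.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlDatum:
    """Hypothetical per-URL record, loosely mirroring the CrawlDb fields."""
    status: str = "unfetched"   # fetch status
    fetch_time: int = 0         # cycle number standing in for the fetch date
    signature: str = ""         # checksum of the fetched content
    score: float = 1.0


def generate(crawldb, limit):
    """Select up to `limit` unfetched URLs for the next fetch cycle."""
    due = [url for url, datum in sorted(crawldb.items())
           if datum.status == "unfetched"]
    return due[:limit]


def fetch(urls, cycle, pages):
    """Build a segment: an immutable tuple, written once and only read later."""
    records = []
    for url in urls:
        content = pages.get(url)  # `pages` is a fake web: URL -> content
        if content is None:
            records.append((url, CrawlDatum("gone", cycle)))
        else:
            sig = hashlib.md5(content.encode("utf-8")).hexdigest()
            records.append((url, CrawlDatum("fetched", cycle, sig)))
    return tuple(records)


def updatedb(crawldb, segment):
    """Return a new CrawlDb merged with the segment; the old one is untouched."""
    merged = dict(crawldb)
    merged.update(segment)
    return merged


# One crawl cycle over two seed URLs, only one of which exists on the fake web.
crawldb = {"http://a.example/": CrawlDatum(), "http://b.example/": CrawlDatum()}
pages = {"http://a.example/": "hello"}
segment = fetch(generate(crawldb, limit=10), cycle=1, pages=pages)
new_crawldb = updatedb(crawldb, segment)
```

The point of the sketch is that `segment` is never modified after it is created, whereas every cycle produces a fresh `new_crawldb` that replaces the previous one, so no database server is needed to track crawl progress.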

Nutch 2.x (retired) puts all data into a single "web table" and delegates scaling and distribution to a big data store (HBase, etc.) through Apache Gora.
