How do I resume crawling from the last depth I reached when I restart my crawler?

Question

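Hi everyone. I'm building a web application that scrapes a large number of pages from a specific website. I started my crawler4j crawler with unlimited depth and pages, but it stopped suddenly because of an internet connection problem. Now I want to continue crawling the site from the last depth I reached, without fetching the URLs I have already visited.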

Note: I want a way that does not involve checking my stored URLs against the URLs I am about to fetch, because I don't want to send too many requests to this site.

**Thanks** ☺

Answer 1

You can use crawler4j's "resumable" crawling by enabling this feature in your configuration:

crawlConfig.setResumableCrawling(true);

See the crawler4j documentation for details.
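To show where that setting fits, here is a minimal sketch of a full crawler4j setup. The storage folder path, the seed URL, the politeness delay, the thread count and the `MyCrawler` class are all placeholders I made up for illustration, not anything from the question. It assumes a crawler4j 4.x dependency on the classpath. As far as I know, the flag has to be set on the very first run as well, and the storage folder must be the same on every run, otherwise there is no saved state to resume from.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class ResumableCrawl {

    // Hypothetical crawler: stays on one site and logs each visited URL.
    public static class MyCrawler extends WebCrawler {
        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            return url.getURL().startsWith("https://www.example.com/");
        }

        @Override
        public void visit(Page page) {
            System.out.println("Visited: " + page.getWebURL().getURL());
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig crawlConfig = new CrawlConfig();

        // Assumed path. Keep it identical across runs, since the saved
        // crawl state lives in this folder.
        crawlConfig.setCrawlStorageFolder("/data/crawl/root");

        // Keep the crawl state on disk so a stopped crawl can continue.
        crawlConfig.setResumableCrawling(true);

        // -1 means unlimited depth and unlimited pages, as in the question.
        crawlConfig.setMaxDepthOfCrawling(-1);
        crawlConfig.setMaxPagesToFetch(-1);

        // Wait 1000 ms between requests to go easy on the target site.
        crawlConfig.setPolitenessDelay(1000);

        PageFetcher pageFetcher = new PageFetcher(crawlConfig);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer =
                new RobotstxtServer(robotstxtConfig, pageFetcher);
        CrawlController controller =
                new CrawlController(crawlConfig, pageFetcher, robotstxtServer);

        // Placeholder seed URL.
        controller.addSeed("https://www.example.com/");

        // Blocks until the crawl finishes; 4 crawler threads is arbitrary.
        controller.start(MyCrawler.class, 4);
    }
}
```

With this in place, restarting the same program after an interruption should continue from the saved state rather than starting over, so you don't need to compare your stored URLs by hand. The crawler4j documentation notes that resumable mode can make crawling slightly slower.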
