How to know when scrapy-redis has finished

Question

When I use scrapy-redis, the spider is set to raise DontCloseSpider. How can I tell when the Scrapy crawl has finished?

crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed) does not work.

Answer 1

Interesting.

I see this comment in the scrapy-redis settings:

# Max idle time to prevent the spider from being closed when distributed crawling.
# This only works if queue class is SpiderQueue or SpiderStack,
# and may also block the same time when your spider start at the first time (because the queue is empty).
SCHEDULER_IDLE_BEFORE_CLOSE = 10
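
For context, here is a rough sketch of a settings.py in which that option would take effect. The setting names come from the scrapy-redis README. The queue class is chosen to match the comment above, and the values are illustrative assumptions, not the asker's actual configuration.

```python
# settings.py (illustrative sketch, not the asker's real configuration)

# Let scrapy-redis schedule requests through Redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Share the duplicate filter across all spiders through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Per the comment above, the idle timeout only applies to the
# SpiderQueue and SpiderStack queue classes.
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderQueue"

# Seconds to wait on an empty queue before the spider may close.
SCHEDULER_IDLE_BEFORE_CLOSE = 10
```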

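If you followed the setup instructions correctly and it still doesn't work, I think you at least need to provide enough detail to reproduce your setup, for example your settings.py and any unusual spiders or pipelines you have.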

The spider_closed signal should indeed fire, a few seconds after the URLs in the queue run out. If the queue is not empty, the spider will not close, obviously.
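
To check whether the signal fires at all, one option is a small extension that listens for both spider_idle and spider_closed. This is only a sketch: the class name CrawlFinishNotifier and its attributes are hypothetical, and it assumes the extension is enabled through the EXTENSIONS setting of your project.

```python
import logging

logger = logging.getLogger(__name__)


class CrawlFinishNotifier:
    """Hypothetical extension that records idle and close events."""

    def __init__(self):
        self.idle_count = 0       # how many times spider_idle fired
        self.close_reason = None  # reason passed to spider_closed, if any

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the handlers can be unit-tested without a crawler.
        from scrapy import signals

        ext = cls()
        crawler.signals.connect(ext.spider_idle, signal=signals.spider_idle)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_idle(self, spider):
        # Fires each time the spider has no pending requests.
        self.idle_count += 1
        logger.info("%s is idle (count=%d)", spider.name, self.idle_count)

    def spider_closed(self, spider, reason):
        # Fires once, when the spider actually shuts down.
        self.close_reason = reason
        logger.info("%s closed: %s", spider.name, reason)
```

If the log shows repeated idle messages but never a closed message, the spider is being kept open rather than the signal connection being broken.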


scrapy

scrapy-spider