The spider doesn't go to the next page

Question

Spider code:

import scrapy
from crawler.items import Item

class DmozSpider(scrapy.Spider):
    name = 'blabla'
    allowed_domains = ['blabla']

    def start_requests(self):
        yield scrapy.Request('http://blabla.org/forum/viewforum.php?f=123', self.parse)

    def parse(self, response):
        item = Item()
        item['Title'] = response.xpath('//a[@class="title"]/text()').extract()
        yield item

        next_page = response.xpath('//a[text()="Next"]/@href')
        if next_page:
            url = response.urljoin(next_page[0].extract())
            yield scrapy.Request(url, callback=self.parse)

Problem: the spider stops after the first page, even though next_page and url exist and are correct.

These are the last debug messages before it stops:

[scrapy] DEBUG: Crawled (200) <GET http://blabla.org/forum/viewforum.php?f=123&start=50> (referer: http://blabla.org/forum/viewforum.php?f=123)
[scrapy] INFO: Closing spider (finished)

Answer 1

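Check the following:

Check whether the URL you are trying to crawl is disallowed by robots.txt; you can view the file at http://blabla.org/robots.txt. Projects created with scrapy startproject enable ROBOTSTXT_OBEY, so Scrapy skips disallowed URLs. Complying with robots.txt is recommended.
The download delay may be too short. DOWNLOAD_DELAY defaults to 0 unless your settings change it; try raising it to 2 seconds or more.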
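
For the robots.txt check, a rough offline test is possible with the standard library's urllib.robotparser. This is only a sketch: the robots.txt content below is made up for illustration (the real file has to come from the site), and is_allowed is a hypothetical helper name, not part of Scrapy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# Replace it with the text actually served at http://blabla.org/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/viewforum.php
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    next_url = "http://blabla.org/forum/viewforum.php?f=123&start=50"
    print(is_allowed(SAMPLE_ROBOTS_TXT, next_url))
```

For the second point, the delay is set in the project's settings.py, for example DOWNLOAD_DELAY = 2.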

Answer 2

The problem was that the response for the next page was a page served to robots, and it did not contain any links.
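
One way to confirm this kind of problem is to check whether the body of the second response actually contains a "Next" anchor. The sketch below uses only the standard library and roughly mirrors the //a[text()="Next"] lookup from the question (it also ignores surrounding whitespace). find_next_href and the sample HTML strings are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class _NextLinkFinder(HTMLParser):
    """Records the href of the first <a> element whose text is 'Next'."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self._current_href: Optional[str] = None
        self._text_parts: list = []
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start collecting the text of this anchor.
            self._in_anchor = True
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._text_parts).strip()
            if self.next_href is None and text == "Next":
                self.next_href = self._current_href
            self._in_anchor = False


def find_next_href(html: str) -> Optional[str]:
    """Return the href of the 'Next' link, or None if the page has none."""
    finder = _NextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.next_href


if __name__ == "__main__":
    # Hypothetical bodies: a normal forum page and a page served to bots.
    normal_page = '<html><body><a href="page2.html">Next</a></body></html>'
    bot_page = "<html><body><p>Access denied</p></body></html>"
    print(find_next_href(normal_page))
    print(find_next_href(bot_page))
```

If the helper returns None for the body the spider received (response.text in the parse callback) while a browser shows the link, the site is probably serving different content to the crawler. Setting a browser-like USER_AGENT in settings.py may be worth trying in that case.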


python

scrapy

python-3.x

scrapy-spider