scrapy yield Request not working

I wrote the scrapy spider below, but after the initial request it does not continue the crawl, even though I have yielded more scrapy.Requests for scrapy to follow.

import regex as re
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Spider

class myspider(Spider):
    name = 'haha'

    allowed_domains = ['https://blog.scrapinghub.com/']
    start_urls = ['https://blog.scrapinghub.com/']
    extractor = LinkExtractor(allow=allowed_domains)

    def parse(self, response):
        # To extract all the links on this page
        links_in_page = self.extractor.extract_links(response)

        for link in links_in_page:
            yield scrapy.Request(link.url, callback=self.parse)

allowed_domains expects a list of domains, not a list of URLs. Because you put a full URL there, Scrapy's OffsiteMiddleware treats every extracted link as off-site and filters it out, so the crawl stops after the first page.

So it should be:

allowed_domains = ['blog.scrapinghub.com']