scrapy yield Request not working
I wrote the Scrapy spider below, but it does not continue the crawling process after the initial request, even though I have yielded more scrapy.Request objects for Scrapy to follow.
import regex as re
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Spider

class myspider(Spider):
    name = 'haha'
    allowed_domains = ['https://blog.scrapinghub.com/']
    start_urls = ['https://blog.scrapinghub.com/']
    extractor = LinkExtractor(allow=allowed_domains)

    def parse(self, response):
        # To extract all the links on this page
        links_in_page = self.extractor.extract_links(response)
        for link in links_in_page:
            yield scrapy.Request(link.url, callback=self.parse)
allowed_domains expects a list of domains, not a list of URLs. Because the entry 'https://blog.scrapinghub.com/' is not a valid domain, Scrapy's offsite filtering drops every followed request. It should instead be:
allowed_domains = ['blog.scrapinghub.com']