Scraping multiple sites in one scrapy-spider
I am scraping 6 sites with 6 different spiders, but now I have to scrape all of these sites in a single spider. Is there a way to crawl multiple links in the same spider?
import spider1
import spider2
import spider3
from scrapy.crawler import CrawlerProcess

if require_spider1:
    spider = spider1
    urls = ['https://site1.com/']
elif require_spider2:
    spider = spider2
    urls = ['https://site2.com/', 'https://site2-1.com/']
elif require_spider3:
    spider = spider3
    urls = ['https://site3.com']

process = CrawlerProcess()
process.crawl(spider, urls=urls)
process.start()
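As an aside, CrawlerProcess can already schedule several spiders in one process, so the if/elif selection above is not strictly required if the goal is to crawl all sites in one run. A minimal sketch, assuming the three modules expose spider classes named Spider1, Spider2, and Spider3 (hypothetical names):

from scrapy.crawler import CrawlerProcess

# Hypothetical class names; adjust to whatever your modules actually define.
from spider1 import Spider1
from spider2 import Spider2
from spider3 import Spider3

process = CrawlerProcess()
# Each crawl() call schedules one spider; start() runs them all and
# blocks until every crawl has finished.
process.crawl(Spider1, urls=['https://site1.com/'])
process.crawl(Spider2, urls=['https://site2.com/', 'https://site2-1.com/'])
process.crawl(Spider3, urls=['https://site3.com'])
process.start()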
I did it with start_requests, giving each site its own callback:
from scrapy import Request

# Inside your Spider subclass:
def start_requests(self):
    # One request per site; self.url1 ... self.url6 are the per-site
    # parse callbacks defined on the same spider (not shown here).
    yield Request('url1', callback=self.url1)
    yield Request('url2', callback=self.url2)
    yield Request('url3', callback=self.url3)
    yield Request('url4', callback=self.url4)
    yield Request('url5', callback=self.url5)
    yield Request('url6', callback=self.url6)
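If the six sites yield similarly structured items, a single generic callback that routes by domain can replace the six per-site methods. A minimal sketch, assuming hypothetical domains site1.com and site2.com and a trivial title extractor:

from urllib.parse import urlparse

import scrapy

class MultiSiteSpider(scrapy.Spider):
    name = 'multisite'
    start_urls = [
        'https://site1.com/',
        'https://site2.com/',
        # ... remaining sites
    ]

    def parse(self, response):
        # Dispatch to the handler registered for this response's domain.
        handler = {
            'site1.com': self.parse_site1,
            'site2.com': self.parse_site2,
        }.get(urlparse(response.url).netloc)
        if handler:
            yield from handler(response)

    def parse_site1(self, response):
        yield {'title': response.css('title::text').get()}

    def parse_site2(self, response):
        yield {'title': response.css('title::text').get()}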