Scrapy, keep crawling after error

Question

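I have a Scrapy spider that scrapes two quantities for each item. The problem is that I have to convert them with float(), so when one of the scraped fields happens to be empty, an error is raised and the spider stops scraping the remaining elements on that page and moves straight on to the next page.

Is there any way to tell Scrapy to keep crawling after an error? Here is my spider's code. Thanks!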

def parse(self, response):
    for sel in response.xpath('//li[@class="oneclass"]'):
        item = exampleItem()
        item['quant1'] = float(sel.xpath('a/div/span[@class="exampleclass"]/span[@class="amount"]/text()'))
        item['quant2'] = float(sel.xpath('div[@class="otherexampleclass"]/input/@max'))
        yield item

Answer 1

You can wrap it in a try/except block:

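def parse(self, response):
    for sel in response.xpath('//li[@class="oneclass"]'):
        try:
            item = exampleItem()
            # .get() returns the first match as a string, or None if nothing matched
            item['quant1'] = float(sel.xpath('a/div/span[@class="exampleclass"]/span[@class="amount"]/text()').get())
            item['quant2'] = float(sel.xpath('div[@class="otherexampleclass"]/input/@max').get())
            yield item
        except (TypeError, ValueError):
            # float(None) raises TypeError; float('') or other non-numeric text raises ValueError
            self.logger.warning("could not crawl %s", sel)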
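
The try/except above skips the whole item when either field is missing. If you would rather keep the item and leave the missing field empty, one option is to convert each value through a small helper. The sketch below is only an illustration: `to_float` is a hypothetical helper, not part of Scrapy, and stripping whitespace and thousands separators is an assumption about how the page formats its numbers.

```python
def to_float(text, default=None):
    """Convert scraped text to a float.

    Returns `default` when the text is missing (None) or is not a
    valid number, instead of raising an exception.
    """
    if text is None:
        return default
    try:
        # Assumption: values may carry surrounding whitespace or "1,200"-style commas.
        return float(text.strip().replace(",", ""))
    except ValueError:
        return default
```

Inside parse you would then write something like item['quant1'] = to_float(sel.xpath('...').get()), so an empty field becomes None and the loop carries on with the remaining elements on the page.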

python

scrapy

scrapy-spider