Return non-zero exit code when raising a scrapy.exceptions.UsageError exception

I have a Scrapy script that looks like this:

main.py

import os
import argparse
import datetime
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from spiders.mySpider import MySpider

parser = argparse.ArgumentParser(description='My Scrapper')
parser.add_argument('-v',
                    '--verbose', 
                    help='Verbose mode',
                    action='store_true')
parser.add_argument('-t', 
                    '--type', 
                    help='Type',
                    type=str)

args = parser.parse_args()

if args.type != 'expected':
    parser.error("Wrong type")

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose
    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    process.start()

mySpider.py

from scrapy import Spider
from scrapy.http import Request, FormRequest
import scrapy.exceptions as ScrapyExceptions

class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...

        if condition:
            raise ScrapyExceptions.UsageError("Wrong argument")

When I raise parser.error() in the main.py file, my process returns the expected non-zero exit code. However, when I raise scrapy.exceptions.UsageError() in the mySpider.py file, I get an exit code of 0, so the Jenkins pipeline step that runs my script thinks it succeeded and continues with the pipeline. I run my script with the command python3 main.py --type my_type.

Why doesn't the script execution notice the usage error raised in the mySpider.py module and return a non-zero exit code?

After several hours of trying things out, I found this thread. The problem is that Scrapy does not use a non-zero exit code when a scrape fails: exceptions raised inside a spider callback are caught and logged by Scrapy instead of propagating out of process.start(), so the process still exits with code 0. I managed to fix this behaviour by using the crawler stats collection:

main.py

import sys

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose
    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    crawler = list(process.crawlers)[0]
    process.start()

    failed = crawler.stats.get_value('custom/failed_job')
    if failed:
        sys.exit(1)
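
The crawler reference is taken before process.start() on purpose: once a crawl has finished, the CrawlerProcess may no longer list it in process.crawlers, so grabbing it up front keeps its stats readable after the reactor has stopped.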

mySpider.py

class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...

        if condition:
            self.crawler.stats.set_value('custom/failed_job', 'True')
            raise ScrapyExceptions.UsageError("Wrong argument")
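
If you also want the Jenkins step to fail on unexpected errors (not only the condition you raise yourself), a variation on the same idea is to inspect Scrapy's built-in stats after the crawl as well. The sketch below keeps the custom/failed_job marker from above and additionally checks the log_count/ERROR counter; the exact stat keys can vary between Scrapy versions, so verify them against your installation.

import sys
import argparse
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from spiders.mySpider import MySpider

parser = argparse.ArgumentParser(description='My Scrapper')
parser.add_argument('-v', '--verbose', help='Verbose mode', action='store_true')
parser.add_argument('-t', '--type', help='Type', type=str)
args = parser.parse_args()

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose
    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    crawler = list(process.crawlers)[0]  # reference kept before start(), as above
    process.start()

    stats = crawler.stats.get_stats()
    failed = stats.get('custom/failed_job')   # explicit marker set by the spider
    errors = stats.get('log_count/ERROR', 0)  # errors logged during the crawl
    if failed or errors:
        sys.exit(1)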