Return non-zero exit code when raising a scrapy.exceptions.UsageError exception
I have a Scrapy script that looks like this:
main.py
import os
import argparse
import datetime

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from spiders.mySpider import MySpider

parser = argparse.ArgumentParser(description='My Scrapper')
parser.add_argument('-v',
                    '--verbose',
                    help='Verbose mode',
                    action='store_true')
parser.add_argument('-t',
                    '--type',
                    help='Type',
                    type=str)
args = parser.parse_args()

if args.type != 'expected':
    parser.error("Wrong type")

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose

    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    process.start()
mySpider.py
from scrapy import Spider
from scrapy.http import Request, FormRequest

import scrapy.exceptions as ScrapyExceptions


class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...
        if condition:
            raise ScrapyExceptions.UsageError(reason="Wrong argument")
When parser.error() is raised in the main.py file, my process returns the expected non-zero exit code. However, when scrapy.exceptions.UsageError() is raised in the mySpider.py file, I get a 0 exit code, so the Jenkins pipeline step that runs my script thinks it has succeeded and continues the pipeline execution. I run my script with the python3 main.py --type my_type command.
Why does the script execution not turn the usage error raised in the mySpider.py module into a non-zero exit code?
After several hours of trying, I found this thread. The problem is that Scrapy does not use a non-zero exit code when a scrape fails. I managed to fix this behaviour by using the crawler stats collection.
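For context (this is my understanding of the behaviour rather than something the fix depends on): an exception raised inside a spider callback is caught by Scrapy, logged, and reported through the spider_error signal, so it never propagates out of process.start(). That means even wrapping the call in a try/except in main.py, as in this sketch, still exits with 0:

import sys
import scrapy.exceptions as ScrapyExceptions

# ... argument parsing as above ...

if __name__ == "__main__":
    process = CrawlerProcess(settings=get_project_settings())
    process.crawl(MySpider, type_arg=args.type)
    try:
        process.start()
    except ScrapyExceptions.UsageError:
        sys.exit(1)  # never reached: the error is swallowed inside the crawl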
main.py
import sys

if __name__ == "__main__":
    settings = get_project_settings()
    settings['LOG_ENABLED'] = args.verbose

    process = CrawlerProcess(settings=settings)
    process.crawl(MySpider, type_arg=args.type)
    crawler = list(process.crawlers)[0]
    process.start()

    failed = crawler.stats.get_value('custom/failed_job')
    if failed:
        sys.exit(1)
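Note that the crawler reference has to be taken from process.crawlers after process.crawl() but before process.start(): once the crawl has finished, CrawlerProcess may no longer hold the crawler in that set (as far as I can tell it discards finished crawlers), while the reference you kept still exposes crawler.stats after the reactor has stopped.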
mySpider.py
class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    def parse(self, response):
        # ...
        # Some logic
        # ...
        if condition:
            # Flag the run as failed before raising, so main.py can exit non-zero
            self.crawler.stats.set_value('custom/failed_job', 'True')
            raise ScrapyExceptions.UsageError(reason="Wrong argument")
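If you would rather not remember to set the flag at every failure point, a variant (just a sketch; the handler name on_spider_error is mine, not from the original code) is to hook Scrapy's spider_error signal, which fires whenever a callback raises, and set the same stat there while parse() stays as in the question:

from scrapy import Spider, signals


class MySpider(Spider):
    name = 'MyScrapper'
    allowed_domains = ['www.webtoscrape.com']
    start_urls = ['http://www.webtoscrape.com/path/to/page.html']

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # spider_error is sent for every exception raised in a spider callback
        crawler.signals.connect(spider.on_spider_error, signal=signals.spider_error)
        return spider

    def on_spider_error(self, failure, response, spider):
        # Any uncaught exception in parse() (UsageError included) lands here,
        # so the exit-code check in main.py works without setting the flag manually.
        self.crawler.stats.set_value('custom/failed_job', 'True')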