How to fix 'PROXIES is empty' error for scrapy spider

I am trying to run a scrapy spider using proxies, but I get an error every time I run the code.

This is on Mac OSX, Python 3.7, Scrapy 1.5.1. I have tried playing around with the settings and middlewares, but with no effect.
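For context, scrapy-proxies is configured entirely through settings.py; a typical configuration, following the project's README, looks roughly like the sketch below (the PROXY_LIST path is a placeholder, not my real path):

# settings.py -- typical scrapy-proxies setup per its README;
# the PROXY_LIST path is a placeholder.

# Retry many times, since individual proxies often fail
RETRY_TIMES = 10
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Text file with one proxy per line, e.g.
# http://host1:port
# http://username:password@host2:port
PROXY_LIST = '/path/to/proxy/list.txt'

# 0 = a different random proxy for every request
# 1 = pick one random proxy and use it for all requests
# 2 = use the proxy set in CUSTOM_PROXY
PROXY_MODE = 0

The spider itself is minimal: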

import scrapy


class superSpider(scrapy.Spider):
    name = "myspider"

    def start_requests(self):
        print('request')
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        print('parse')

The error I get is:

2019-02-15 08:32:27 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: superScraper)
2019-02-15 08:32:27 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28) - [Clang 6.0 (clang-600.0.57)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Darwin-17.7.0-x86_64-i386-64bit
2019-02-15 08:32:27 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'superScraper', 'CONCURRENT_REQUESTS': 25, 'NEWSPIDER_MODULE': 'superScraper.spiders', 'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408], 'RETRY_TIMES': 10, 'SPIDER_MODULES': ['superScraper.spiders'], 'USER_AGENT': 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'}
2019-02-15 08:32:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2019-02-15 08:32:27 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 171, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 175, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/middleware.py", line 36, in from_settings
    mw = mwcls.from_crawler(crawler)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py", line 99, in from_crawler
    return cls(crawler.settings)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py", line 74, in __init__
    raise KeyError('PROXIES is empty')
builtins.KeyError: 'PROXIES is empty'

These URLs are from the scrapy documentation and run fine without using a proxy.

For anyone else who runs into a similar issue: the problem turned out to be in my actual scrapy_proxies.RandomProxy code.
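The last two frames of the traceback point at the startup check that fails: the middleware's __init__ raises the KeyError when it cannot build a proxy list from the settings. A minimal sketch of that kind of check (illustrative and simplified, not the library's exact source):

class RandomProxySketch(object):
    """Simplified stand-in for scrapy_proxies' RandomProxy middleware."""

    def __init__(self, settings):
        proxy_list_path = settings.get('PROXY_LIST')
        if proxy_list_path is None:
            # No PROXY_LIST configured at all
            raise KeyError('PROXIES is empty')
        with open(proxy_list_path) as fh:
            self.proxies = [line.strip() for line in fh if line.strip()]
        if not self.proxies:
            # The file exists but yields no usable proxies
            raise KeyError('PROXIES is empty')

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings)

If the KeyError persists even with a correct PROXY_LIST, the installed middleware code itself may be at fault, which is what happened here.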

To make it work, use the code from here: https://github.com/aivarsk/scrapy-proxies

Go into the scrapy_proxies folder and replace the randomproxy.py code with the code from GitHub.

Mine was found here: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy_proxies/randomproxy.py
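If you are not sure where your installed copy lives, Python can tell you:

# Print where the scrapy_proxies package is installed;
# randomproxy.py sits in the same directory.
import scrapy_proxies
print(scrapy_proxies.__file__)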