连接被对方拒绝：111：连接被拒绝

Question

我有一个 LinkedIn 蜘蛛。它在我的本地机器上运行良好，但是当我在 Scrapinghub 上部署时出现错误：

Error downloading <GET https://www.linkedin.com/>: Connection was refused by other side: 111: Connection refused.

Scrapinghub的完整日志为：

0:  2018-08-30 12:58:34 INFO    Log opened.
1:  2018-08-30 12:58:34 INFO    [scrapy.log] Scrapy 1.0.5 started
2:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Scrapy 1.0.5 started (bot: facebook_stats)
3:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Optional features available: ssl, http11, boto
4:  2018-08-30 12:58:34 INFO    [scrapy.utils.log] Overridden settings: {'NEWSPIDER_MODULE': 'facebook_stats.spiders', 'STATS_CLASS': 'sh_scrapy.stats.HubStorageStatsCollector', 'LOG_LEVEL': 'INFO', 'SPIDER_MODULES': ['facebook_stats.spiders'], 'RETRY_TIMES': 10, 'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408], 'BOT_NAME': 'facebook_stats', 'MEMUSAGE_LIMIT_MB': 950, 'DOWNLOAD_DELAY': 1, 'TELNETCONSOLE_HOST': '0.0.0.0', 'LOG_FILE': 'scrapy.log', 'MEMUSAGE_ENABLED': True, 'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'}
5:  2018-08-30 12:58:34 INFO    [scrapy.log] HubStorage: writing items to https://storage.scrapinghub.com/items/341545/3/9
6:  2018-08-30 12:58:34 INFO    [scrapy.middleware] Enabled extensions: CoreStats, TelnetConsole, MemoryUsage, LogStats, StackTraceDump, CloseSpider, SpiderState, AutoThrottle, HubstorageExtension
7:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
8:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled spider middlewares: HubstorageMiddleware, HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
9:  2018-08-30 12:58:35 INFO    [scrapy.middleware] Enabled item pipelines: CreditCardsPipeline
10: 2018-08-30 12:58:35 INFO    [scrapy.core.engine] Spider opened
11: 2018-08-30 12:58:36 INFO    [scrapy.extensions.logstats] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
12: 2018-08-30 12:58:36 INFO    TelnetConsole starting on 6023
13: 2018-08-30 12:59:32 ERROR   [scrapy.core.scraper] Error downloading <GET https://www.linkedin.com/>: Connection was refused by other side: 111: Connection refused.
14: 2018-08-30 12:59:32 INFO    [scrapy.core.engine] Closing spider (finished)
15: 2018-08-30 12:59:33 INFO    [scrapy.statscollectors] Dumping Scrapy stats: More
16: 2018-08-30 12:59:34 INFO    [scrapy.core.engine] Spider closed (finished)
17: 2018-08-30 12:59:34 INFO    Main loop terminated.

我该如何解决这个问题？

Answer 1

LinkedIn prohibits scraping:

Prohibited Software and Extensions

LinkedIn is committed to keeping its members' data safe and its website free from fraud and abuse. In order to protect our members’ data and our website, we don't permit the use of any third party software, including "crawlers", bots, browser plug-ins, or browser extensions (also called "add-ons"), that scrapes, modifies the appearance of, or automates activity on LinkedIn’s website. Such tools violate the User Agreement, including, but not limited to, many of the "Don'ts" listed in Section 8.2…

有理由认为他们可能会主动阻止来自 Scrapinghub 和类似服务的连接。

连接被对方​​拒绝：111：连接被拒绝

Connection was refused by other side: 111: Connection refused

python

scrapy

scrapinghub

Prohibited Software and Extensions

连接被对方拒绝：111：连接被拒绝