Issues using Scrapy with Google Cloud Storage as a feed export
I'm using GCS as a feed export on Scrapy, following the scrapy docs. Oddly enough, it does work sometimes.
But other times it fails during the upload, and the only difference I can see is that it is trying to upload more data. That said, it still fails at around ~60 MB, which makes me doubt that the size of the data is really the problem. Can someone tell me whether this is an issue with my configuration or with Scrapy itself? The error report is below:
2020-12-01 23:07:26 [scrapy.extensions.feedexport] ERROR: Error storing csv feed (19826 items) in: gs://instoxi_amazon/com/Ngolo/Amazon_Beauty_&_Personal_Care_Ngolo.csv
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1244, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1290, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1239, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 1065, in _send_output
self.send(chunk)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 987, in send
self.sock.sendall(data)
File "C:\ProgramData\Anaconda3\lib\ssl.py", line 1034, in sendall
v = self.send(byte_view[count:])
File "C:\ProgramData\Anaconda3\lib\ssl.py", line 1003, in send
return self._sslobj.write(data)
ssl.SSLWantWriteError: The operation did not complete (write) (_ssl.c:2361)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\util\retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/instoxi_amazon/o?uploadType=resumable&upload_id=ABg5-Uwjc9Vs5HdgyQdhTTm0ph3N_xQIoZaAE44Oiv2MdMO6q-YhD31eRkWO6W7UNAlehUKm4FTgVv0KXq32SHmCrDU (Caused by SSLError(SSLWantWriteError(3, 'The operation did not complete (write) (_ssl.c:2361)')))
This is my first question, so please let me know if there is a better way to ask or present it. To clarify, I have no problems interacting with GCS from Python outside of Scrapy. Cheers!
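For context, the kind of configuration the scrapy docs describe for this looks roughly like the sketch below. The project ID is a placeholder, not a real value; the output URI is copied from the log above, and authentication is assumed to go through GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key:

# settings.py - minimal sketch of a GCS feed export per the Scrapy docs
# (needs Scrapy >= 2.3 and the google-cloud-storage package installed;
#  "my-gcp-project" is a placeholder project ID)
GCS_PROJECT_ID = "my-gcp-project"

FEEDS = {
    "gs://instoxi_amazon/com/Ngolo/Amazon_Beauty_&_Personal_Care_Ngolo.csv": {
        "format": "csv",
    },
}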
I've seen The operation did not complete (write) (_ssl.c:2361) before, and it was caused by network problems. That also fits with the fact that it is inconsistent for you. If you can, I would suggest trying a different network connection to the internet.
That said, I would also recommend making sure you are using the latest version of Scrapy.
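As a quick way to check that second point, something like the sketch below (assuming both packages are importable in the environment that runs the spider) prints the installed versions; GCS feed storage needs Scrapy >= 2.3, and google-cloud-storage is what performs the resumable upload seen in your traceback:

# Sanity check: print the versions of Scrapy and google-cloud-storage
# in the environment running the spider; upgrade with pip if they are old.
import scrapy
import google.cloud.storage as gcs_storage

print("Scrapy:", scrapy.__version__)                     # gs:// feed exports need >= 2.3
print("google-cloud-storage:", gcs_storage.__version__)  # handles the resumable upload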