Getting "RemoteDisconnected" error using urlopen
I just want to read the HTML of a website using:
from urllib.request import urlopen
url = 'https://dictionary.cambridge.org/dictionary/english/water'
page = urlopen(url)
This works for some websites, but for others, such as the one in the code above, I get this error:
Traceback (most recent call last):
  File "F:/mohammad Desktop/work spaces/python/Python Turial Release 3.9.1/mod2.py", line 4, in <module>
    page = urlopen(url)
  File "C:\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python\Python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Python\Python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Python\Python38\lib\urllib\request.py", line 1362, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "C:\Python\Python38\lib\urllib\request.py", line 1323, in do_open
    r = h.getresponse()
  File "C:\Python\Python38\lib\http\client.py", line 1322, in getresponse
    response.begin()
  File "C:\Python\Python38\lib\http\client.py", line 303, in begin
    version, status, reason = self._read_status()
  File "C:\Python\Python38\lib\http\client.py", line 272, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
There are some similar questions, but none of their solutions worked for me.

I was able to reproduce this behavior.
It can be fixed by using a Request object and changing the request headers to ones more typical of a web browser, for example Safari on a Mac:
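# Import the submodule explicitly; a bare "import urllib" does not guarantee
# that urllib.request is loaded, and the third-party requests package is not needed.
import urllib.request

url = 'https://dictionary.cambridge.org/dictionary/english/water'
# Send a browser-like User-Agent instead of urllib's default one.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8) AppleWebKit/534.50.2 (KHTML, like Gecko) Version/5.0.6 Safari/533.22.3'})
print(urllib.request.urlopen(req).read())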
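A slightly more defensive variant of the snippet above might look like the following. This is only a sketch: the helper names build_request and fetch_html, the 10-second timeout, and the UTF-8 fallback for decoding are assumptions made for illustration, not part of the original answer.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: any browser-like User-Agent string would do; this one is the
# string used in the snippet above.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8) '
              'AppleWebKit/534.50.2 (KHTML, like Gecko) '
              'Version/5.0.6 Safari/533.22.3')


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Fetch url and return its body as text, or None on the errors handled below."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or 'utf-8'
            return response.read().decode(charset, errors='replace')
    except http.client.RemoteDisconnected as exc:
        # The error from the question: the server hung up without replying.
        print(f'Server closed the connection without responding: {exc}')
    except urllib.error.URLError as exc:
        # Covers HTTP error statuses (HTTPError) and connection failures.
        print(f'Request failed: {exc}')
    return None
```

Calling fetch_html('https://dictionary.cambridge.org/dictionary/english/water') should then return the page text if the server accepts the request, or print a short message and return None instead of raising the traceback shown in the question.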
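I suspect this happens because the web server of https://dictionary.cambridge.org has been configured to block requests whose headers are associated with HTML scraping (such as the default headers sent by urllib.request.urlopen).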
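To see what such a server sees by default, one can inspect the headers that a default urllib opener adds to every request. The sketch below needs no network access; default_user_agent is a hypothetical helper name, and the version suffix in the result depends on the interpreter in use.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value that a default urllib opener sends."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to every request.
    headers = dict(opener.addheaders)
    return headers['User-agent']


# Prints a value of the form Python-urllib/<major>.<minor>
print(default_user_agent())
```

A value like this is easy for a server to recognise as a script rather than a browser, which would be consistent with the connection being dropped.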
However, I am not sure about the ethics of deliberately sending misleading headers; such requests may be blocked for a reason...