urlopen only works for certain URLs in Python 3
I am trying to fetch the HTML of a page in Python 3.
If I do the following,
from urllib.request import urlopen
html = urlopen("http://google.com/")
html.read()
I get the HTML I want.
However, if I pick a different URL, like this,
from urllib.request import urlopen
html = urlopen("http://www.whosebug.com/")
html.read()
I get the following error after the second line:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Any idea why this happens and how to fix it?
If you look closely at the error message, you will see that it is an HTTP error, and a specific one:
HTTP Error 403: Forbidden
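So you did reach the server and got a response, but the status code alone does not tell you why you were refused.
You can get a more detailed message from the HTML the server returns, like this: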
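from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("http://www.whosebug.com/")
except HTTPError as e:
    # The error object carries the response body; print it to see the reason.
    print(e.read().decode('utf-8'))
else:
    # Only read the page when the request succeeded; html is unbound otherwise.
    html.read()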
For me it says:
<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.whosebug.com) has banned your access based on your browser's signature (213702c58d2116a6-ua48).</p>
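The "browser's signature" in that message most likely refers to the User-Agent header, which urllib sets to Python-urllib/3.x unless told otherwise. The following is a minimal sketch of sending a different value, assuming the site only objects to that default and that automated access is allowed by its terms. The USER_AGENT string and the helper names build_request and fetch are hypothetical, chosen for illustration, and there is no guarantee the server will accept the request.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Assumption: an example value only; substitute a string that identifies your client.
USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Return (status_code, body_bytes) for url.

    On an HTTP error the status and body come from the HTTPError itself,
    so a 403 page can still be inspected by the caller.
    """
    try:
        with urlopen(build_request(url, user_agent)) as response:
            return response.status, response.read()
    except HTTPError as e:
        return e.code, e.read()
```

Calling fetch("http://www.whosebug.com/") would then return the status code together with either the page or the server's explanation, instead of raising.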
You can treat an HTTPError as a file object (https://docs.python.org/3/library/urllib.error.html#urllib.error.HTTPError):
Though being an exception (a subclass of URLError), an HTTPError can
also function as a non-exceptional file-like return value (the same
thing that urlopen() returns). This is useful when handling exotic
HTTP errors, such as requests for authentication.
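To see that file-like behaviour without touching the network, here is a small sketch that builds an HTTPError by hand and reads it like a response. The helper names make_fake_403 and describe_http_error, the example.invalid URL and the body text are all made up for illustration; a real error would come from urlopen.

```python
import io
from email.message import Message
from urllib.error import HTTPError


def make_fake_403():
    """Build an HTTPError locally, standing in for one raised by urlopen."""
    body = io.BytesIO(b"<p>banned</p>")  # hypothetical error page
    return HTTPError("http://example.invalid/", 403, "Forbidden", Message(), body)


def describe_http_error(e):
    """Return (code, reason, body_text) read from an HTTPError."""
    # read() works because HTTPError wraps the response body as a file object.
    text = e.read().decode("utf-8", errors="replace")
    e.close()
    return e.code, e.reason, text
```

The same describe_http_error helper could be called inside an except HTTPError block to log the status code alongside the server's message.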