urlopen only working for certain URLs in Python3

Question

I am trying to fetch the HTML of a page in Python 3.

If I do the following,

from urllib.request import urlopen
html = urlopen("http://google.com/")
html.read()

I get the HTML I want. However, if I choose a different URL, as follows,

from urllib.request import urlopen
html = urlopen("http://www.whosebug.com/")
html.read()

I get the following error after the second line:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Any idea why this happens and how to fix it?

Answer 1

If you look closely at the error message, you will see that it is an HTTP error, and a specific one:

HTTP Error 403: Forbidden

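So you did reach the server and you did get a response, but the response does not tell you why you were refused.

You can get a more detailed message from the HTML the server sends back with the error, like this: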

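from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("http://www.whosebug.com/")
except HTTPError as e:
    # The error object is file-like, so its body can be read like a response.
    print(e.read().decode('utf-8'))
else:
    # Only reached when the request succeeded, so html is defined here.
    print(html.read())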

For me it says:

<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.whosebug.com) has banned your access based on your browser's signature (213702c58d2116a6-ua48).</p>
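The "browser's signature" in that message most likely refers to the User-Agent header, which urllib sets to something like Python-urllib/3.x by default. Here is a minimal sketch of sending a different one, assuming the site only objects to the default User-Agent (I have not verified that it does). The names build_request and fetch and the User-Agent string are made up for illustration.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent value; any string that identifies your client works here.
EXAMPLE_USER_AGENT = "Mozilla/5.0 (compatible; example-client)"


def build_request(url, user_agent=EXAMPLE_USER_AGENT):
    """Return a Request that overrides urllib's default User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open the URL with the custom header and return the raw response bytes."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

You would then call fetch("http://www.whosebug.com/") instead of passing the bare URL to urlopen. A server may still refuse the request for other reasons, so keep the HTTPError handling around it.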

You can treat an HTTPError as a file object (https://docs.python.org/3/library/urllib.error.html#urllib.error.HTTPError):

Though being an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.
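To make the file-like behaviour concrete, here is a small sketch that pulls the status code, reason, a header, and the body out of an HTTPError. The helper name describe_http_error is hypothetical, and the error object is built by hand with made-up values so the example runs without a network connection. In real code you would get it from an except HTTPError clause.

```python
import io
from email.message import Message
from urllib.error import HTTPError


def describe_http_error(err):
    """Collect the useful parts of an HTTPError into a plain dict."""
    # read() works because HTTPError wraps the response body like a file.
    body = err.read().decode("utf-8", errors="replace")
    return {
        "status": err.code,
        "reason": err.reason,
        "content_type": err.headers.get("Content-Type"),
        "body": body,
    }


# Hand-built stand-in for a server response (illustrative values only).
headers = Message()
headers["Content-Type"] = "text/html"
fake_error = HTTPError(
    "http://example.invalid/",
    403,
    "Forbidden",
    headers,
    io.BytesIO(b"<p>banned</p>"),
)

info = describe_http_error(fake_error)
print(info["status"], info["reason"])
```

Note that the body can only be read once, just like a normal response, so store it if you need it more than once.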
