Why is urllib.request.urlopen giving me a 404 on the Wall Street Journal's website?
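Problem
I'm calling urllib.request.urlopen on the Wall Street Journal's website, and it gives me a 404.
Details
Other sites work fine. I get the same error if I use https://. I reproduced this in the REPL, but the same error occurs when I make the call from my Django server: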
>>> from urllib.request import urlopen
>>> urlopen('http://www.wsj.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
This is how it should work:
>>> urlopen('http://www.cbc.ca')
<http.client.HTTPResponse object at 0x10b0f8c88>
I'm not sure how to debug this. Does anyone know what's going on and how I can fix it?
First, import Request like this:
from urllib.request import Request, urlopen
Then pass your URL and headers to Request, like this:
url = 'https://www.wsj.com/'
response_obj = urlopen(Request(url, headers={'User-Agent': 'Mozilla/5.0'}))
print(response_obj)
I tested it and it works now.
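A likely explanation, offered as an assumption rather than something the site documents: by default urllib identifies itself with a User-Agent of the form Python-urllib/3.x, and some servers answer that client string with an error status instead of the page. Below is a minimal sketch that wraps the fix above and also reports the status code when the request fails, which may help with debugging. The names BROWSER_USER_AGENT, build_request and fetch_status are hypothetical helpers introduced for illustration, not part of urllib.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: a browser-like string is enough; 'Mozilla/5.0' is the value used above.
BROWSER_USER_AGENT = 'Mozilla/5.0'


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that sends an explicit User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # HTTPError carries the status code (err.code) and the response
        # headers (err.headers), so the failure can be inspected
        # instead of only producing a traceback.
        return err.code
```

Calling fetch_status('https://www.wsj.com/') performs the same request as the snippet above. If the server still rejects it, the returned code shows which status came back.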