Getting a website with urllib results in HTTP 405 error
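I'm learning BeautifulSoup and trying to write a small script that looks for houses on a Dutch real estate website. When I try to fetch the site's content, I immediately get an HTTP 405 error: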
File "funda.py", line 2, in <module>
html = urlopen("http://www.funda.nl")
File "<folders>request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "<folders>request.py", line 532, in open
response = meth(req, response)
File "<folders>request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "<folders>request.py", line 570, in error
return self._call_chain(*args)
File "<folders>request.py", line 504, in _call_chain
result = func(*args)
File "<folders>request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Not Allowed
What I'm trying to execute:
from urllib.request import urlopen
html = urlopen("http://www.funda.nl")
Any idea why this results in an HTTP 405? I'm just doing a GET request, right?
It works if you don't use Requests or urllib2 (note that this is Python 2 only; urllib.urlopen does not exist in Python 3):
import urllib
html = urllib.urlopen("http://www.funda.nl")
leovp's comment makes sense.
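Possibly a duplicate of "HTTPError: HTTP Error 403: Forbidden". You need to pretend to be a regular visitor. This is usually done (it varies from site to site) by sending a common browser User-Agent HTTP header: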
>>> url = "http://www.funda.nl"
>>> import urllib.request
>>> req = urllib.request.Request(
... url,
... data=None,
... headers={
... 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
... }
... )
>>> f = urllib.request.urlopen(req)
>>> f.status, f.msg
(200, 'OK')
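To see why the bare urlopen call may be turned away, it can help to look at what urllib sends when no header is given. The sketch below makes no network request: it reads the default User-Agent from a fresh opener and builds a request carrying a browser-like one. The names BROWSER_UA, default_user_agent and browser_request are made up for illustration, and whether a given site accepts this particular header value is an assumption.

```python
import urllib.request

# Illustrative browser-like value, copied from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/35.0.1916.47 Safari/537.36"
)


def default_user_agent():
    """Return the User-Agent a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs.
    return dict(opener.addheaders).get("User-agent")


def browser_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a GET request with a custom User-Agent."""
    return urllib.request.Request(url, data=None, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Looks like "Python-urllib/3.x", depending on the interpreter version.
    print(default_user_agent())
    req = browser_request("http://www.funda.nl")
    # With data=None the method is GET, so the request method is not the problem.
    print(req.get_method(), req.get_header("User-agent"))
```

The default value identifies the client as a script, which some sites appear to reject regardless of the request method.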
With the requests library:
>>> import requests
>>> response = requests.get(
... url,
... headers={
... 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
... }
... )
>>> response.status_code
200
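If the site still refuses the request, catching the exception shows the status code without a full traceback. This is a minimal sketch using only the standard library; fetch_status is a hypothetical helper name, the optional opener parameter exists only so the function can be exercised without network access, and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, opener=None):
    """Send a GET and return (status, reason), even when the server refuses."""
    opener = opener or urllib.request.build_opener()
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and message sent by the server.
        return err.code, err.reason


if __name__ == "__main__":
    # Hypothetical usage; the result depends on how the site treats the header.
    print(fetch_status("http://www.funda.nl", "Mozilla/5.0"))
```

Comparing the result for the default User-Agent with the result for a browser-like one is a quick way to check whether the header is what the site objects to.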