Redirect handler in Python 3.4.3
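I'm using the urllib.request package to open and read web pages. I want to make sure my code handles redirects well. Right now I just fail when I see a redirect (it's an HTTPError). Can someone guide me on how to handle it? My code currently looks like: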
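import socket
import urllib.error
import urllib.request

try:
    # Note: str() on bytes yields "b'...'"; .decode() would give real text
    text = str(urllib.request.urlopen(url, timeout=10).read())
except ValueError as error:
    print(error)
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except socket.timeout as error:
    print(error)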
Please help, I'm new to this. Thanks!
I use a special URLopener to catch redirects:
import urllib

class RedirectException(Exception):
    def __init__(self, errcode, newurl):
        Exception.__init__(self)
        self.errcode = errcode
        self.newurl = newurl

class MyURLopener(urllib.URLopener):
    # Error 301 -- relocated (permanently)
    def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            newurl = "Nowhere"
        raise RedirectException(errcode, newurl)

    # Error 302 -- relocated (temporarily)
    http_error_302 = http_error_301
    # Error 303 -- relocated (see other)
    http_error_303 = http_error_301
    # Error 307 -- relocated (temporarily)
    http_error_307 = http_error_301

# Make urllib.urlopen() use the custom opener (Python 2 only)
urllib._urlopener = MyURLopener()
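Now I just need to catch RedirectException and voila: I know there was a redirect, and I know the new URL. Caveat: I wrote this code for Python 2.7 and don't know whether it works on Python 3.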
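For Python 3, here is a rough sketch of the same idea using urllib.request, which is what the question uses. Note that urllib.request.urlopen already follows redirects by default through its built-in HTTPRedirectHandler, so an HTTPError on a redirect usually means a loop or too many hops. If you want to be told about the redirect instead of following it, you can subclass that handler. The names RaisingRedirectHandler and fetch below are made up for illustration, and I have not tried this on every 3.x release:

```python
import urllib.request


class RedirectException(Exception):
    """Raised instead of following an HTTP redirect."""

    def __init__(self, errcode, newurl):
        super().__init__("HTTP %s redirect to %s" % (errcode, newurl))
        self.errcode = errcode
        self.newurl = newurl


class RaisingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: report redirects rather than follow them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # The stock http_error_30x methods call this after resolving the
        # Location (or URI) header to an absolute URL.
        raise RedirectException(code, newurl)


# build_opener() uses the subclass in place of the default redirect handler.
opener = urllib.request.build_opener(RaisingRedirectHandler)


def fetch(url, timeout=10):
    """Return the body as bytes, or raise RedirectException on a redirect."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling fetch(url) inside a try block and catching RedirectException gives you error.errcode and error.newurl, much like the Python 2 version. If you only want the final address after the redirects were followed, the response object returned by urlopen has a geturl() method for that.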
I found a better solution using the requests package. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.ConnectionError:
    # Connection could not be completed
    pass
except requests.exceptions.RequestException as e:
    # Catastrophic error: bail out
    raise SystemExit(e)
To get the text of the page, all you need to do is:
r.text
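Since the question is about redirects: requests follows them by default for GET, and the intermediate responses end up in r.history. A small sketch of how you might look at the chain is below. The helper names redirect_chain and fetch_text are my own, not part of requests, and this assumes requests is installed:

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for each hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch_text(url, timeout=5):
    """Fetch url, following redirects, and return (text, hops)."""
    response = requests.get(url, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text, redirect_chain(response)
```

If you would rather not follow redirects at all, passing allow_redirects=False to requests.get returns the 3xx response itself, and the target is in r.headers['Location'].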