Redirect handler in Python 3.4.3

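I am using the urllib.request package to open and read web pages. I want to make sure my code handles redirects well. Right now I simply fail when I see a redirect (it shows up as an HTTPError). Can someone guide me on how to handle this? My code currently looks like: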

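import socket
import urllib.error
import urllib.request

try:
    # Note: str() on bytes gives the "b'...'" repr; .decode() is usually what is wanted
    text = str(urllib.request.urlopen(url, timeout=10).read())
except ValueError as error:
    print(error)
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except socket.timeout as error:
    print(error)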

Please help, I am new to this. Thanks!

I use a special URLopener to catch redirects:

import urllib

class RedirectException(Exception):
    def __init__(self, errcode, newurl):
        Exception.__init__(self)
        self.errcode = errcode
        self.newurl = newurl

class MyURLopener(urllib.URLopener):
    # Error 301 -- relocated (permanently)
    def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
        if headers.has_key('location'):
            newurl = headers['location']
        elif headers.has_key('uri'):
            newurl = headers['uri']
        else:
            newurl = "Nowhere"
        raise RedirectException(errcode, newurl)

    # Error 302 -- relocated (temporarily)
    http_error_302 = http_error_301
    # Error 303 -- relocated (see other)
    http_error_303 = http_error_301
    # Error 307 -- relocated (temporarily)
    http_error_307 = http_error_301

urllib._urlopener = MyURLopener()

Now I just need to catch RedirectException and voilà: I know there was a redirect, and I know the new URL. Warning: this code is for Python 2.7; I don't know whether it works on Python 3.
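
A rough Python 3 counterpart, as a sketch rather than a tested port of the code above: `urllib.request.HTTPRedirectHandler` calls its `redirect_request` method when it handles a 301, 302, 303 or 307 response, so overriding that method to raise should give a similar effect. `RaiseOnRedirect` and `fetch` are names made up for this example, and decoding the body as UTF-8 is an assumption about the page.

```python
import urllib.error
import urllib.request


class RedirectException(Exception):
    """Raised instead of following a redirect."""

    def __init__(self, errcode, newurl):
        super().__init__("HTTP {} redirect to {}".format(errcode, newurl))
        self.errcode = errcode
        self.newurl = newurl


class RaiseOnRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: report the redirect rather than follow it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl has already been resolved from the Location (or URI) header
        raise RedirectException(code, newurl)


def fetch(url, timeout=10):
    """Hypothetical helper: return the page text, raising on any redirect."""
    # Passing a subclass of HTTPRedirectHandler replaces the default handler
    opener = urllib.request.build_opener(RaiseOnRedirect)
    with opener.open(url, timeout=timeout) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(url)` inside a `try` block and catching `RedirectException` then gives you `error.errcode` and `error.newurl`. Left alone, `urlopen` follows redirects on its own and `response.geturl()` reports the URL that was finally fetched, so raising is only needed when you want to see each redirect yourself.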

I found a better solution using the requests package. The only exceptions you need to handle are:

import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects as error:
    # Tell the user their URL was bad and try a different one
    print(error)
except requests.exceptions.ConnectionError:
    # Connection could not be completed
    pass
except requests.exceptions.RequestException as e:
    # Catastrophic error. Bail.
    raise

To get the text of the page, all you need is: r.text
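
If you also want to see which redirects were followed, requests keeps the intermediate responses in `r.history` and the final address in `r.url`. A small sketch of how that might be used; `redirect_chain` and `fetch_text` are hypothetical helper names, not part of requests:

```python
import requests


def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fetch_text(url, timeout=5):
    """Hypothetical helper: fetch a page, print the redirect chain, return the text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx answers into requests.exceptions.HTTPError
    for status, hop_url in redirect_chain(response):
        print(status, hop_url)
    return response.text
```

An empty `r.history` means the URL was served directly. Passing `allow_redirects=False` to `requests.get` stops requests from following redirects, in which case the 3xx response itself is returned.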