urllib2: Fetch URL content even on Exception

Question

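I am sending POST requests to a URL, which responds with either 200 OK or 401 Unauthorized depending on the parameters supplied in the POST request.

In addition to the status code, the site also returns a text body, which is especially useful on error because it tells whoever made the request why it failed. For this, I am using the following code: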

#!/usr/bin/env python

import urllib
import urllib2

url = 'https://site/request'
params = {
  'param1': 'value1',
  'param2': 'value2',
  ...
}

data = urllib.urlencode(params)
req = urllib2.Request(url, data)

try:
  response = urllib2.urlopen(req)
  the_page = response.read()
except urllib2.URLError as e:
  print e.code, e.reason  # Returns only 401 Unauthorized, not the text

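When the request succeeds, I get a 200 code and can grab the message from the the_page variable. It is pretty useless in that case.

But when it fails, the line that raises URLError is the one calling urlopen(), so I cannot grab the web error message.

Is there any way to grab the message even on a URLError? If not, is there another way to make a POST request and grab the web content on error?

My Python version is 2.7.6.

Thanks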

Answer 1

I would suggest using the requests library (install it with pip install requests).

import requests
url = 'https://site/request'
params = {
  'param1': 'value1',
  'param2': 'value2',
}
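# data= form-encodes params, like urllib.urlencode in the question;
# use json=params instead if the endpoint expects a JSON body
post_response = requests.post(url, data=params)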


if post_response.ok:
    the_page = post_response.content
    # do your stuff here

print post_response.content  # this will give you the message regardless of failure
print post_response.status_code  # this will give you the status code of the request
post_response.raise_for_status()  # this will raise requests.exceptions.HTTPError for a 4xx or 5xx status

Documentation: http://docs.python-requests.org/en/latest/

Answer 2

If you catch HTTPError, a more specific subclass of URLError that I believe is raised in the case of a 401, it can be read as a file-like object, yielding the page contents.
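As a rough sketch of that idea (not tested against the asker's site): the helper name post_and_read, its opener parameter, and fake_opener are all made up for illustration, and the import fallback is only there so the same snippet also runs on Python 3. The fake opener stands in for the server so the example works without network access.

```python
import io

try:  # Python 2, as in the question
    from urllib2 import HTTPError, Request, urlopen
    from urllib import urlencode
except ImportError:  # Python 3 equivalents (an assumption beyond the question)
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode


def post_and_read(url, params, opener=urlopen):
    """POST params to url and return (status_code, body), even on HTTP errors.

    Hypothetical helper; `opener` exists only so the call can be faked below.
    """
    data = urlencode(params).encode("ascii")
    req = Request(url, data)
    try:
        response = opener(req)
        return response.getcode(), response.read()
    except HTTPError as e:
        # HTTPError doubles as a response object: read() yields the error body.
        return e.code, e.read()


def fake_opener(req):
    # Hypothetical stand-in for the real server: always answers 401 with a body.
    raise HTTPError(req.get_full_url(), 401, "Unauthorized", {},
                    io.BytesIO(b"bad token"))


status, body = post_and_read("https://site/request",
                             {"param1": "value1"},
                             opener=fake_opener)
print("%s %s" % (status, body))
```

In real use you would drop the opener argument so that urlopen is called. Note that a plain URLError (for example a DNS failure) has no body to read, so only HTTPError is caught here.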

See the urllib2.HTTPError documentation.
