Python urllib error with HTTPS websites

I'm using Python 3.4 on Windows 7, and I'm trying to use a Python script to test whether a proxy allows or denies connections to specific websites.

I'm using the code below:

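```python
from urllib.request import Request, urlopen
import urllib.request
from urllib.error import URLError, HTTPError

# Hypothetical placeholders: substitute your own proxy credentials and host:port
login, password, proxy = "user", "secret", "proxy.example.com:8080"

conf = "http://{}:{}@{}".format(login, password, proxy)

supp = urllib.request.ProxyHandler({"http": conf})
auth = urllib.request.HTTPBasicAuthHandler()
# Named "opener" rather than "open" to avoid shadowing the built-in open()
opener = urllib.request.build_opener(supp, auth, urllib.request.HTTPHandler)
urllib.request.install_opener(opener)

response = urlopen(Request("http://www.google.com"))
```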
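The script imports URLError and HTTPError but never uses them. Since the goal is to tell whether the proxy lets a site through, here is a minimal sketch of how those exceptions could turn a request into a verdict. The helper name check_site and its verdict strings are hypothetical; it assumes an opener built like the one above. HTTPError is a subclass of URLError, so it has to be caught first.

```python
from urllib.error import HTTPError, URLError


def check_site(opener, url, timeout=10):
    """Hypothetical helper: fetch url through opener and return a verdict string."""
    try:
        with opener.open(url, timeout=timeout) as response:
            # The request went through and the server answered normally.
            return "allowed ({})".format(response.getcode())
    except HTTPError as err:
        # An HTTP error status came back, e.g. 403 Forbidden from the proxy
        # or 407 Proxy Authentication Required.
        return "denied (HTTP {})".format(err.code)
    except URLError as err:
        # No usable HTTP response: DNS failure, refused connection, or a
        # failed HTTPS CONNECT tunnel ("Tunnel connection failed: ...").
        return "unreachable ({})".format(err.reason)
```

For example, check_site(opener, "https://www.google.com") returns a string such as "allowed (200)" or "denied (HTTP 403)". For HTTPS URLs, a proxy that refuses the CONNECT tunnel usually surfaces as a URLError whose reason mentions "Tunnel connection failed" rather than as an HTTPError, so in that case "unreachable" can also mean "blocked by the proxy".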

The code above runs without errors, but as soon as I switch the URL to HTTPS (for example, https://www.google.com), I get the following error:

```
C:\Python34\python.exe test_url.py
Traceback (most recent call last):
  File "C:\Python34\lib\urllib\request.py", line 1182, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Python34\lib\http\client.py", line 1088, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1126, in _send_request
    self.endheaders(body)
  File "C:\Python34\lib\http\client.py", line 1084, in endheaders
    self._send_output(message_body)
  File "C:\Python34\lib\http\client.py", line 922, in _send_output
    self.send(msg)
  File "C:\Python34\lib\http\client.py", line 857, in send
    self.connect()
  File "C:\Python34\lib\http\client.py", line 1223, in connect
    super().connect()
  File "C:\Python34\lib\http\client.py", line 834, in connect
    self.timeout, self.source_address)
  File "C:\Python34\lib\socket.py", line 494, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "C:\Python34\lib\socket.py", line 533, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    response = urlopen(Request("https://www.google.com"))
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 463, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 481, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1225, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Python34\lib\urllib\request.py", line 1184, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
```

Any idea why my code only works for HTTP websites?

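You need to specify the proxy for HTTPS separately, because it is a different URL scheme from HTTP. ProxyHandler only routes requests whose scheme appears as a key in its dictionary; with just an "http" key, HTTPS requests bypass the proxy and try to connect to www.google.com directly. That is why the traceback goes straight from https_open to a getaddrinfo failure: the script is resolving the host name itself, which most likely fails because on this network only the proxy can reach outside hosts. So the ProxyHandler line should be changed to: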

```python
supp = urllib.request.ProxyHandler({"http": conf, "https": conf})
```
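A minimal sketch of the fixed setup, assuming the same login, password and proxy values as in the question; the function name make_proxy_opener and the proxy address are hypothetical. It also shows why the original code failed: ProxyHandler only creates a `<scheme>_open` method for each key it receives, so an "http"-only handler has nothing to intercept HTTPS requests. The value for "https" stays an http:// URL because urllib reaches HTTPS sites by asking the proxy for a CONNECT tunnel over plain HTTP.

```python
import urllib.request


def make_proxy_opener(login, password, proxy):
    """Hypothetical helper: route both HTTP and HTTPS requests through proxy."""
    # Credentials embedded in the proxy URL are sent by ProxyHandler as a
    # Proxy-Authorization header (also on the CONNECT request for HTTPS).
    conf = "http://{}:{}@{}".format(login, password, proxy)
    supp = urllib.request.ProxyHandler({"http": conf, "https": conf})
    return urllib.request.build_opener(supp)


# An "http"-only handler gets http_open but no https_open, so HTTPS requests
# fall through to the default HTTPSHandler and connect directly.
http_only = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})
print(hasattr(http_only, "http_open"))   # True
print(hasattr(http_only, "https_open"))  # False
```

With this sketch, opener = make_proxy_opener(login, password, proxy) followed by opener.open("https://www.google.com") (or urllib.request.install_opener(opener) and plain urlopen) sends the HTTPS request through the proxy. The HTTPBasicAuthHandler from the question is left out here because, in this setup, the proxy credentials already travel in the proxy URL.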