Requests - proxies dictionary
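I'm a little confused about the requests module, especially proxies.
From the documentation:
proxies
Dictionary mapping protocol to the URL of the proxy (e.g. {'http': 'foo.bar:3128'}) to be used on each Request.
Can the dictionary hold more than one proxy of the same type? That is, can I put a list of proxies there so that the requests module tries them and uses only the ones that work?
Or can there be only one proxy address per protocol, for example for http?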
The use of the proxies parameter is limited by the nature of Python dictionaries: each key must be unique.
import requests

url = 'http://google.com'
# The 'http' key appears twice on purpose: the later entry replaces the earlier one.
proxies = {'https': '84.22.41.1:3128',
           'http': '185.26.183.14:80',
           'http': '178.33.230.114:3128'}

if __name__ == '__main__':
    print(url)
    print(proxies)
    response = requests.get(url, proxies=proxies)
    if response.status_code == 200:
        print(response.text)
    else:
        print('Response ERROR', response.status_code)
Output:
http://google.com
{'https': '84.22.41.1:3128', 'http': '178.33.230.114:3128'}
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for."
...more html...
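As you can see, the value of the http key in the proxies dictionary is the last one assigned to it (178.33.230.114:3128). Try swapping the two http entries to confirm this.
So the answer is no: you cannot specify multiple proxies for the same protocol using a plain dictionary.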
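If the goal is the fallback behaviour described in the question (try several proxies and use the first one that works), it has to be written by hand. The following is only a minimal sketch under stated assumptions: the function name get_with_failover, its timeout default, and the fetch parameter are hypothetical and not part of requests; fetch exists only so the loop can be exercised without a network.

```python
import requests


def get_with_failover(url, proxy_addresses, timeout=5, fetch=requests.get):
    """Try each proxy in order and return the first successful response.

    Hypothetical helper: `fetch` defaults to requests.get and is a
    parameter only so the loop can be tested without a network.
    """
    last_error = None
    for address in proxy_addresses:
        # One proxy per protocol key, as the proxies dictionary requires.
        proxies = {'http': address, 'https': address}
        try:
            response = fetch(url, proxies=proxies, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx as a failure
            return response
        except requests.exceptions.RequestException as error:
            last_error = error  # remember the failure, move to the next proxy
    raise RuntimeError('no working proxy found') from last_error
```

Called as get_with_failover('http://google.com', ['185.26.183.14:80', '178.33.230.114:3128']), it would try the first address and fall back to the second only if the first attempt raises a requests exception.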
I tried using an iterable as the value, which would have made sense to me:
proxies = {'https': '84.22.41.1:3128',
           'http': ('178.33.230.114:3128', '185.26.183.14:80', )}
But no luck: it raises an error.
Well, actually you can. I did this with a few lines of code and it works well.
import requests
from random import choice


class Client:
    def __init__(self):
        self._session = requests.Session()
        self.proxies = None

    def set_proxy_pool(self, proxies, auth=None, https=True):
        """Randomly choose a proxy for every GET/POST request

        :param proxies: list of proxies, like ["ip1:port1", "ip2:port2"]
        :param auth: if proxy needs auth
        :param https: default is True, pass False if you don't need https proxy
        """
        if https:
            self.proxies = [{'http': p, 'https': p} for p in proxies]
        else:
            self.proxies = [{'http': p} for p in proxies]

        def get_with_random_proxy(url, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_get(url, **kwargs)

        def post_with_random_proxy(url, *args, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_post(url, *args, **kwargs)

        # Save the original methods only once, so that calling
        # set_proxy_pool again does not make the wrappers call themselves.
        if not hasattr(self._session, 'original_get'):
            self._session.original_get = self._session.get
            self._session.original_post = self._session.post
        self._session.get = get_with_random_proxy
        self._session.post = post_with_random_proxy

    def remove_proxy_pool(self):
        self.proxies = None
        self._session.get = self._session.original_get
        self._session.post = self._session.original_post
        del self._session.original_get
        del self._session.original_post

    # You can define whatever operations using self._session
I use it like this:
client = Client()
client.set_proxy_pool(['112.25.41.136', '180.97.29.57'])
It's simple, but it works for me.
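A possible variation on the same idea, sketched here rather than taken from the answer above, is to subclass requests.Session and override request, which get, post, and the other verb helpers all call. That avoids replacing methods on the session at runtime. The class name RandomProxySession and the attribute proxy_pool are hypothetical names chosen for illustration.

```python
import random

import requests


class RandomProxySession(requests.Session):
    """Session that picks a random proxy from a pool for each request.

    Hypothetical helper, not part of requests itself.
    """

    def __init__(self, proxy_addresses):
        super().__init__()
        # One proxies dict per address, used for both http and https targets.
        self.proxy_pool = [{'http': p, 'https': p} for p in proxy_addresses]

    def request(self, method, url, **kwargs):
        # Respect an explicit proxies= argument; otherwise pick one at random.
        if self.proxy_pool and kwargs.get('proxies') is None:
            kwargs['proxies'] = random.choice(self.proxy_pool)
        return super().request(method, url, **kwargs)
```

With this sketch, session = RandomProxySession(['112.25.41.136', '180.97.29.57']) followed by session.get(url) would send each request through one of the two addresses chosen at random, while a call that passes its own proxies argument is left untouched.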