Why am I getting an ValueError: too many file descriptors in select()?
I load my proxies into the proxies variable and try to make asynchronous requests to fetch an IP. It's simple:
import asyncio
import time

import aiohttp

async def get_ip(proxy):
    timeout = aiohttp.ClientTimeout(connect=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        try:
            async with session.get('https://api.ipify.org?format=json', proxy=proxy, timeout=timeout) as response:
                json_response = await response.json()
                print(json_response)
        except:
            pass

if __name__ == "__main__":
    proxies = []
    start_time = time.time()
    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(get_ip(proxy)) for proxy in proxies]
    loop.run_until_complete(asyncio.wait(tasks))
    print('time spent to work: {} sec --------------'.format(time.time() - start_time))
This code works fine when I run 100-200-300-400 requests, but as soon as the count goes above 500, I always get this error:
Traceback (most recent call last):
  File "async_get_ip.py", line 60, in <module>
    loop.run_until_complete(asyncio.wait(tasks))
  File "C:\Python37\lib\asyncio\base_events.py", line 571, in run_until_complete
    self.run_forever()
  File "C:\Python37\lib\asyncio\base_events.py", line 539, in run_forever
    self._run_once()
  File "C:\Python37\lib\asyncio\base_events.py", line 1739, in _run_once
    event_list = self._selector.select(timeout)
  File "C:\Python37\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "C:\Python37\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()
I kept looking for a solution, but all I found were references to OS-level limits. Can I work around this somehow without using another library?
Launching an unbounded number of simultaneous requests is not a good idea. Each request you start consumes some resources, from CPU and RAM to the OS's select() capacity, so in a case like yours something will break sooner or later.
To avoid that, you should use asyncio.Semaphore to limit the maximum number of simultaneous connections.
I believe only minor changes to your code are needed:
sem = asyncio.Semaphore(50)

async def get_ip(proxy):
    async with sem:
        # ...
That's how semaphores are used in general.
P.S.
    except:
        pass
You should never do something like this: it will break your code sooner or later. Use at least except Exception.
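The difference matters because a bare except: also swallows exceptions that derive from BaseException rather than Exception, such as KeyboardInterrupt and SystemExit. A quick illustration:

```python
# `except Exception` lets interpreter control-flow exceptions propagate;
# a bare `except:` would silently catch them too.
caught = None
try:
    raise KeyboardInterrupt  # e.g. the user pressing Ctrl+C
except Exception:
    caught = 'Exception'         # not reached: KeyboardInterrupt
                                 # is not an Exception subclass
except BaseException as exc:
    caught = type(exc).__name__

print(caught)  # KeyboardInterrupt
```

With a bare except: pass, pressing Ctrl+C inside the try block would be silently ignored, making the program hard to stop and hiding real bugs.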