How to concurrently run asynchronous launchers / asyncio functions in a big manager
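I am trying to make my code run faster when looking up Roblox account names.
This code was provided in another question of mine (I have modified it here). It works well, but it still takes several minutes to process a large number of accounts. Normally I wouldn't mind, but I am trying to reach 100,000 accounts, so I need performance. Is this as fast as it can go, or can we push it further? Is the answer simply more CPU/memory? A better internet connection? Do I need network programming at all, or is there a faster way that avoids making requests?
Code: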
import asyncio
import aiohttp

async def find_account(url, session, id):
    try:
        async with session.get(url) as response:
            if response.status == 200:
                r = await response.read()
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(r, 'html.parser')
                h2 = []
                for i in soup.find_all('h2'):
                    h2.append(i)
                print('Done')
                return str(list(list(h2)[0])[0]) + ' ' + str(url)
            else:
                return 'This account does not exist ID: {}'.format(id)
    except aiohttp.ServerDisconnectedError:
        print('Done')
        return find_account(url, session, id)

async def main(min_id, max_id):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for id in range(min_id, max_id):
            url = f'https://web.roblox.com/users/{str(id)}/profile'
            tasks.append(asyncio.create_task(find_account(url=url, session=session, id=id)))
        return await asyncio.gather(*tasks)

from time import time
loop = asyncio.get_event_loop()
starting = int(input("Type Your Starting Id Number>> "))
ending = int(input("Type Your Ending Id Number>> "))
timer = time()
users = loop.run_until_complete(main(starting, ending))
users = [i for i in users if i != '1']
print(users)
print(time()-timer)
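You could run BeautifulSoup in multiple processes to speed it up. For example, you can extract the part of find_account that does the parsing and pass it to a process pool executor: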
import asyncio
import concurrent.futures

import aiohttp

_pool = concurrent.futures.ProcessPoolExecutor()

def parse(html):
    # Runs in a worker process, so BeautifulSoup is imported there.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    h2 = []
    for i in soup.find_all('h2'):
        h2.append(i)
    return str(list(list(h2)[0])[0])

async def find_account(url, session, id):
    while True:
        try:
            async with session.get(url) as response:
                if response.status == 200:
                    r = await response.read()
                    loop = asyncio.get_event_loop()
                    # Hand the CPU-bound parsing to the process pool.
                    extracted = await loop.run_in_executor(_pool, parse, r)
                    print('Done')
                    return extracted + ' ' + str(url)
                else:
                    return 'This account does not exist ID: {}'.format(id)
        except aiohttp.ServerDisconnectedError:
            print('Done')
            # keep looping
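To try the process-pool idea without touching the network, here is a minimal self-contained sketch. It makes several assumptions: the standard library's HTMLParser stands in for BeautifulSoup so nothing extra needs installing, the helper names _FirstH2 and parse_all are hypothetical, and the sample pages are made up rather than real Roblox responses.

```python
import asyncio
import concurrent.futures
from html.parser import HTMLParser


class _FirstH2(HTMLParser):
    """Collects the text of the first <h2> element only."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.done = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and not self.done:
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_h2:
            self.in_h2 = False
            self.done = True

    def handle_data(self, data):
        if self.in_h2:
            self.text += data


def parse(html):
    # Top-level function so it can be pickled and sent to a worker process.
    parser = _FirstH2()
    parser.feed(html.decode("utf-8", errors="replace"))
    return parser.text.strip()


async def parse_all(pages, pool):
    loop = asyncio.get_running_loop()
    # Each call returns an awaitable future; gather preserves input order.
    futures = [loop.run_in_executor(pool, parse, page) for page in pages]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    # Made-up sample pages standing in for downloaded profile HTML.
    pages = [f"<html><h2>user{i}</h2></html>".encode() for i in range(4)]
    with concurrent.futures.ProcessPoolExecutor() as pool:
        print(asyncio.run(parse_all(pages, pool)))
```

The event loop stays free to service other requests while the workers parse. Whether this helps in practice depends on how much of the total time is actually spent parsing rather than waiting on the network, which is worth measuring first.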
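As a side note, your recursive call to find_account() was incorrect because it was missing an await. The rewritten find_account() above fixes that and switches to a loop instead, which makes it more explicit that the code is actually looping.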
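The difference between the two retry styles can be seen in a small sketch that needs no network access. The Flaky class and both function names are hypothetical stand-ins for the server and for find_account(), and ConnectionError stands in for aiohttp.ServerDisconnectedError.

```python
import asyncio
import inspect


class Flaky:
    """Hypothetical server that drops the first few requests."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def get(self):
        self.calls += 1
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        if self.calls <= self.failures:
            raise ConnectionError("server disconnected")
        return "builderman"


async def recursive_without_await(server):
    try:
        return await server.get()
    except ConnectionError:
        # Bug: this returns a coroutine object, not the eventual result.
        return recursive_without_await(server)


async def retry_loop(server):
    while True:
        try:
            return await server.get()
        except ConnectionError:
            continue  # try again


async def demo():
    broken = await recursive_without_await(Flaky(failures=1))
    is_coro = inspect.iscoroutine(broken)
    broken.close()  # avoid the "coroutine was never awaited" warning
    fixed = await retry_loop(Flaky(failures=2))
    return is_coro, fixed


result = asyncio.run(demo())
print(result)
```

With the missing await, the caller receives a coroutine object in its results list instead of a string, so the failure is silent until the results are inspected.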