How to get response status of large list of subdomains?
I have been trying to check the status of all of these subdomains at once. I have tried multiple techniques; even grequests and faster_than_requests did not help much. Then I started using asyncio with aiohttp, but it is slower than the normal requests library. I also checked, and it is not actually sending the requests asynchronously but one after another.
I know there is something wrong with "await resp.status", because resp.status does not support await, but I tried removing it and it is still the same.
import aiohttp
import asyncio
import time

start_time = time.time()

async def main():
    # List of 1000 subdomains, some subdomains do not exist
    data = ["LIST OF 1000 SUBDOMAINS"]
    async with aiohttp.ClientSession() as session:
        for url in data:
            pokemon_url = f'{url}'
            try:
                async with session.get(pokemon_url, ssl=False) as resp:
                    pokemon = await resp.status
                    # If the subdomain exists then print the status
                    print(pokemon)
            except:
                # else print the subdomain which does not exist or cannot be reached
                print(url)

asyncio.run(main())
print("--- %s seconds ---" % (time.time() - start_time))
"I have tried multiple techniques even grequests"

grequests works well for this, and you don't have to use async if you don't want to.
import grequests
import time

urls = ['https://httpbin.org/delay/4' for _ in range(4)]
# each of these requests takes 4 seconds to complete
# serially, these would take at least 16 (4 * 4) seconds to complete
reqs = [grequests.get(url) for url in urls]

start = time.time()
for resp in grequests.imap(reqs, size=4):
    print(resp.status_code)
end = time.time()

print('finished in', round(end - start, 2), 'seconds')
200
200
200
200
finished in 4.32 seconds
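Applied to the subdomain list from the question, grequests.imap also accepts an exception_handler callback, which is where the subdomains that do not resolve or cannot be reached end up, while the loop body prints the status of every reachable one. A minimal sketch along those lines (the list and the handler name are placeholders):

import grequests

# Placeholder list; swap in the real 1000 subdomains.
subdomains = ["https://example.com", "https://no-such-subdomain.example.com"]

def on_error(request, exception):
    # called for subdomains that do not resolve or cannot be reached
    print(request.url)

reqs = (grequests.get(url, timeout=5) for url in subdomains)
for resp in grequests.imap(reqs, size=50, exception_handler=on_error):
    # reachable subdomains: print the status code
    print(resp.url, resp.status_code)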