Asynchronous loop within an asynchronous loop in Python
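I'm developing a web-scraping bot that needs to return all of its information very quickly. My main class, `Whole`, generates a list of `Query` objects. Here is my `Query` class: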
```python
import asyncio
import re

import aiohttp


class Query:  # each query has a search term and the thing(s) to check the commonality of
    def __init__(self, query, terms):
        assert isinstance(query, str)
        self.query = query
        self.terms = terms
        self.response = None

    def visible(self, element):
        if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
            return False
        elif re.match(r'<!--.*-->', str(element.encode('utf-8'))):
            return False
        return True

    def processResponse(self, loop):
        self.texts = None

        async def fetch(url, session):
            async with session.get(url) as response:
                return await response.read()

        async def bound_fetch(sem, url, session):
            # Getter function with semaphore.
            async with sem:
                # Return the body; without `return`, every gathered result is None.
                return await fetch(url, session)

        async def run(pages):
            tasks = []
            sem = asyncio.Semaphore(100)
            # Fetch all responses within one ClientSession,
            # keeping the connection alive for all requests.
            async with aiohttp.ClientSession() as session:
                for page in pages:
                    task = asyncio.ensure_future(bound_fetch(sem, page, session))
                    tasks.append(task)
                responses = await asyncio.gather(*tasks)
                self.texts = responses
                # you now have all response bodies in this variable

        pages = [item['link'] for item in self.response['items']]  # all of the links to search
        future = asyncio.ensure_future(run(pages))
```
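Each "query" has a list of pages to search and a list of words to scan those pages for. The `Whole` class holds a list of several `Query` objects. I want to perform all the necessary requests for all `Query` objects concurrently, and have the responses returned to each individual query object for further parsing. I tried creating two event loops, one in `Whole` and another in `Query`, but then I realized I can't run more than one event loop at a time in the same thread. How can I create a function that executes all of the searches of multiple `Query` objects asynchronously? Thanks in advance for your help!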
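The "only one event loop" limitation described above can be reproduced in a few lines. This is a minimal stdlib-only sketch (the coroutine names `inner` and `outer` are hypothetical): calling `asyncio.run()` from inside a coroutine that is already running on a loop raises `RuntimeError`, while simply awaiting the inner coroutine on the existing loop works, so no second loop is needed.

```python
import asyncio


async def inner() -> str:
    return "inner done"


async def outer() -> tuple[bool, str]:
    coro = inner()
    try:
        # Trying to start a second event loop while this one is running fails.
        asyncio.run(coro)
        nested_loop_failed = False
    except RuntimeError:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        nested_loop_failed = True
    # The fix: await the coroutine on the loop that is already running.
    return nested_loop_failed, await inner()


nested_loop_failed, result = asyncio.run(outer())
print(nested_loop_failed, result)  # True inner done
```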
How can I create a function that executes all of the searches of multiple `Query` objects asynchronously?
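Change `processResponse` to `async def` (dropping the unused `loop` parameter) and replace its last line with `await run(pages)`. Then, in `Whole`, `await asyncio.gather(*[q.processResponse() for q in queries])`, just as `run` already gathers the page fetches inside `processResponse`. Everything then runs on a single event loop.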
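A minimal sketch of that restructuring follows. To keep it runnable without network access, `fake_fetch` is a hypothetical stand-in for the aiohttp request, and `pages` is passed to the constructor instead of being read from `self.response['items']`. One design variation is also an assumption: a single semaphore is created in `Whole` and shared by every query, so the cap of 100 applies to all requests combined rather than 100 per query.

```python
import asyncio


# Hypothetical stand-in for `session.get(url)` + `response.read()`:
# sleeps briefly and returns fake bytes so the sketch runs offline.
async def fake_fetch(url: str) -> bytes:
    await asyncio.sleep(0.01)
    return f"body of {url}".encode()


class Query:
    def __init__(self, query: str, pages: list[str]):
        self.query = query
        self.pages = pages  # hypothetical: replaces the links from self.response['items']
        self.texts = None

    async def processResponse(self, sem: asyncio.Semaphore) -> None:
        async def bound_fetch(url: str) -> bytes:
            async with sem:  # one limit shared across all queries
                return await fake_fetch(url)

        # Awaited directly instead of scheduled with ensure_future.
        self.texts = await asyncio.gather(*(bound_fetch(p) for p in self.pages))


class Whole:
    def __init__(self, queries: list[Query]):
        self.queries = queries

    async def run(self) -> None:
        sem = asyncio.Semaphore(100)
        # One event loop, one gather over every query's coroutine.
        await asyncio.gather(*[q.processResponse(sem) for q in self.queries])


queries = [Query("a", ["u1", "u2"]), Query("b", ["u3"])]
asyncio.run(Whole(queries).run())
for q in queries:
    print(q.query, q.texts)
```

In real code, `fake_fetch` would be replaced by the original `fetch(url, session)`; sharing one `aiohttp.ClientSession` across all queries (for example, opened in `Whole.run` and passed down like the semaphore) also lets every request reuse the same connection pool.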