asyncio wait - process results as they come

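This script should take an initial list of tasks (URLs) and make the requests asynchronously with aiohttp. That part works correctly. The problem is that asyncio.wait does not return the actual results, only the done/pending task sets, so I don't know where and how to process the results in order to make further requests and write data to the database. In this variant I create a new task (making a further request...) inside the first one, which doesn't work. P.S. I'm using wait because a book I'm reading recommends it for better control over completed and pending tasks and exceptions. Any help is appreciated :)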

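import asyncio
from aiohttp import ClientSession
from bs4 import BeautifulSoup

async def fetch_content_2(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            new_link = BeautifulSoup(res, 'lxml').select_one('element on website 2')['href']
            # ***PROCESS AND WRITE SOME DATA TO DB***
        except (TypeError, KeyError):
            # select_one returned None, or the element has no href attribute
            pass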

async def fetch_content_1(session, url):
    async with session.get(url) as result:
        res = await result.text()
        try:
            link = BeautifulSoup(res, 'lxml').select_one('element on website 1')['href']
            # ***MAKE ANOTHER ASYNC REQUEST WITH NEW LINK***
            asyncio.create_task(fetch_content_1(session, link))
        except (TypeError, KeyError):
            # select_one returned None, or the element has no href attribute
            pass

async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)

            # print(f'Done count: {len(done)}')
            # print(f'Pending count: {len(pending)}')

asyncio.run(main([url1, url2, ...]))

            

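done and pending are sets of asyncio.Task objects. To get a task's result or its state, take the tasks out of the set and call the method you need (see the docs). In particular, you can get the result by calling the result method.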

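async def main(tasks):
    async with ClientSession() as session:
        pending = [asyncio.create_task(fetch_content_1(session, url)) for url in tasks]
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            # more than one task can finish at once, so handle every task in done
            for task in done:
                # result() gives the coroutine's return value, or re-raises its exception
                res = task.result()
                # do some stuff with the result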
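
As for where to make the follow-up requests: one option is to have each coroutine return what it found and let the wait loop schedule the next request by adding a new task to pending. The sketch below is only an illustration of that idea and runs without network access. fetch_first, fetch_second, FAKE_LINKS and the saved list are hypothetical stand-ins for the aiohttp calls, the parsed links and the database writes; none of them come from the question.

```python
import asyncio

# Hypothetical data: the link "found" on each first-level page.
FAKE_LINKS = {"site1/a": "site2/a", "site1/b": "site2/b"}

async def fetch_first(url):
    await asyncio.sleep(0.01)  # simulated network latency
    return ("first", FAKE_LINKS[url])  # tag the result with its stage

async def fetch_second(url):
    await asyncio.sleep(0.01)
    return ("second", f"data from {url}")

async def crawl(urls):
    saved = []  # stands in for the database writes
    pending = {asyncio.create_task(fetch_first(u)) for u in urls}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            stage, payload = task.result()
            if stage == "first":
                # Schedule the follow-up request; adding the task to
                # pending keeps it inside the same wait loop.
                pending.add(asyncio.create_task(fetch_second(payload)))
            else:
                saved.append(payload)
    return saved

results = asyncio.run(crawl(["site1/a", "site1/b"]))
print(sorted(results))
```

Because every task lives in pending until it shows up in done, the loop only exits once the follow-up requests have finished too, which is not guaranteed when a task is created inside another coroutine and never awaited.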

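See the documentation for the exceptions that result and the related methods can raise. result re-raises the exception if the task failed internally, raises CancelledError if the task was cancelled, and raises InvalidStateError if the result is not ready yet (which should not happen here, since the task comes from done).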
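
A minimal sketch of checking each finished task before reading its result, so that one failing task does not abort the loop. The work coroutine and its deliberate ValueError are made up for illustration; cancelled, exception and result are the asyncio.Task methods.

```python
import asyncio

async def work(n):
    await asyncio.sleep(0.01)
    if n == 2:
        raise ValueError(f"bad input {n}")  # deliberate failure for the demo
    return n * 10

async def main():
    results, errors = [], []
    pending = {asyncio.create_task(work(n)) for n in range(4)}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            if task.cancelled():
                # exception() and result() would raise CancelledError here
                continue
            exc = task.exception()  # None if the task finished normally
            if exc is not None:
                errors.append(exc)
            else:
                results.append(task.result())
    return results, errors

results, errors = asyncio.run(main())
print(sorted(results), errors)
```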