How can I catch an error in an aiohttp GET request?

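Suppose I have a list of URLs to fetch data from. I try to do this asynchronously, but one of the URLs is incorrect. How can I catch this error, and is it possible to change the URL after catching the error and send the request again? I use the following code to fetch the data with asyncio and aiohttp: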

import asyncio
import aiohttp

urls = ["a", "b", "c"] # some list of urls
results = []

def get_tasks(session):
    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(session.get(url, ssl=False)))
    return tasks


async def get_symbols():
    async with aiohttp.ClientSession() as session:
        tasks = get_tasks(session)
        responses = await asyncio.gather(*tasks)
        for response in responses:
            results.append(await response.json())


asyncio.run(get_symbols())

Then I get the following error:

ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: ', url=URL('b')

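How can I catch this error so that the rest of the process continues, and is it possible to fix "b" to some other, correct URL (say "bb") and send the request again?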

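The simplest approach is to put a try...except block around await response.json(); if it raises an exception, change the URL and reschedule the request. For more complex tasks, consider using asyncio.Queue, for example.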

import asyncio
import aiohttp

urls = [
    "https://reqbin.com/echo/get/json?1",
    "https://reqbin.com/echo/get/json?2",
    "https://reqbin.com/echo/get/json-BAD",
]
results = []


def get_tasks(session, urls):
    tasks = []
    for url in urls:
        tasks.append(asyncio.create_task(session.get(url, ssl=False)))
    return tasks


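async def get_symbols():
    async with aiohttp.ClientSession() as session:
        # Loop until no URLs remain to be fetched.
        while urls:
            # Handle responses in completion order rather than list order.
            for task in asyncio.as_completed(get_tasks(session, urls)):
                response = await task
                # Assumes response.url still equals the requested string
                # (no redirects or URL normalization).
                urls.remove(str(response.url))
                try:
                    data = await response.json()
                    print(response.url, data)
                    results.append(data)
                except (aiohttp.ClientError, ValueError):
                    # The body was not valid JSON: drop the "-BAD" suffix and
                    # retry the corrected URL on the next pass of the while loop.
                    new_url = str(response.url).split("-")[0]
                    print(
                        f"Error with URL {response.url} Attempting new URL {new_url}"
                    )
                    urls.append(new_url)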


asyncio.run(get_symbols())

Prints:

https://reqbin.com/echo/get/json?2 {'success': 'true'}
https://reqbin.com/echo/get/json?1 {'success': 'true'}
Error with URL https://reqbin.com/echo/get/json-BAD Attempting new URL https://reqbin.com/echo/get/json
https://reqbin.com/echo/get/json {'success': 'true'}
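
For the asyncio.Queue variant mentioned above, here is a minimal sketch rather than a definitive implementation. The names worker, crawl, fake_fetch and strip_suffix, and the parameters retry_on, max_retries and concurrency, are all hypothetical and not part of aiohttp. To keep the sketch runnable offline, fake_fetch stands in for the real request and simply raises ValueError for any URL ending in "-BAD".

```python
import asyncio


async def worker(queue, fetch, fix_url, results, failed, retry_on, max_retries):
    """Consume (url, attempt) items from the queue until cancelled."""
    while True:
        url, attempt = await queue.get()
        try:
            results[url] = await fetch(url)
        except retry_on:
            if attempt < max_retries:
                # Re-queue a corrected URL before task_done() runs, so that
                # queue.join() keeps waiting for the retry as well.
                queue.put_nowait((fix_url(url), attempt + 1))
            else:
                failed.append(url)
        finally:
            queue.task_done()


async def crawl(urls, fetch, fix_url, retry_on=(Exception,),
                max_retries=1, concurrency=3):
    """Fetch every URL, retrying failures with a corrected URL."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait((url, 0))
    results, failed = {}, []
    workers = [
        asyncio.create_task(
            worker(queue, fetch, fix_url, results, failed, retry_on, max_retries)
        )
        for _ in range(concurrency)
    ]
    await queue.join()  # returns once every queued item is marked done
    for w in workers:
        w.cancel()
    # Collect the cancelled workers so no task is left pending.
    await asyncio.gather(*workers, return_exceptions=True)
    return results, failed


async def fake_fetch(url):
    """Offline stand-in for session.get(url) followed by response.json()."""
    await asyncio.sleep(0)
    if url.endswith("-BAD"):
        raise ValueError(f"not JSON: {url}")
    return {"success": "true"}


def strip_suffix(url):
    """Hypothetical URL fix: drop everything from the first '-' onward."""
    return url.split("-")[0]
```

Calling asyncio.run(crawl(["u1", "u2", "u3-BAD"], fake_fetch, strip_suffix)) returns the results keyed by the URL that finally succeeded, plus a list of URLs that still failed after max_retries corrections. With aiohttp, fetch could instead be a small coroutine that does "async with session.get(url) as response: return await response.json()", and retry_on could be narrowed to (aiohttp.ContentTypeError,). Bounding the retries avoids looping forever when the corrected URL is also bad.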