尝试发出异步请求时 RuntimeError 会话关闭

Question

首先是代码：

import random
import asyncio
from aiohttp import ClientSession
import csv

headers =[]

def extractsites(file):
    sites = []
    readfile = open(file, "r")
    reader = csv.reader(readfile, delimiter=",")
    raw = list(reader)
    for a in raw:
        sites.append((a[1]))
    return sites


async def fetchheaders(url, session):
    async with session.get(url) as response:
        responseheader = await response.headers
        print(responseheader)
        return responseheader


async def bound_fetch(sem, url, session):
    async with sem:
        print("doing request for "+ url)
        await fetchheaders(url, session)


async def run():
    urls = extractsites("cisco-umbrella.csv")
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(100)
    async with ClientSession() as session:
        for i in urls:
            task = asyncio.ensure_future(bound_fetch(sem, "http://"+i, session))
            tasks.append(task)
        return tasks

def main():
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(run())
    loop.run_until_complete(future)

if __name__ == '__main__':
    main()

大部分代码取自此博客 post： https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html

这是我面临的问题：我正在尝试从一个文件中读取一百万个 url，然后为每个 url 发出异步请求。但是当我尝试执行上面的代码时，我收到会话过期错误。

这是我的思路：我对异步编程比较陌生，所以请多多包涵。我的虽然过程是创建一个长任务列表（只允许 100 个并行请求），我在 run 函数中构建，然后作为 future 传递给事件循环以执行。

我在 bound_fetch 中包含了一个打印调试（我从博客 post 中复制了它），看起来它会循环遍历我拥有的所有 url，并且一旦它应该开始在 fetchheaders 函数中发出请求时出现运行时错误。

如何修复我的代码？

Answer 1

这里有几件事。

首先，在您的运行函数中，您实际上想在那里收集任务并等待它们解决您的 session 问题，如下所示：

async def run():
    urls = ['google.com','amazon.com']
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(100)
    async with ClientSession() as session:
        for i in urls:
            task = asyncio.ensure_future(bound_fetch(sem, "http://"+i, session))
            tasks.append(task)
        await asyncio.gather(*tasks)

其次，aiohttp API 在处理 headers 时有点奇怪，因为您不能等待它们。我通过等待 body 来解决这个问题，以便填充 headers 然后返回 headers:

async def fetchheaders(url, session):
    async with session.get(url) as response:
        data = await response.read()
        responseheader = response.headers
        print(responseheader)
        return responseheader

然而，拉取 body 时会有一些额外的开销。尽管没有进行 body 读取，但我找不到另一种加载 headers 的方法。

尝试发出异步请求时 RuntimeError 会话关闭

RuntimeError Session is closed when trying to make async requests

python

asynchronous

aiohttp