How to limit the number of concurrent reads/writes with aiofiles?

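My program uses aiohttp to download about 10 million pieces of data concurrently, then writes the data to about 4000 files on disk.

I use the aiofiles library because I want my program to be able to do other things while it is reading or writing a file.

But I am worried that if the program tries to write to all 4000 files at the same time, the hard disk won't be able to complete all of the writes quickly.

Is it possible to limit the number of concurrent writes with aiofiles (or another library)? Does aiofiles already do this?

Thanks.

Test code: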

import aiofiles
import asyncio


async def write_to_disk(fname):
    async with aiofiles.open(fname, "w+") as f:
        await f.write("asdf")


async def main():
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i)) 
             for i in range(10)]
    await asyncio.gather(*tasks)


asyncio.run(main())

You can use asyncio.Semaphore to limit the number of concurrent tasks. Simply make your write_to_disk function acquire the semaphore before writing:

import aiofiles
import asyncio


async def write_to_disk(fname, sema):
    # The file is opened first and the semaphore is acquired second,
    # so the limit applies to the write, not to the open() call
    async with aiofiles.open(fname, "w+") as f, sema:
        print("Writing", fname)
        await f.write("asdf")
        print("Done writing", fname)


async def main():
    sema = asyncio.Semaphore(3)  # Allow 3 concurrent writers
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i, sema)) for i in range(10)]
    await asyncio.gather(*tasks)


asyncio.run(main())

Note the sema = asyncio.Semaphore(3) line and the sema added to the async with statement.

Output:

"""
Writing 1.txt
Writing 0.txt
Writing 2.txt
Done writing 1.txt
Done writing 0.txt
Done writing 2.txt
Writing 3.txt
Writing 4.txt
Writing 5.txt
Done writing 3.txt
Done writing 4.txt
Done writing 5.txt
Writing 6.txt
Writing 7.txt
Writing 8.txt
Done writing 6.txt
Done writing 7.txt
Done writing 8.txt
Writing 9.txt
Done writing 9.txt
"""
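
One thing to keep in mind: in the code above the file is opened before the semaphore is acquired, so with 4000 tasks all 4000 files may be open at once while they wait for their turn to write. If you also want to cap the number of open files, you can acquire the semaphore first. Below is a minimal sketch of that variant, not a drop-in replacement. To keep it runnable with the standard library only, it uses a plain blocking write pushed to a thread with asyncio.to_thread (Python 3.9+) as a stand-in for aiofiles. The names _blocking_write, run_demo and the stats dict are made up for this illustration; stats only records how many writers were inside the guarded section at the same time.

```python
import asyncio
import os
import tempfile


def _blocking_write(path, text):
    # Ordinary blocking write; it is run in a worker thread below.
    # (Stand-in for aiofiles, which is an assumption of this sketch.)
    with open(path, "w") as f:
        f.write(text)


async def write_to_disk(path, sema, stats):
    # Acquire the semaphore BEFORE the file is opened, so at most
    # `limit` files are open and being written at any moment.
    async with sema:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.to_thread(_blocking_write, path, "asdf")
        finally:
            stats["active"] -= 1


async def run_demo(n_files=10, limit=3):
    sema = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, "%d.txt" % i) for i in range(n_files)]
        await asyncio.gather(*(write_to_disk(p, sema, stats) for p in paths))
        # Count the files while the temporary directory still exists.
        written = sum(os.path.exists(p) for p in paths)
    return stats["peak"], written


if __name__ == "__main__":
    peak, written = asyncio.run(run_demo())
    print("peak concurrent writers:", peak, "files written:", written)
```

With aiofiles the same ordering would be async with sema: followed by async with aiofiles.open(...) inside it. Which order is better depends on whether you are more worried about the number of open file handles or only about the number of simultaneous writes.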