Python

Question

So I currently have this code, and it works exactly as I intend.

import urllib.request
from tqdm import tqdm

with open("output.txt", "r") as file:
    itemIDS = [line.strip() for line in file]

x = 0

for length in tqdm(itemIDS):
    urllib.request.urlretrieve(
        "https://imagemocksite.com?id="+str(itemIDS[x]), 
        "images/"+str(itemIDS[x])+".jpg")
    x += 1

print("All images downloaded")

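I've searched around, and the solutions I found aren't really what I'm looking for. I have a 200 Mbps connection, so bandwidth isn't the problem.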

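My problem is that my loop runs at only 1.1 to 1.57 iterations per second. I'd like to speed this up, because I have over 5000 images to download. Each one is only about 1-5 KB.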

Also, if anyone has any general code tips, I'd appreciate them! I'm learning Python and it's a lot of fun, so I want to get as good as I can!

Edit: Using the info below about asyncio, I'm now getting 1.7-2.1 it/s, which is better! Can it go faster? Maybe I'm using it wrong?

import urllib.request
from tqdm import tqdm
import asyncio

with open("output.txt", "r") as file:
    itemIDS = [line.strip() for line in file]

async def download():
    x = 0
    for length in tqdm(itemIDS):
        await asyncio.sleep(1)
        urllib.request.urlretrieve(
            "https://imagemocksite.com?id="+str(itemIDS[x]), 
            "images/"+str(itemIDS[x])+".jpg")
        x += 1

asyncio.run(download())
print("All images downloaded")

Answer 1

The comments have already given good advice, and I think you're right to use asyncio, which really is the typical Python tool for this kind of job.

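I just wanted to offer some help, because the code you posted doesn't really take advantage of it: urllib.request.urlretrieve is a blocking call, so the loop still downloads one image at a time.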
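To illustrate the difference, here is a minimal sketch using only the standard library. It assumes time.sleep can stand in for a blocking call such as urllib.request.urlretrieve, and asyncio.sleep for a truly awaitable request. The function names are hypothetical and no network is involved.

```python
import asyncio
import time


async def blocking_task(delay):
    # time.sleep stands in for a blocking call such as urlretrieve:
    # it holds the event loop, so no other coroutine can run meanwhile.
    time.sleep(delay)


async def awaitable_task(delay):
    # asyncio.sleep stands in for a real async request: while it waits,
    # the event loop is free to run the other coroutines.
    await asyncio.sleep(delay)


async def timed(task, count=5, delay=0.1):
    # Run `count` copies of `task` through asyncio.gather and time them.
    start = time.perf_counter()
    await asyncio.gather(*(task(delay) for _ in range(count)))
    return time.perf_counter() - start


def compare():
    # Returns (seconds with blocking calls, seconds with awaitable calls).
    blocking = asyncio.run(timed(blocking_task))
    awaitable = asyncio.run(timed(awaitable_task))
    return blocking, awaitable


if __name__ == "__main__":
    blocking, awaitable = compare()
    print(f"blocking calls:  {blocking:.2f}s")
    print(f"awaitable calls: {awaitable:.2f}s")
```

The blocking version should take roughly count times delay, while the awaitable version should take roughly one delay. The edited code in the question is in the first situation.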

First, you have to install aiohttp and aiofiles to handle HTTP requests and local filesystem I/O asynchronously.

Then, define a download(item_id, session) helper coroutine that downloads a single image based on its item_id. session will be an aiohttp.ClientSession, which is the base class for running asynchronous HTTP requests in aiohttp.

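The final trick is to have a download_all coroutine that calls asyncio.gather on all the individual download() coroutines at once. asyncio.gather is the way to tell asyncio to run several coroutines "in parallel".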

This should speed up your downloads considerably. If it doesn't, then the third-party server is limiting you.

import asyncio

import aiohttp
import aiofiles


with open("output.txt", "r") as file:
    itemIDS = [line.strip() for line in file]


async def download(item_id, session):
    url = "https://imagemocksite.com"
    filename = f"images/{item_id}.jpg"
    # Pass the ID as a query-string parameter (?id=...); params is keyword-only.
    async with session.get(url, params={"id": item_id}) as response:
        async with aiofiles.open(filename, "wb") as f:
            await f.write(await response.read())


async def download_all():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[download(item_id, session) for item_id in itemIDS]
        )


asyncio.run(download_all())
print("All images downloaded")
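
With more than 5000 IDs, asyncio.gather as written starts every request at once, which some servers may throttle or reject. One common variation is to cap how many downloads are in flight with asyncio.Semaphore. The sketch below is only an illustration: the helper name gather_bounded and the default limit of 20 are assumptions, not part of the code above, and fake_download is a stand-in coroutine so the example runs without a network.

```python
import asyncio


async def gather_bounded(coro_func, items, limit=20):
    # Allow at most `limit` calls of coro_func to run at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await coro_func(item)

    # Results come back in the same order as `items`.
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_download(item_id):
    # Stand-in for the real download coroutine (assumption, no network I/O).
    await asyncio.sleep(0.01)
    return f"images/{item_id}.jpg"


if __name__ == "__main__":
    paths = asyncio.run(gather_bounded(fake_download, ["1", "2", "3"], limit=2))
    print(paths)
```

Inside download_all, this could replace the plain gather call with something like await gather_bounded(lambda item_id: download(item_id, session), itemIDS). A suitable limit depends on the server, so it is worth trying a few values.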

Python - Looping through large list and downloading images quickly

performance

loops

download