Multithreading not achieving performance difference Python

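Below is a program that makes multiple GET requests and writes the response images to my directory. These GET requests should run in separate threads and therefore be faster than without threading, but I am not seeing any performance difference.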

Printing active_count() shows that 9 threads were created. However, execution still takes about 40 seconds whether or not I use threading.

Below is the version that uses threading.

from threading import active_count
import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

t1 = time.perf_counter()


def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)
    print(active_count())


t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

Below is the version without threading.

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


for img_url in img_urls:
    download_image(img_url)

Can someone explain why this is happening? Thanks.

I can see a performance improvement when using the multiprocessing package.

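import multiprocessing
import time
from multiprocessing import Pool

import requests

# img_urls: the list of image URLs from the question (not repeated here).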


def download_image(img_url: str) -> None:
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


if __name__ == '__main__':
    t1 = time.perf_counter()

    with Pool(processes=multiprocessing.cpu_count() - 1 or 1) as pool:
        pool.map(download_image, img_urls)

    t2 = time.perf_counter()

    print(f'Finished in {t2 - t1} seconds')

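This is the result I got with your code, with start and end times next to each download. The overall time is roughly the same (on my "normal" network, not the slow network I mentioned in the comments).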

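The reason is that multithreading does not add I/O capacity or bandwidth, and the website itself may also be limiting the downloads. The problem does not appear to be in your code.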

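EDIT (misleading statement): As MisterMiyagi mentioned in the comments below (read his comment, he explains why), threading should increase I/O throughput, which is why I gained 10 seconds on my slow network (the connection in my work lab is limited). In this particular case it does not increase I/O or bandwidth (on my "normal" connection I have the full bandwidth). That could have many causes, but in my opinion the code itself is not one of them.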
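
To illustrate the distinction, here is a minimal sketch with no real network traffic. The functions `latency_bound`, `bandwidth_bound` and `run_concurrently`, and the shared lock `LINK`, are hypothetical names made up for this illustration. They only model two situations: downloads that mostly wait on the server, and downloads that compete for a single saturated link.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption for illustration: one lock models a link that is already saturated.
LINK = threading.Lock()


def latency_bound(_):
    # Each task only waits, as if idling for a slow server to respond.
    time.sleep(0.1)


def bandwidth_bound(_):
    # Only one task at a time can use the link, so the transfers queue up.
    with LINK:
        time.sleep(0.1)


def run_concurrently(task, n_tasks):
    """Run n_tasks copies of task on n_tasks threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        # list() drains the iterator so exceptions in workers are re-raised.
        list(executor.map(task, range(n_tasks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for task in (latency_bound, bandwidth_bound):
        elapsed = run_concurrently(task, 5)
        print(f"{task.__name__}: 5 tasks on 5 threads took {elapsed:.2f}s")
```

In this model the waiting tasks overlap and finish in roughly the time of one task, while the tasks that share the link take about as long as running them one after another. The second case resembles what I saw on my normal connection.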

I also tried max_workers=5, and the total time came out the same.

    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 1.0464828 - 1.7136098
    photo-1532009324734-20a7a5813719.jpg was downloaded... 1.7140197 - 5.6327612
    photo-1524429656589-6633a470097c.jpg was downloaded... 5.6339666 - 8.3146478
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 8.3160157 - 10.474087
    photo-1564135624576-c5c88640f235.jpg was downloaded... 10.4749598 - 11.2431941
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 11.2436369 - 15.6939695
    photo-1522364723953-452d3431c267.jpg was downloaded... 15.6954112 - 18.3257819
    photo-1513938709626-033611b8cc03.jpg was downloaded... 18.3269668 - 21.0607191
    photo-1507143550189-fed454f93097.jpg was downloaded... 21.0621265 - 22.2371699
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 22.2375931 - 26.4375676
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 26.4393404 - 28.3477933
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 28.348679 - 30.4626719
    photo-1516972810927-80185027ca84.jpg was downloaded... 30.4636931 - 32.2621345
    photo-1550439062-609e1531270e.jpg was downloaded... 32.2628976 - 34.7331719
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 34.7341393 - 35.5910094
    Finished in 34.545366900000005 seconds
    21
    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 35.5960486 - 46.1692758
    photo-1564135624576-c5c88640f235.jpg was downloaded... 35.6110777 - 47.3780254
    photo-1507143550189-fed454f93097.jpg was downloaded... 35.6265503 - 47.4433963
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 35.6692061 - 49.7097683
    photo-1516972810927-80185027ca84.jpg was downloaded... 35.6420564 - 57.2326763
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 35.6340008 - 61.4597509
    photo-1550439062-609e1531270e.jpg was downloaded... 35.6637577 - 62.0488296
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 35.6072146 - 63.4139648
    photo-1513938709626-033611b8cc03.jpg was downloaded... 35.6223106 - 63.8149815
    photo-1524429656589-6633a470097c.jpg was downloaded... 35.6032493 - 63.8284464
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 35.6352735 - 65.0513042
    photo-1522364723953-452d3431c267.jpg was downloaded... 35.6182243 - 65.5005548
    photo-1532009324734-20a7a5813719.jpg was downloaded... 35.5994888 - 66.2930857
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 35.6144996 - 67.8115219
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 35.6301133 - 68.5357319
    Finished in 32.946069800000004 seconds
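
The start and end times in the output above came from instrumenting each download. The exact change is not shown here, so the following is only a sketch of one way to do it. `timed_call`, `fake_download` and `T0` are hypothetical names, and `fake_download` sleeps instead of fetching anything.

```python
import time

T0 = time.perf_counter()  # reference point for all timestamps


def timed_call(func, *args):
    """Run func(*args) and return (result, start, end), both relative to T0."""
    start = time.perf_counter() - T0
    result = func(*args)
    end = time.perf_counter() - T0
    return result, start, end


def fake_download(name):
    # Hypothetical stand-in for download_image; sleeps instead of fetching.
    time.sleep(0.05)
    return name


if __name__ == "__main__":
    name, start, end = timed_call(fake_download, "photo-example.jpg")
    print(f"{name} was downloaded... {start} - {end}")
```

With threads, the start times cluster together and the end times spread out, which is the pattern in the second half of the output.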

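EDIT 2 (more testing): I tried with one of my own web servers (same code, just a different list of images) and my overall download time dropped by 60-70%. In that case, working with a limited number of workers performed best. The problem comes from the website, not your code.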
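
If you want to repeat that kind of test, a small sweep over pool sizes is enough. This is a sketch under assumptions: `sweep_workers` and `fake_download` are hypothetical names, and the sleep stands in for a real request. Pass your own `download_image` and `img_urls` to measure a real server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sweep_workers(download, urls, worker_counts):
    """Return {max_workers: elapsed_seconds} for each pool size tried."""
    timings = {}
    for workers in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as executor:
            # list() drains the iterator so exceptions in workers are re-raised.
            list(executor.map(download, urls))
        timings[workers] = time.perf_counter() - start
    return timings


def fake_download(url):
    # Hypothetical stand-in for download_image; sleeps instead of fetching.
    time.sleep(0.05)


if __name__ == "__main__":
    fake_urls = [f"https://example.invalid/img{i}" for i in range(6)]
    results = sweep_workers(fake_download, fake_urls, [1, 3, 6])
    for workers, elapsed in results.items():
        print(f"max_workers={workers}: {elapsed:.2f}s")
```

Against a real site the best pool size depends on the server and the connection, so treat the numbers from the fake download as a check that the harness works, nothing more.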