Threading: function seems to run as a blocking loop although I am using threading

Question

I am trying to speed up web scraping by running my HTTP requests in a ThreadPoolExecutor from the concurrent.futures library.

Here is the code:

import concurrent.futures
import requests
from bs4 import BeautifulSoup


urls = [
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=ibfxcfd&showcategories=CFD',
        'https://www.interactivebrokers.eu/en/index.php?f=41634&exch=chix_ca',
        'https://www.interactivebrokers.eu/en/index.php?f=41634&exch=tase',
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=chixen-be&showcategories=STK',
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=bvme&showcategories=STK'
        ]

def get_url(url):
    print(url)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    a = soup.select_one('a')
    print(a)


with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    results = {executor.submit( get_url(url)) : url for url in urls}

    for future in concurrent.futures.as_completed(results):
        try:
            pass
        except Exception as exc:
            print('ERROR for symbol:', results[future])
            print(exc)

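However, judging by the order in which the script prints to the CLI, the requests appear to be sent in a blocking loop.

In addition, if I run the plain loop below, it takes roughly the same amount of time.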

for u in urls:
    get_url(u)

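I have had some success implementing concurrency with this library before, but I cannot see what is going wrong here.

I know the asyncio library could be used as an alternative, but I would prefer to use threading.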

Answer 1

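You are not actually running your get_url calls as tasks. You call them in the main thread and pass their return value to executor.submit, which is the concurrent.futures analog of a common mistake with raw threading.Thread usage. Change: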

results = {executor.submit( get_url(url)) : url for url in urls}

to:

results = {executor.submit(get_url, url) : url for url in urls}

so that you pass the function to call and its arguments to submit (which then runs them in a thread for you), and your code should run in parallel.
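The difference can be seen without touching the network. The following is a minimal sketch, not the asker's actual scraper: fake_fetch is a hypothetical stand-in for get_url that sleeps instead of calling requests.get, and the example.com URLs and the 0.2 second delay are made up for illustration.

```python
import concurrent.futures
import time


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for get_url: sleeps instead of doing network I/O."""
    time.sleep(delay)
    return f"fetched {url}"


def run_sequential(urls, delay=0.2):
    """Call the function directly, one URL after another."""
    start = time.perf_counter()
    results = {url: fake_fetch(url, delay) for url in urls}
    return results, time.perf_counter() - start


def run_threaded(urls, delay=0.2):
    """Pass the callable and its arguments separately so the pool runs them."""
    start = time.perf_counter()
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
        # Note: submit(fake_fetch, url, delay), not submit(fake_fetch(url, delay)).
        futures = {executor.submit(fake_fetch, url, delay): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                # result() returns the value or re-raises the worker's exception.
                results[url] = future.result()
            except Exception as exc:
                print('ERROR for url:', url)
                print(exc)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Made-up URLs; nothing is actually requested.
    urls = [f"https://example.com/page{i}" for i in range(5)]
    _, seq_time = run_sequential(urls)
    _, thr_time = run_threaded(urls)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

With the original form, get_url(url) runs to completion in the main thread and returns None, so the executor is handed None as its task. Calling future.result() inside the as_completed loop, as above, would surface the resulting TypeError instead of letting it pass silently.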

Tags: python, multithreading, python-requests, concurrent.futures
