How to get the currentThread().getName() as part of a (concurrent.futures) future result?

I have the following code, with my question at the bottom:

1  from threading import currentThread
2  import concurrent.futures
3  import urllib.request
4 
5  URLS = ['http://www.foxnews.com/',
6          'http://www.cnn.com/',
7          'http://www.bbc.co.uk/']
8 
9 
10
11 # Retrieve a single page and report the URL and contents
12 def load_url(url, timeout):
13    with urllib.request.urlopen(url, timeout=timeout) as conn:
14
15        print(currentThread().getName(), url)
16        # how do I pass back the thread_name with the "conn.read" back to executor.submit? 
17        return conn.read()
18
19
20 # We can use a with statement to ensure threads are cleaned up promptly
21 with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
22    # Start the load operations and mark each future with its URL
23    futures_dict = {executor.submit(load_url, url, 60): url for url in URLS}
24    for future in concurrent.futures.as_completed(futures_dict):
25        url = futures_dict[future]
26        try:
27            data = future.result()
28        except Exception as exc:
29            # print('%r generated an exception: %s' % (url, exc))
30            print(f'Thread Name {currentThread().getName()}: {url} generated an exception {exc}')
31        else:
32            # print('%r page is %d bytes' % (url, len(data)))
33            print(f'Thread Name {currentThread().getName()}: content length of {url} = {len(data)} bytes')

My output is as follows:

url_thread_1 http://www.cnn.com/
url_thread_0 http://www.foxnews.com/
url_thread_2 http://www.bbc.co.uk/
Thread Name MainThread: content length of http://www.foxnews.com/ = 282968 bytes
Thread Name MainThread: content length of http://www.cnn.com/ = 1115357 bytes
Thread Name MainThread: content length of http://www.bbc.co.uk/ = 363642 bytes

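Question:

As shown above, load_url, the target function passed to executor.submit, returns conn.read() (line 17).

If I call print(threading.currentThread().getName()) inside the with ThreadPoolExecutor statement (lines 30 and 33), it always shows "MainThread".

But when I call print(threading.currentThread().getName()) inside the target function load_url (see line 15), it correctly shows url_thread_0, url_thread_1, and so on. (Note that I pass max_workers=3 and thread_name_prefix='url_thread' as arguments to ThreadPoolExecutor.)

How can I pass currentThread().getName() from line 15 together with conn.read() from line 17 as part of the futures_dict key, so that I can use it to display the correct thread name on line 30 or line 33?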

I would refactor the load_url function to return a tuple (or, better, a named tuple or an object), like this:

from threading import currentThread
import concurrent.futures
import urllib.request

URLS = ['https://www.foxnews.com/',
        'https://www.cnn.com/',
        'https://www.bbc.co.uk/']
 
 
def load_url(url, timeout):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return url, currentThread().getName(), conn.read(), None
    except Exception as exc:
        return url, currentThread().getName(), None, exc


with concurrent.futures.ThreadPoolExecutor(max_workers=3, thread_name_prefix='url_thread') as executor:
    futures = (executor.submit(load_url, url, 60) for url in URLS)
    for future in concurrent.futures.as_completed(futures):
        url, thread_name, data, exc = future.result()

        if exc:
            print(f'Thread Name {thread_name}: {url} generated an exception {exc}')
        else:
            print(f'Thread Name {thread_name}: content length of {url} = {len(data)} bytes')

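I also moved the exception handling into the function (better still, name the specific exceptions you want to catch instead of a blanket Exception).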
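
For what it's worth, here is a rough sketch of the named-tuple variant with a narrower except clause. The names LoadResult and load_all and the opener parameter are my own inventions for illustration, not part of concurrent.futures, and catching OSError is only an assumption about which failures you care about (urllib.error.URLError and socket timeouts are both OSError subclasses). It also uses threading.current_thread().name, since currentThread() and getName() are deprecated as of Python 3.10.

```python
import concurrent.futures
import threading
import urllib.request
from typing import NamedTuple, Optional


class LoadResult(NamedTuple):
    """Hypothetical result record: everything the caller needs in one object."""
    url: str
    thread_name: str
    data: Optional[bytes]
    error: Optional[Exception]


def load_url(url, timeout, opener=urllib.request.urlopen):
    # Capture the name here: this code runs on the worker thread,
    # whereas the loop that consumes the futures runs on MainThread.
    thread_name = threading.current_thread().name
    try:
        with opener(url, timeout=timeout) as conn:
            return LoadResult(url, thread_name, conn.read(), None)
    except OSError as exc:
        # Assumption: network-level failures only (URLError, timeouts, ...).
        # Anything else still propagates through future.result().
        return LoadResult(url, thread_name, None, exc)


def load_all(urls, timeout=60, opener=urllib.request.urlopen):
    # The opener parameter exists only so the function can be tried offline.
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=3, thread_name_prefix='url_thread') as executor:
        futures = [executor.submit(load_url, url, timeout, opener)
                   for url in urls]
        # Results come back in completion order, not submission order.
        return [future.result()
                for future in concurrent.futures.as_completed(futures)]
```

You would then loop over load_all(URLS) and print result.thread_name, result.url, and either len(result.data) or result.error, depending on whether result.error is None.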