Is there a maximum limit on running concurrent threads (Python)?
I'm reading through several hundred tweets and checking the URLs in them. I'm using multithreading for this task because reading each URL takes more than a second. However, I'm not sure how many threads I can run in parallel at once in this situation?
import Queue
import threading

q = Queue.Queue()
threadlimit = 100  # was "thread limit = 100", which is not a valid identifier

# Start one daemon thread per tweet, up to the limit.
for tweet in tweets[0:threadlimit]:
    t = threading.Thread(target=process_tweet, args=(q, tweet))
    t.daemon = True
    t.start()

# Collect one result per thread that was started.
for tweet in tweets[0:threadlimit]:
    tweet = q.get()
The reason I'm asking is that it works fine with a thread limit of 100, but with a thread limit of 200 it gets stuck.

Platform: Linux
The operating system always imposes some limit on the number of threads, and every thread consumes resources (in particular some space, perhaps a megabyte, for the thread's call stack), so it is not reasonable to have lots of threads. The details are operating-system and computer specific. On Linux, see getrlimit(2) for RLIMIT_STACK (the default stack size) and RLIMIT_NPROC (the number of processes, actually tasks, including threads, that you are permitted to have), and also pthread_attr_setstacksize(3) & pthread_create(3).
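You can inspect both limits from Python itself. A minimal sketch, assuming a Linux host (where `resource.RLIMIT_NPROC` is available) and using the standard library:

```python
import resource
import threading

# RLIMIT_STACK: the default per-thread stack size, as a (soft, hard) pair
# in bytes (or RLIM_INFINITY if unlimited).
soft_stack, hard_stack = resource.getrlimit(resource.RLIMIT_STACK)
print("stack limit (soft):", soft_stack)

# RLIMIT_NPROC: how many tasks (processes and threads) this user may have.
soft_nproc, hard_nproc = resource.getrlimit(resource.RLIMIT_NPROC)
print("task limit (soft):", soft_nproc)

# Python also lets you shrink the stack that each *new* thread gets, which
# increases how many threads fit into the same address space.
threading.stack_size(256 * 1024)  # 256 KiB instead of the ~8 MiB default
```

If the program "gets stuck" somewhere around a few hundred threads, comparing the thread count against these limits is a good first diagnostic.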
Threads are usually heavyweight resources (read about green threads). You don't want to have many of them (e.g. thousands, or even a hundred) on a laptop or desktop; some supercomputers or costly servers have hundreds of cores with NUMA, and there you could try to have more threads.
See also the C10K problem.
Common implementations of Python use a single Global Interpreter Lock, so having lots of threads is not effective. I would recommend using a thread pool of a reasonable size (perhaps configurable, probably at most a few dozen threads).
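A bounded thread pool keeps the thread count fixed while still processing every tweet; the work simply queues up behind the workers. A minimal sketch with the standard library, where `fetch_url_status` is a hypothetical stand-in for the asker's process_tweet:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_url_status(tweet):
    # Placeholder for the real URL check (network I/O, over a second each).
    return tweet.upper()

tweets = ["tweet %d" % i for i in range(200)]

# A few dozen workers is plenty for I/O-bound work; all 200 tweets are
# handled without ever creating 200 threads.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fetch_url_status, tweets))

print(len(results))
```

Unlike the original loop, raising the number of tweets here never raises the number of threads, so there is no limit to get stuck on.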
Consider using PycURL and probably its MULTI interface (see the documentation of the relevant C API in libcurl). Think in terms of an event loop (and perhaps continuation-passing style).
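The event-loop idea can also be sketched with the standard library's asyncio: one thread, many concurrent I/O waits, no thread limit to hit. `check_url` below is a hypothetical stand-in for the real network call:

```python
import asyncio

async def check_url(tweet):
    # Stands in for the actual network wait (e.g. an async HTTP request);
    # while one coroutine waits, the loop runs the others.
    await asyncio.sleep(0)
    return tweet

async def main(tweets):
    # gather() runs all the checks concurrently on a single thread.
    return await asyncio.gather(*(check_url(t) for t in tweets))

results = asyncio.run(main(["t%d" % i for i in range(300)]))
print(len(results))
```

Three hundred concurrent checks here cost one thread, not three hundred.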