How are threads/processes parked and woken in Linux, prior to futex?
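Before the futex system call existed in Linux, what underlying system calls did threading libraries such as pthreads use to block/sleep a thread and subsequently wake those threads from userland?
For example, if a thread tries to acquire a mutex, the userland implementation will block the thread (perhaps after a short spinning interval), but I can't find the system calls that are used for this (other than futex, which is a relatively recent creation).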
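Futex stands for "fast userspace mutex." It is simply an abstraction over mutexes that is considered faster and more convenient than traditional mutex mechanisms because it implements the wait system for you. Both before and after futex(), threads are put to sleep and woken by changing their process state. The process states are:
- Running state
- Sleeping state
- Uninterruptible sleep state (e.g. blocked in certain I/O system calls, such as a read() or write() to disk)
- Defunct/zombie state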
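When a thread is suspended, it enters the (interruptible) 'sleep' state. It can later be woken by the wake_up() function, which operates on its task structure in the kernel. As far as I can tell, wake_up() is a kernel function, not a system call. The kernel does not need a system call to wake or sleep a task; it (or a process) simply changes the task structure to reflect the process's state. When the Linux scheduler next deals with that process, it treats it according to its state (again, the states are listed above).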
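The states listed above can be observed from userland. The following is a minimal sketch, assuming a Linux system with /proc mounted and Python 3.8 or newer; the helper names task_state and observe_parked_thread are made up for this illustration. It reads the one-letter state field from /proc/self/task/<tid>/stat, where R means running and S means interruptible sleep.

```python
import threading
import time


def task_state(tid: int) -> str:
    """Return the one-letter state of a task (thread) of this process."""
    with open(f"/proc/self/task/{tid}/stat") as f:
        stat = f.read()
    # The line looks like "pid (comm) state ..."; comm may itself contain
    # spaces or parentheses, so locate the state after the last ')'.
    return stat[stat.rfind(")") + 2]


def observe_parked_thread(timeout: float = 2.0) -> str:
    """Park a thread on an Event and report the state the kernel shows for it."""
    gate = threading.Event()
    tids = []

    def sleeper():
        tids.append(threading.get_native_id())
        gate.wait()  # blocks in the kernel until gate.set() is called

    t = threading.Thread(target=sleeper)
    t.start()
    deadline = time.monotonic() + timeout
    state = "?"
    # Poll until the sleeper has actually gone to sleep (or we give up).
    while time.monotonic() < deadline:
        if tids:
            state = task_state(tids[0])
            if state == "S":
                break
        time.sleep(0.01)
    gate.set()  # wake the parked thread so it can exit
    t.join()
    return state


if __name__ == "__main__":
    print("calling thread:", task_state(threading.get_native_id()))
    print("parked thread:", observe_parked_thread())
```

The thread that reads its own stat file is necessarily running at that moment, while the thread blocked on the Event is expected to show up as sleeping.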
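Short story: futex() implements a wait system for you. Without it, you need a data structure that is accessible from both the main thread and the sleeping thread in order to wake a sleeping thread. All of that is done in userland code. The only thing you might need from the kernel is a mutex, the specifics of which do include locking mechanisms and mutex data structures, but which does not inherently wake or sleep threads. The system call you are looking for does not exist. Essentially, most of what you are talking about can be achieved from userland, without a syscall, by manually keeping track of the data conditions that determine whether and when a thread should sleep or wake.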
Before futex and the current implementation of pthreads for Linux, NPTL (which requires kernel 2.6 or newer), there were two other threading libraries with the POSIX Thread API for Linux: LinuxThreads and NGPT (which was based on GNU Pth). LinuxThreads was the only widely used libpthread for years (and it can still be used in some strange and unmaintained micro-libc to work on 2.4; other micro-libc variants may have their own builtin implementation of a pthread-like API on top of futex+clone). And GNU Pth is not a threading library in the kernel sense: it is a single process with user-level "thread" switching.
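You should know that there are several threading models, distinguished by whether the kernel knows about some or all of the user threads (this determines how many CPU cores can be used by adding threads to the program, what a thread costs, and how many threads can be started). The models are named M:N, where M is the number of userspace threads and N is the number of threads schedulable by the OS kernel:
- "1:1" ''kernel-level threading'' - every userspace thread is schedulable by the OS kernel. This is implemented in LinuxThreads, NPTL and many modern OSes.
- "N:1" ''user-level threading'' - userspace threads are scheduled in userspace and are all invisible to the kernel, which schedules only one process (and it may use only 1 CPU core). GNU Pth (GNU Portable Threads) is an example, and there are many other implementations for some computer architectures.
- "M:N" ''hybrid threading'' - some entities are visible to and schedulable by the OS kernel, but there may be more userspace threads than that. Sometimes userspace threads migrate between kernel-visible threads.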
With the 1:1 model there are many classic sleep mechanisms/APIs in Unix, such as select/poll, signals and the IPC APIs. As I remember, LinuxThreads used a separate process for every thread (with fully shared memory), and there was a special manager "thread" (process) to emulate some POSIX thread features. Wikipedia says that SIGUSR1/SIGUSR2 were used in LinuxThreads for some internal communication between threads, and IBM says the same: "The synchronization of primitives is achieved by means of signals. For example, threads block until awoken by signals." See also the project FAQ, http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#H.4, "With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?":
LinuxThreads needs two signals for its internal operation. One is used to suspend and restart threads blocked on mutex, condition or semaphore operations. The other is used for thread cancellation.
On ``old'' kernels (2.0 and early 2.1 kernels), there are only 32 signals available and the kernel reserves all of them but two: SIGUSR1 and SIGUSR2. So, LinuxThreads has no choice but use those two signals.
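The suspend/restart idea from the FAQ can be sketched in a few lines. This is only an illustration of blocking a thread until a signal arrives, not the LinuxThreads code: it assumes a Unix system, uses sigwait for brevity (no claim is made here about which signal-waiting call LinuxThreads itself used), and the names RESTART_SIGNAL and suspend_and_restart are hypothetical.

```python
import signal
import threading

# Assumption: SIGUSR1 stands in for the library's internal "restart" signal.
RESTART_SIGNAL = signal.SIGUSR1


def suspend_and_restart() -> list:
    """Suspend a thread in the kernel and restart it by sending it a signal."""
    events = []
    # Block the signal first; the new thread inherits this mask, so a signal
    # sent before it reaches sigwait() stays pending instead of killing us.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {RESTART_SIGNAL})
    try:
        def waiter():
            events.append("suspended")
            # The thread sleeps in the kernel here until the signal arrives.
            received = signal.sigwait({RESTART_SIGNAL})
            events.append("restarted by " + signal.Signals(received).name)

        t = threading.Thread(target=waiter)
        t.start()
        # The "unlocker" side: wake the specific blocked thread.
        signal.pthread_kill(t.ident, RESTART_SIGNAL)
        t.join()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return events


if __name__ == "__main__":
    print(suspend_and_restart())
```

Because the signal is blocked in the waiting thread, a wake-up sent too early is not lost: it stays pending and sigwait returns immediately. The cost is that the application can no longer use that signal for its own purposes, which is the complaint the FAQ entry addresses.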
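With the "N:1" model, a thread may call some blocking system call and block everything (some libraries may convert some blocking system calls into asynchronous ones, or use some SIGALRM or SIGVTALRM magic); or it may call some (very) special internal threading function, which performs the userspace thread switch by rewriting the machine state registers (like switch_to in the Linux kernel: save the IP/SP and other registers, then restore the IP/SP and other registers of the other thread). So the kernel does not wake any user thread directly from userland; it just schedules the whole process, and the userspace scheduler implements the thread synchronization logic (or just calls sched_yield or select when there are no threads with work to do).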
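A toy version of such a userspace scheduler makes the point concrete. The sketch below is an assumption-laden illustration, not GNU Pth: Python generators stand in for user-level threads (a yield plays the role of the register-saving context switch), and the Mutex, worker and run names are invented here. Parking a thread on a mutex and waking it again involves no system call at all; the scheduler only moves the task between queues.

```python
from collections import deque


class Mutex:
    """A lock that exists only in the userspace scheduler's bookkeeping."""

    def __init__(self):
        self.owner = None
        self.waiters = deque()


def worker(name, mutex, log):
    yield ("lock", mutex)
    log.append(f"{name} enter")
    yield None  # plain yield: give up the CPU while still holding the lock
    log.append(f"{name} leave")
    yield ("unlock", mutex)


def run(tasks):
    """Round-robin scheduler: the kernel sees one thread, we see many."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            request = next(task)  # "context switch" into the user thread
        except StopIteration:
            continue  # thread finished
        if request is None:
            ready.append(task)
            continue
        op, mutex = request
        if op == "lock":
            if mutex.owner is None:
                mutex.owner = task
                ready.append(task)
            else:
                mutex.waiters.append(task)  # park: not runnable until unlock
        elif op == "unlock":
            if mutex.waiters:
                mutex.owner = mutex.waiters.popleft()
                ready.append(mutex.owner)  # wake the first parked thread
            else:
                mutex.owner = None
            ready.append(task)


if __name__ == "__main__":
    m, log = Mutex(), []
    run([worker("A", m, log), worker("B", m, log)])
    print(log)
```

Thread B asks for the lock while A holds it, so B is parked and only becomes runnable again when A unlocks. If one of these tasks made a blocking system call, every task would stall, which is the weakness of N:1 described above.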
With the M:N model things are very complicated... I don't know much about NGPT... There is one paragraph about NGPT in POSIX Threads and the Linux Kernel, Dave McCracken, OLS2002,330, page 5:
There is a new pthread library under development called NGPT. This library is based on the GNU Pth library, which is an M:1 library. NGPT extends Pth by using multiple Linux tasks, thus creating an M:N library. It attempts to preserve Pth’s pthread compatibility while also using multiple Linux tasks for concurrency, but this effort is hampered by the underlying differences in the Linux threading model. The NGPT library at present uses non-blocking wrappers around blocking system calls to avoid
blocking in the kernel.
Some papers and posts: POSIX Threads and the Linux Kernel, Dave McCracken, OLS2002,330; LWN post about NPTL 0.1:
The futex system call is used extensively in all synchronization
primitives and other places which need some kind of
synchronization. The futex mechanism is generic enough to support
the standard POSIX synchronization mechanisms with very little
effort. ... Futexes also allow the implementation of inter-process
synchronization primitives, a sorely missed feature in the old
LinuxThreads implementation (Hi jbj!).
5.5 Synchronization Primitives
The implementation of the synchronization primitives such as mutexes, read-write
locks, conditional variables, semaphores, and barriers requires some form of kernel
support. Busy waiting is not an option since threads can have different priorities (beside wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation. Threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability caused by spurious wakeups and derogation of the quality of the signal handling in the application.
Fortunately some new functionality was added to the kernel to implement all kinds
of synchronization primitives: futexes [Futex]. The underlying principle is simple but
powerful enough to be adaptable to all kinds of uses. Callers can block in the kernel
and be woken either explicitly, as a result of an interrupt, or after a timeout.
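The principle in that last quote (block only if a shared word still has the expected value, then be woken explicitly or by a timeout) can be emulated in userland to show its shape. The sketch below is not the futex system call: it is a rough model built on threading.Condition, and the FutexWord class and its method names are hypothetical. The return strings mirror the names of the error codes the real FUTEX_WAIT operation reports for a value mismatch and for a timeout.

```python
import threading


class FutexWord:
    """Userland model of a futex word with wait/wake semantics."""

    def __init__(self, value=0):
        self.value = value
        self._cond = threading.Condition()

    def wait(self, expected, timeout=None):
        with self._cond:
            # Check-and-sleep happens atomically with respect to wakers,
            # so a wake-up issued after the value changed cannot be lost.
            if self.value != expected:
                return "EAGAIN"
            if not self._cond.wait(timeout):
                return "ETIMEDOUT"
            return "woken"

    def store_and_wake(self, value, n=1):
        with self._cond:
            self.value = value
            self._cond.notify(n)  # wake at most n waiters


if __name__ == "__main__":
    word = FutexWord(0)
    results = []
    t = threading.Thread(
        target=lambda: results.append(word.wait(expected=0, timeout=5))
    )
    t.start()
    word.store_and_wake(1)
    t.join()
    print(results)  # either "woken" or "EAGAIN", depending on timing
```

Either outcome in the demo is correct: if the waker runs first, the waiter sees the changed value and never sleeps; otherwise it sleeps and is woken. That compare-before-sleep step is what lets the uncontended path stay entirely in userland.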