Benefits of user-level threads

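I'm reading about the difference between user-level threads and kernel-level threads, and I have mostly understood it.
What's not clear to me is the point of implementing user-level threads at all.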

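If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
I have read a couple of articles stating that a user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).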

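This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?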

An answer to a previously asked question (similar to mine) went like this:

No modern operating system actually maps n user-level threads to 1 kernel-level thread.

But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.

Could you help me understand this?

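I strongly recommend Modern Operating Systems, 4th Edition, by Andrew S. Tanenbaum (starring in shows such as the debate about Linux; also participating: Linus Torvalds). It costs a whole lot of money, but it is definitely worth it if you really want to understand things. For eager students and desperate enthusiasts it is great.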

Your questions answered

[...] what's not clear to me is the point of implementing User-level threads at all.

Read my post. It is comprehensive, I dare say.

If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?

Read the "Disadvantages" section below.

I have read a couple of articles that stated that user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).

Read the "No coordination with system calls" subsection under "Disadvantages."

All citations are from the book I recommended at the top of this answer, Section 2.2.4, "Implementing Threads in User Space."

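Advantages

Enables threads on systems without threads

The first advantage is that user-level threads are a way to work with threads on a system that has no threads.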

The first, and most obvious, advantage is that a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to fall into this category, and even now some still do.

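No kernel interaction required

Another benefit is the light overhead when switching threads, as opposed to switching to kernel mode, doing the work there, switching back, and so on. The book describes the lighter thread switch like this: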

When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks to see if the thread must be put into blocked state. If so, it stores the thread’s registers (i.e., its own) [...] and reloads the machine registers with the new thread’s saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. If the machine happens to have an instruction to store all the registers and another one to load them all, the entire thread switch can be done in just a handful of instructions. Doing thread switching like this is at least an order of magnitude—maybe more—faster than trapping to the kernel and is a strong argument in favor of user-level threads packages.

This efficiency is also welcome because it spares us heavy context switches and everything that comes with them.
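As a rough illustration of a switch that never enters the kernel, here is a minimal sketch using Python generators as stand-ins for user-level threads. The names worker and run are made up for this example, and the interpreter's saved generator frame only plays the role of the saved registers from the quote; this is not how a real threads package is written.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': every yield hands control back to the run-time system."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary switch point; the generator frame keeps this thread's state

def run(threads):
    """Hypothetical round-robin run-time system that lives entirely in user space."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)        # resume the saved frame (the "reload registers" step)
        except StopIteration:
            continue             # thread finished, drop it
        ready.append(current)    # still runnable, goes to the back of the queue

log = []
run([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

Every switch here is an ordinary function call and return inside one process, which is the property the quoted passage credits for the speed advantage.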

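Individually adjusted scheduling algorithms

Also, since there is no central scheduling algorithm, every process can have its own scheduling algorithm and is far more flexible in its choice. In addition, the "private" scheduling algorithm is more flexible about the information it gets from the threads. The amount of information can be tuned manually per process, so it is very fine-grained. This is, again, because no central scheduling algorithm has to fit the needs of every process; a central one must be very general and must deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."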

They [user-level threads] allow each process to have its own customized scheduling algorithm. For some applications, for example, those with a garbage-collector thread, not having to worry about a thread being stopped at an inconvenient moment is a plus. They also scale better, since kernel threads invariably require some table space and stack space in the kernel, which can be a problem if there are a very large number of threads.
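To make the "own customized scheduling algorithm" point concrete, here is a small sketch of a process-private strict-priority scheduler, again with generators standing in for threads. The names task and run_by_priority and the thread names "gc" and "ui" are assumptions made for this example, not anything taken from the book.

```python
import heapq

def task(name, steps, log):
    """A cooperative thread that does one unit of work per turn."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def run_by_priority(tasks):
    """tasks is a list of (priority, generator); a lower number runs first."""
    # The sequence counter breaks ties so the heap never compares generators.
    heap = [(prio, seq, gen) for seq, (prio, gen) in enumerate(tasks)]
    heapq.heapify(heap)
    seq = len(heap)
    while heap:
        prio, _, gen = heapq.heappop(heap)
        try:
            next(gen)
        except StopIteration:
            continue                      # finished, do not requeue
        heapq.heappush(heap, (prio, seq, gen))
        seq += 1

log = []
# The hypothetical "gc" thread only runs once the "ui" thread has nothing left to do.
run_by_priority([(1, task("gc", 2, log)), (0, task("ui", 2, log))])
print(log)
```

Swapping this policy for round-robin or anything else is a change to one function in the application, with no kernel involvement.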

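Disadvantages

No coordination with system calls

The user-level scheduling algorithm has no idea whether some thread has called a blocking read system call. OTOH, a kernel-level scheduling algorithm would know, because it can be notified by the system call; both belong to the kernel code base.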

Suppose that a thread reads from the keyboard before any keys have been hit. Letting the thread actually make the system call is unacceptable, since this will stop all the threads. One of the main goals of having threads in the first place was to allow each one to use blocking calls, but to prevent one blocked thread from affecting the others. With blocking system calls, it is hard to see how this goal can be achieved readily.

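He goes on to say that system calls could be made non-blocking, but that this would be very inconvenient and would drastically hurt compatibility with existing OSes.
Mr. Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict when a system call would block by using select, but he calls this inelegant.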
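The select-based wrapper idea can be sketched as follows. This is only an illustration under assumptions: read_if_ready is a made-up name, and a local socket pair stands in for the keyboard from the book's example.

```python
import select
import socket

def read_if_ready(sock, nbytes):
    """Return data if a read would not block, else None (the caller should switch threads)."""
    # A timeout of 0 turns select into a poll: it reports readiness and never waits.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return None
    return sock.recv(nbytes)

a, b = socket.socketpair()
first = read_if_ready(b, 16)    # nothing has been written yet
a.sendall(b"key")
second = read_if_ready(b, 16)   # data is now pending on the other end
a.close()
b.close()
print(first, second)
```

A user-level run-time system would run another thread whenever the wrapper returns None and retry the read later. The cost is an extra system call in front of every read, which is part of why the approach is described as inelegant.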

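Building on that, he says that threads do block often. Frequent blocking requires many system calls, and many system calls are bad. And without blocking, threads become less useful: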

For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all? No one would seriously propose computing the first n prime numbers or playing chess using threads because there is nothing to be gained by doing it that way.

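Page faults block the whole process if the kernel is unaware of threads

The OS has no notion of threads. Therefore, if a page fault occurs, the whole process is blocked, which effectively blocks all of its user-level threads.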

Somewhat analogous to the problem of blocking system calls is the problem of page faults. [...] If the program calls or jumps to an instruction that is not in memory, a page fault occurs and the operating system will go and get the missing instruction (and its neighbors) from disk. [...] The process is blocked while the necessary instruction is being located and read in. If a thread causes a page fault, the kernel, unaware of even the existence of threads, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.

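I think this can be generalized to all interrupts.

No automatic switching to the scheduler

Since there is no per-process clock interrupt, a thread keeps the CPU forever unless some OS-dependent mechanism (such as a context switch) occurs or it voluntarily releases the CPU.
This prevents the usual scheduling algorithms from working, including the round-robin algorithm.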

[...] if a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process, there are no clock interrupts, making it impossible to schedule processes round-robin fashion (taking turns). Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
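The starvation effect described in the quote is easy to reproduce with a cooperative scheduler. In this sketch (polite, greedy and run are names invented for the example), one thread never yields inside its loop, so the other thread does not get a turn until it has finished all of its work.

```python
from collections import deque

def polite(name, steps, log):
    """Yields after every unit of work, so others can take turns."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def greedy(name, steps, log):
    """Never enters the run-time system while it still has work to do."""
    for i in range(steps):
        log.append(f"{name}:{i}")
    yield  # only reached after all of the work is done

def run(threads):
    """Round-robin over generators; it can switch only when a thread yields."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue
        ready.append(current)

log = []
run([greedy("G", 3, log), polite("P", 2, log)])
print(log)
```

The scheduler is nominally round-robin, yet it cannot take the CPU away from the greedy thread, because nothing interrupts it.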

He says that a possible solution would be

[...] to have the run-time system request a clock signal (interrupt) once a second to give it control, but this, too, is crude and messy to program.

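I would even go on to say that such a "request" would require some system call to happen, whose drawback is already explained in "No coordination with system calls." If there is no system call, then the program would need free access to the timer, which is a security hole and unacceptable in modern OSes.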

What's not clear to me is the point of implementing user-level threads at all.

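User-level threads largely became mainstream because of Ada and its requirement for threads (tasks, in Ada terminology). At the time, multiprocessor systems were rare, and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.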

If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?

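If you have kernel threads, multiple threads within a single process can run at the same time. With user threads, the threads always execute interleaved.

Using threads simplifies some types of programming.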

I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).

This is true on Unix, and maybe not on all Unix implementations. User threads on many operating systems function perfectly well with blocking I/O.

This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?

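With user threads, there is never parallel execution. With kernel threads, there can be parallel execution if there are multiple processors. On a single-processor system, there is not much advantage to using kernel threads over single threads (contra: note the blocking I/O issue on Unix and user threads).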

But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.

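With user threads, the process manages its own "threads" by interleaving execution within itself. The process can only have a thread running on the processor that the process itself is running on.

If the operating system provides system services to schedule code to run on a different processor, user threads can run on multiple processors.

I conclude by saying that, for practical purposes, user threads have no advantage over kernel threads. Some will assert that there are performance advantages, but any such advantage would be system dependent.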