How does Akka benefit from ForkJoinPool?

The Akka docs state that the default dispatcher is a fork-join-executor because it "gives excellent performance in most cases".
I'm wondering why that is.
From the ForkJoinPool documentation:

A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist). This enables (1) efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks), as well as (2) when many small tasks are submitted to the pool from external clients. Especially when setting asyncMode to true in constructors, ForkJoinPools may also be (3) appropriate for use with event-style tasks that are never joined.

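At first I guessed that Akka is not an example of case (1), because I cannot figure out how Akka would fork tasks. I mean, what would be a task that could be forked into many subtasks?
I see each message as an independent task, which is why I think Akka is similar to case (2), where the messages are many small tasks submitted (via ! and ?) to the ForkJoinPool.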
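For contrast, here is a minimal sketch of what case (1) looks like in plain Java: a task that splits itself into subtasks with fork() and later join()s them. The class name ForkJoinSum, the helper sum, and the threshold of 1,000 are assumptions made up for this illustration; nothing here is taken from Akka. Processing an actor message has no comparable "split and wait for the halves" structure, which is why messages look more like case (2).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example of case (1): a divide-and-conquer task that forks subtasks.
class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cut-off chosen for this sketch
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                      // queue the left half; idle workers may steal it
        long rightSum = right.compute();  // compute the right half in the current thread
        return left.join() + rightSum;    // join: wait for the left half's result
    }

    // Convenience helper (hypothetical) that runs the task on a fresh pool.
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // sum of 1..10000
    }
}
```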

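The next question, although not strictly related to Akka, would be: why does a use case without fork and join (the main capability of ForkJoinPool, and the one that enables work stealing) still benefit from ForkJoinPool?
From Scalability of Fork Join Pool: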

We noticed that the number of context switches was abnormal, above 70000 per second.
That must be the problem, but what is causing it? Viktor came up with the qualified guess that it must be the task queue of the thread pool executor, since that is shared and the locks in the LinkedBlockingQueue could potentially generate the context switches when there is contention.

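But if Akka indeed does not use ForkJoinTasks, all the tasks submitted by external clients will be queued in a shared queue, so the contention should be the same as with a ThreadPoolExecutor.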

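So, my questions are:

Does Akka use ForkJoinTasks (case (1)), or is it related to case (2)?
Why is ForkJoinPool beneficial in case (2) if all the tasks submitted by external clients are pushed to a shared queue and no work stealing happens?
What would be an example of "with event-style tasks that are never joined" (case (3))?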

Update

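The correct answer is the one from johanandren, but I want to add some highlights.

Akka doesn't use the fork and join capabilities since, AFAIK, with the actor model, or at least how we implement it, there isn't really a use case for that (from johanandren's comment).
So my understanding that Akka is not an instance of case (1) was correct.
In my original question I said that all the tasks submitted by external clients would be queued in a shared queue.
This is correct, but only for earlier versions of FJP (JDK 7). In JDK 8 the single submission queue was replaced by many "submission queues". The following explains this well: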

Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well the tasks to steal, they got there and tried to see if external work is available. In this design, there is no advantage against a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue. [...]
Now, the external submission goes into one of the submission queues. Then, workers that have no work to munch on, can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.

So this allows work stealing even in use cases that do not use fork/join. As Doug Lea says:

Substantially better throughput when lots of clients submit lots of tasks. (I've measured up to 60X speedups on micro-benchmarks). The idea is to treat external submitters in a similar way as workers -- using randomized queuing and stealing. (This required a big internal refactoring to disassociate work queues and workers.) This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.
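The scenario described in that quote (many clients submitting many small async tasks that are never forked or joined) can be sketched as follows. This is only an illustration under stated assumptions: the class ExternalSubmitters and its run method are hypothetical names, the pool size is simply the number of available processors, and it demonstrates the submission pattern rather than measuring throughput.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: several external (non-worker) threads submit small tasks.
class ExternalSubmitters {

    // Returns the number of tasks that were actually executed by the pool.
    static long run(int submitters, int tasksPerSubmitter) {
        ForkJoinPool pool = new ForkJoinPool(
                Runtime.getRuntime().availableProcessors(),
                ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                null,   // no custom UncaughtExceptionHandler
                true);  // asyncMode = true: FIFO scheduling for tasks that are never joined
        LongAdder executed = new LongAdder();
        CountDownLatch done = new CountDownLatch(submitters * tasksPerSubmitter);

        for (int s = 0; s < submitters; s++) {
            Thread client = new Thread(() -> {
                for (int t = 0; t < tasksPerSubmitter; t++) {
                    // A plain Runnable: the pool adapts it to a ForkJoinTask internally.
                    pool.execute(() -> {
                        executed.increment();
                        done.countDown();
                    });
                }
            });
            client.start();
        }

        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return executed.sum();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10_000)); // number of executed tasks
    }
}
```

Swapping the pool for a ThreadPoolExecutor backed by a LinkedBlockingQueue would keep the same structure, which makes a sketch like this a convenient starting point for comparing the two under a benchmark harness.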

There is one more peculiarity of FJP worth mentioning, taken from this comment:

4% is indeed not much for FJP. There's still a trade-off you do with FJP which you need to be aware of: FJP keeps threads spinning for a while to be able to handle just-in-time arriving work faster. This ensures good latency in many cases. Especially if your pool is overprovisioned, however, the trade-off is a bit of latency against more power consumption in almost-idle situations.

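The FJP in Akka is run with asyncMode = true, so the answer to the first question is: it is the case of external clients submitting short/small async workloads. Each submitted workload either dispatches an actor to process one or a few messages from its inbox, or executes a Scala Future operation.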

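When a non-ForkJoinTask is scheduled to run on the FJP, it is adapted to the FJP and enqueued just like a ForkJoinTask. There isn't a single submission queue where tasks are enqueued (there was in an early version, JDK 7 perhaps); there are many, to avoid contention, and idle threads can pick up (steal) tasks from queues other than their own if theirs is empty.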
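To make the "dispatch an actor to process one or a few messages" idea concrete, here is a minimal sketch of an actor-like mailbox scheduled on an async-mode ForkJoinPool. It is not Akka's implementation: MiniActor, tell, THROUGHPUT and demo are hypothetical names invented for this illustration, and the batch size of 5 is arbitrary. The point is only that each scheduling is a plain Runnable handed to the pool and never joined.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical actor-like object: a mailbox plus a "scheduled" flag.
class MiniActor<T> implements Runnable {
    private static final int THROUGHPUT = 5; // max messages per scheduling (arbitrary)
    private final Queue<T> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final Consumer<T> behavior;

    MiniActor(Executor executor, Consumer<T> behavior) {
        this.executor = executor;
        this.behavior = behavior;
    }

    // Enqueue a message and make sure the actor is scheduled to run.
    void tell(T message) {
        mailbox.add(message);
        trySchedule();
    }

    private void trySchedule() {
        // At most one pending or running task per actor at any time.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            executor.execute(this); // submitted as a plain Runnable, never joined
        }
    }

    @Override
    public void run() {
        try {
            for (int i = 0; i < THROUGHPUT; i++) {
                T message = mailbox.poll();
                if (message == null) {
                    break;
                }
                behavior.accept(message);
            }
        } finally {
            scheduled.set(false);
            trySchedule(); // reschedule if messages remain or arrived in the meantime
        }
    }

    // Sends 1..messages to one actor that sums them; returns the final sum.
    static int demo(int messages) {
        ForkJoinPool pool = new ForkJoinPool(
                4, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true); // asyncMode
        int[] sum = {0}; // only touched by the actor, one message at a time
        CountDownLatch done = new CountDownLatch(messages);
        MiniActor<Integer> actor = new MiniActor<>(pool, n -> {
            sum[0] += n;
            done.countDown();
        });
        for (int i = 1; i <= messages; i++) {
            actor.tell(i);
        }
        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("messages were not processed in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(100)); // sum of 1..100
    }
}
```

Because the flag guarantees that only one task per actor is queued or running, the behavior can mutate its own state without locks, while different actors spread across the pool's worker threads.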

Note that by default we are currently running on a forked version of the Java 8 FJP, as we saw a significant decrease in throughput with the Java 9 FJP when that came out (it contains quite a lot of changes). Here is issue #21910 discussing that, if you are interested. Additionally, if you want to play around with benchmarking different pools, you can find a few *Pool benchmarks here: https://github.com/akka/akka/tree/master/akka-bench-jmh/src/main/scala/akka/actor

http://letitcrash.com/post/17607272336/scalability-of-fork-join-pool

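Scalability of Fork Join Pool

Akka 2.0 message passing throughput scales way better on multi-core hardware than in previous versions, thanks to the new fork join executor developed by Doug Lea. One micro benchmark shows a 1100% increase in throughput!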

...

http://cs.oswego.edu/pipermail/concurrency-interest/2012-January/008987.html

...

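Highlights:

Substantially better throughput when lots of clients submit lots of tasks. (I've measured up to 60X speedups on micro-benchmarks). The idea is to treat external submitters in a similar way as workers -- using randomized queuing and stealing. (This required a big internal refactoring to disassociate work queues and workers.) This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.

These improvements also lead to a less hostile stance toward submitting tasks that may block. An added paragraph in the ForkJoinTask documentation provides some guidance (basically: we like them if they are small (even if numerous) and have no dependencies).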

...
