
What is Starvation scenario in Spark streaming?

In the famous Spark Streaming word count example, the Spark configuration object is initialized as follows:

import org.apache.spark.SparkConf

/* Create a local StreamingContext with two working thread and batch interval of 1 second.
The master requires 2 cores to prevent from a starvation scenario. */
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")

Here, if I change the master from local[2] to local, or do not set a master at all, I do not get the expected output; in fact, no word count happens at all.
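For reference, the rest of the example is roughly the standard network word count program from the Spark documentation; a minimal sketch (the localhost:9999 socket source is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // The socket receiver is a long-running task that pins one of the two local threads.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}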

The comment says:

"The master requires 2 cores to prevent from a starvation scenario" that's why they have done setMaster("local[2]").

Can someone explain why it needs 2 cores, and what a starvation scenario is?

From the documentation:

[...] note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).

In other words, one thread will be used to run the receiver, and at least one more thread is needed to process the received data. For a cluster, the number of allocated cores must be greater than the number of receivers; otherwise the system cannot process the data.

Therefore, when running locally you need at least 2 threads, and when running on a cluster at least 2 cores must be allocated to your application.
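The same arithmetic extends to multiple receivers. As a sketch (the socket hosts and ports are placeholders), a job that unions two socket streams pins two threads for its receivers, so it needs at least local[3] to leave one thread free for processing:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Two receivers pin two threads, so at least one more is needed for processing.
val sparkConf = new SparkConf().setMaster("local[3]").setAppName("TwoReceivers")
val ssc = new StreamingContext(sparkConf, Seconds(1))

val s1 = ssc.socketTextStream("localhost", 9999)  // receiver #1: one thread
val s2 = ssc.socketTextStream("localhost", 9998)  // receiver #2: one thread
val words = s1.union(s2).flatMap(_.split(" "))    // processed on the remaining thread
words.count().print()

// With local[2] here, both threads would be consumed by the receivers
// and no batch would ever be processed.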


A starvation scenario refers to this kind of problem: some threads are unable to run at all while others are making progress.

Starvation is well known from two classic problems: the dining philosophers problem and the readers-writers problem.
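To make the Spark case concrete, here is a minimal sketch in plain Scala (no Spark; the names are illustrative) of the same starvation pattern: a single-thread pool stands in for master = "local", a never-ending task stands in for the receiver, and the queued task that stands in for batch processing never runs:

import java.util.concurrent.Executors

object StarvationDemo {
  def main(args: Array[String]): Unit = {
    // One thread stands in for "local": the only core available.
    val pool = Executors.newFixedThreadPool(1)

    // The "receiver": a long-running task that never yields its thread.
    pool.submit(new Runnable {
      def run(): Unit = while (true) { Thread.sleep(1000) }
    })

    // The "processing" task: queued behind the receiver, it never executes.
    pool.submit(new Runnable {
      def run(): Unit = println("processing a batch")  // never printed
    })
  }
}

This is what happens with setMaster("local"): the receiver occupies the only thread, the word-count stage starves, and no output is ever produced.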