
What is Starvation scenario in Spark streaming?

In the famous Spark Streaming word count example, the Spark configuration object is initialized as follows:

import org.apache.spark.SparkConf

/* Create a local StreamingContext with two working thread and batch interval of 1 second.
The master requires 2 cores to prevent from a starvation scenario. */
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")

Here, if I change the master from local[2] to local, or do not set a master at all, I do not get the expected output; in fact, no word count happens at all.
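For reference, the rest of the example is roughly the standard network word count program from the Spark documentation; a minimal sketch (the localhost:9999 socket source is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // The socket receiver is a long-running task that pins one of the two local threads.
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}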

The comment says:

"The master requires 2 cores to prevent from a starvation scenario" that's why they have done setMaster("local[2]").

Can someone explain why it needs 2 cores, and what a starvation scenario is?

From the documentation:

[...] note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).

In other words, one thread will be used to run the receiver, and at least one more thread is needed to process the received data. For a cluster, the number of allocated cores must be greater than the number of receivers; otherwise the system cannot process the data.

Therefore, when running locally you need at least 2 threads, and when running on a cluster at least 2 cores must be allocated to your application.
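The same arithmetic extends to multiple receivers. As a sketch (the socket hosts and ports are placeholders), a job that unions two socket streams pins two threads for its receivers, so it needs at least local[3] to leave one thread free for processing:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Two receivers pin two threads, so at least one more is needed for processing.
val sparkConf = new SparkConf().setMaster("local[3]").setAppName("TwoReceivers")
val ssc = new StreamingContext(sparkConf, Seconds(1))

val s1 = ssc.socketTextStream("localhost", 9999)  // receiver #1: one thread
val s2 = ssc.socketTextStream("localhost", 9998)  // receiver #2: one thread
val words = s1.union(s2).flatMap(_.split(" "))    // processed on the remaining thread
words.count().print()

// With local[2] here, both threads would be consumed by the receivers
// and no batch would ever be processed.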


A starvation scenario refers to this kind of problem: some threads are unable to run at all while others are making progress.

Starvation is well known from two classic problems: the dining philosophers problem and the readers-writers problem.
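To make the Spark case concrete, here is a minimal sketch in plain Scala (no Spark; the names are illustrative) of the same starvation pattern: a single-thread pool stands in for master = "local", a never-ending task stands in for the receiver, and the queued task that stands in for batch processing never runs:

import java.util.concurrent.Executors

object StarvationDemo {
  def main(args: Array[String]): Unit = {
    // One thread stands in for "local": the only core available.
    val pool = Executors.newFixedThreadPool(1)

    // The "receiver": a long-running task that never yields its thread.
    pool.submit(new Runnable {
      def run(): Unit = while (true) { Thread.sleep(1000) }
    })

    // The "processing" task: queued behind the receiver, it never executes.
    pool.submit(new Runnable {
      def run(): Unit = println("processing a batch")  // never printed
    })
  }
}

This is what happens with setMaster("local"): the receiver occupies the only thread, the word-count stage starves, and no output is ever produced.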