What is the starvation scenario in Spark Streaming?
In the well-known Spark Streaming word count example, the Spark configuration object is initialized as follows:
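```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/* Create a local StreamingContext with two working threads and a batch interval of 1 second.
   The master requires 2 cores to prevent from a starvation scenario. */
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
val ssc = new StreamingContext(sparkConf, Seconds(1))
```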
If I change the master from `local[2]` to `local`, or don't set a master at all, I don't get the expected output: no word counting happens at all.
The comment says "The master requires 2 cores to prevent from a starvation scenario", which is why they call `setMaster("local[2]")`.
Can someone explain why it needs 2 cores, and what a starvation scenario is?
[...] note that a Spark worker/executor is a long-running task, hence it occupies one of the cores allocated to the Spark Streaming application. Therefore, it is important to remember that a Spark Streaming application needs to be allocated enough cores (or threads, if running locally) to process the received data, as well as to run the receiver(s).
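In other words, one thread is used to run the receiver, and at least one more thread is needed to process the received data. On a cluster, the number of allocated cores must be greater than the number of receivers; otherwise the system cannot process the data.
So when running locally you need at least 2 threads, and when running on a cluster you need at least 2 cores allocated to your application (with a single receiver; in general, at least one more core than there are receivers).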
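The core-counting argument above can be illustrated with a toy model in plain Scala, using only `java.util.concurrent` and no Spark. A fixed thread pool stands in for the cores (or the `n` in `local[n]`), each long-running task stands in for a receiver, and one short task stands in for a batch job. This is an illustrative assumption, not how Spark's scheduler is actually implemented; the names `StarvationDemo` and `batchCompletes` are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Toy model only (hypothetical names): threads = cores, long-running tasks = receivers.
object StarvationDemo {
  /** Submits `receivers` never-ending "receiver" tasks, then one short "batch"
    * task, to a pool of `threads` threads. Returns true if the batch task ran
    * within `timeoutMs`, false if it was starved (still queued). */
  def batchCompletes(threads: Int, receivers: Int, timeoutMs: Long = 500): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val stopReceivers = new CountDownLatch(1)
    val batchDone = new CountDownLatch(1)
    try {
      // Each "receiver" holds its thread until it is told to stop.
      for (_ <- 1 to receivers)
        pool.execute(new Runnable { def run(): Unit = stopReceivers.await() })
      // The "batch" only needs a free thread to finish immediately.
      pool.execute(new Runnable { def run(): Unit = batchDone.countDown() })
      batchDone.await(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
      stopReceivers.countDown() // release the receivers so the pool can shut down
      pool.shutdown()
      pool.awaitTermination(5, TimeUnit.SECONDS)
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread,  1 receiver: batch completes = ${batchCompletes(1, 1)}")
    println(s"2 threads, 1 receiver: batch completes = ${batchCompletes(2, 1)}")
  }
}
```

With one thread and one receiver (analogous to `local`), the batch task stays queued behind the receiver and `batchCompletes` returns false; with two threads it returns true. Two threads with two receivers starve again, which mirrors the rule that a cluster needs more cores than receivers.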
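A starvation scenario refers to this kind of problem, in which some threads never get to execute while others keep running.
Starvation is well known from two classic problems: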
- Dining philosophers
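- Readers-writer problem: here the threads can be synchronized in a way that starves either the readers or the writer. It is also possible to guarantee that no starvation occurs.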
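The readers-writer case can be sketched as a toy discrete-time simulation in Scala. The names `ReadersWriterSim` and `writerEntryTick` are hypothetical, and this models a scheduling policy rather than a real lock. Readers keep arriving so that at least one is always active. Under a pure reader-preference policy the writer waits forever; if new readers are held back while the writer waits, the active readers drain and the writer gets in.

```scala
// Toy model of readers-writer scheduling (hypothetical names, not a real lock).
object ReadersWriterSim {
  /** One reader is active at tick 0, a new reader arrives every tick, and each
    * reader reads for `readTicks` ticks. A single writer waits from tick 0 and
    * may enter only when no reader is active. Returns the tick at which the
    * writer enters, or None if it is still waiting after `maxTicks` ticks. */
  def writerEntryTick(blockNewReadersWhileWriterWaits: Boolean,
                      readTicks: Int = 2,
                      maxTicks: Int = 100): Option[Int] = {
    var active = List(readTicks) // remaining read time of each active reader
    var tick = 0
    var entered: Option[Int] = None
    while (entered.isEmpty && tick < maxTicks) {
      active = active.map(_ - 1).filter(_ > 0) // readers progress; finished ones leave
      if (active.isEmpty) entered = Some(tick) // no active readers: the writer gets in
      else if (!blockNewReadersWhileWriterWaits) active = readTicks :: active // a new reader joins
      tick += 1
    }
    entered
  }

  def main(args: Array[String]): Unit = {
    println(s"reader preference: writer enters at ${writerEntryTick(blockNewReadersWhileWriterWaits = false)}")
    println(s"writer-aware:      writer enters at ${writerEntryTick(blockNewReadersWhileWriterWaits = true)}")
  }
}
```

Running it reports `None` for reader preference and `Some(1)` for the writer-aware policy. In real JVM code, `java.util.concurrent.locks.ReentrantReadWriteLock` can be constructed in fair mode (`new ReentrantReadWriteLock(true)`), which grants the lock in approximately arrival order and makes a new reader block while a writer is waiting.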