"spark.streaming.blockInterval" 在 Spark Streaming DirectAPI 中有什么用
What is use of "spark.streaming.blockInterval" in Spark Streaming DirectAPI
I would like to understand what role "spark.streaming.blockInterval" plays in the Spark Streaming Direct API. As per my understanding, "spark.streaming.blockInterval" is used to calculate the number of partitions, i.e. #partitions = (receivers × batchInterval) / blockInterval, but with the Direct API the number of Spark Streaming partitions equals the number of Kafka partitions (see the worked example below).
How is "spark.streaming.blockInterval" used in the Direct API?
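For reference, here is a minimal sketch of the arithmetic behind that receiver-based formula, assuming a hypothetical 10-second batch interval, a single receiver, and the default 200 ms block interval (all values are illustrative):

```scala
// Receiver-based streaming only: each receiver cuts its incoming data into
// one block per blockInterval, and each block becomes one RDD partition.
val receivers = 1       // hypothetical single receiver
val batchMs   = 10000L  // hypothetical 10-second batch interval
val blockMs   = 200L    // default spark.streaming.blockInterval (200 ms)

val partitionsPerBatch = receivers * batchMs / blockMs  // = 50
println(s"partitions per batch: $partitionsPerBatch")
```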
spark.streaming.blockInterval:
Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark.
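For context, a minimal sketch of where that setting normally applies, i.e. a receiver-based input stream rather than the Direct API (the app name, host, and port are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// blockInterval only matters for receiver-based input streams; 200ms is the default.
val conf = new SparkConf()
  .setAppName("receiver-based-sketch")            // illustrative name
  .setMaster("local[2]")                          // at least 2 cores: 1 receiver + processing
  .set("spark.streaming.blockInterval", "200ms")

val ssc = new StreamingContext(conf, Seconds(10))

// A receiver-based source: the receiver chunks incoming data into one block
// per blockInterval, and each block becomes one partition of the batch RDD.
val lines = ssc.socketTextStream("localhost", 9999)
lines.count().print()

ssc.start()
ssc.awaitTermination()
```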
And KafkaUtils.createDirectStream() does not use receivers.
With directStream, Spark Streaming will create as many RDD partitions
as there are Kafka partitions to consume
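To make that concrete, here is a minimal sketch of the Direct API, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and group id are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("direct-api-sketch").setMaster("local[*]")
val ssc  = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",           // placeholder broker
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",            // placeholder group id
  "auto.offset.reset"  -> "latest"
)

// No receivers here, so spark.streaming.blockInterval plays no role:
// each batch RDD gets exactly one partition per Kafka topic-partition consumed.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array("example-topic"), kafkaParams)
)

stream.foreachRDD { rdd =>
  println(s"RDD partitions this batch: ${rdd.getNumPartitions}") // == number of Kafka partitions
}

ssc.start()
ssc.awaitTermination()
```

If you need more parallelism than the topic's partition count provides, the usual options are adding Kafka partitions or calling rdd.repartition(n) on each batch, at the cost of a shuffle.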