DStream 的笛卡尔

Cartesian of DStream

我使用 Spark 笛卡尔函数生成 N 对值的列表。

然后我映射这些值以生成每个用户之间的距离度量:

val cartesianUsers: org.apache.spark.rdd.RDD[(distance.classes.User, distance.classes.User)] = users.cartesian(users)
cartesianUsers.map(m => manDistance(m._1, m._2))

这按预期工作。

我使用 Spark Streaming 库创建了一个 DStream,然后在其上映射:

val customReceiverStream: ReceiverInputDStream[String] = ssc.receiverStream....
customReceiverStream.foreachRDD(m => {
  println("size is " + m)
})

我可以在 customReceiverStream.foreachRDD 中使用笛卡尔函数,但根据文档 http://spark.apache.org/docs/1.2.0/streaming-programming-guide.htm 这不是它的预期用途:

foreachRDD(func) 应用函数的最通用的输出运算符,func, to each RDD generated from the stream. This function should push the data in each RDD to a external system, like saving the RDD to files, or writing it over the network to a database. Note that the function func is executed in the driver process running the streaming application, and will usually have RDD actions in it that will force the computation of the streaming RDDs.

如何计算 DStream 的笛卡尔坐标?也许我误解了 DStreams 的使用?

我不知道转换方法:

cartesianUsers.transform(car => car.cartesian(car))

不错的谈话,其中还提到了大约 17:00 https://www.youtube.com/watch?v=g171ndOHgJ0

处的变换函数