Spark:反加入两个DStreams
Spark : anti-join two DStreams
我可以在两个 Spark DStream 上进行 JOIN,例如:
val joinStream = stream1.join(stream2)
现在,如果我需要过滤掉所有未加入的记录怎么办。本质上,类似于 stream1.anti-join(stream2)
。这有可能吗?
感谢并感谢任何帮助!
假设你有这些:
val rdd1 = sc.parallelize(Array(
(1, "one"),
(2, "twow"),
(3, "three"),
(4, "four"),
(5, "five")
))
val rdd2 = sc.parallelize(Array(
(1, "otherOne"),
(4, "otherFour"),
(5,"otherFive"),
(6,"six"),
(7,"seven")
))
val antiJoined = rdd1.fullOuterJoin(rdd2).filter(r => r._2._1.isEmpty || r._2._2.isEmpty)
antiJoined.collect foreach println
(6,(None,Some(six)))
(2,(Some(twow),None))
(3,(Some(three),None))
(7,(None,Some(seven)))
我可以在两个 Spark DStream 上进行 JOIN,例如:
val joinStream = stream1.join(stream2)
现在,如果我需要过滤掉所有未加入的记录怎么办。本质上,类似于 stream1.anti-join(stream2)
。这有可能吗?
感谢并感谢任何帮助!
假设你有这些:
val rdd1 = sc.parallelize(Array(
(1, "one"),
(2, "twow"),
(3, "three"),
(4, "four"),
(5, "five")
))
val rdd2 = sc.parallelize(Array(
(1, "otherOne"),
(4, "otherFour"),
(5,"otherFive"),
(6,"six"),
(7,"seven")
))
val antiJoined = rdd1.fullOuterJoin(rdd2).filter(r => r._2._1.isEmpty || r._2._2.isEmpty)
antiJoined.collect foreach println
(6,(None,Some(six)))
(2,(Some(twow),None))
(3,(Some(three),None))
(7,(None,Some(seven)))