Spark:反加入两个DStreams

Spark : anti-join two DStreams

我可以在两个 Spark DStream 上进行 JOIN,例如:

val joinStream = stream1.join(stream2)

现在,如果我需要过滤掉所有未加入的记录怎么办。本质上,类似于 stream1.anti-join(stream2)。这有可能吗?

感谢并感谢任何帮助!

假设你有这些:

val rdd1 = sc.parallelize(Array(
  (1, "one"),
  (2, "twow"),
  (3, "three"),
  (4, "four"),
  (5, "five")
))
val rdd2 = sc.parallelize(Array(
  (1, "otherOne"),
  (4, "otherFour"),
  (5,"otherFive"),
  (6,"six"),
  (7,"seven")
))

val antiJoined = rdd1.fullOuterJoin(rdd2).filter(r => r._2._1.isEmpty || r._2._2.isEmpty)

antiJoined.collect foreach println
(6,(None,Some(six)))
(2,(Some(twow),None))
(3,(Some(three),None))
(7,(None,Some(seven)))