Can I get the maximal key of every RDD in a DStream?
I need to find the maximal key of every RDD, but when I use reduce() all I get is the single largest element of the whole DStream.
For example, in this stream I want (2,"b"), (2,"d"), (3,"f") to be returned, but all I get from reduce(max) is (3,"f").
How can I get (2,"b"), (2,"d"), (3,"f")?
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="PythonStreamingQueueStream")
ssc = StreamingContext(sc, 1)  # 1-second batch interval

stream = ssc.queueStream([sc.parallelize(
    [(1, "a"), (2, "b"), (1, "c"), (2, "d"), (1, "e"), (3, "f")], 3)])
stream.reduce(max).pprint()

ssc.start()
ssc.stop(stopSparkContext=True, stopGraceFully=True)
This:
stream = ssc.queueStream([sc.parallelize(
    [(1, "a"), (2, "b"), (1, "c"), (2, "d"), (1, "e"), (3, "f")], 3)])
creates a stream with only a single batch, where that first and only batch has (at least) 3 partitions. Since reduce is applied separately to each batch, a single batch yields a single maximum. I think you want:
stream = ssc.queueStream([
    sc.parallelize([(1, "a"), (2, "b")]),
    sc.parallelize([(1, "c"), (2, "d")]),
    sc.parallelize([(1, "e"), (3, "f")]),
])
which will give you the expected result:
stream.reduce(max).pprint()
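
With 1-second batches, each of the three queued RDDs is reduced on its own, so pprint() should produce output along these lines (timestamps elided):

-------------------------------------------
Time: <timestamp>
-------------------------------------------
(2, 'b')

-------------------------------------------
Time: <timestamp>
-------------------------------------------
(2, 'd')

-------------------------------------------
Time: <timestamp>
-------------------------------------------
(3, 'f')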
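
If instead you really do want one maximum per partition of a single batch (the original queueStream call puts all six pairs into one 3-partition RDD), a sketch using DStream.mapPartitions would do it; partition_max here is a hypothetical helper, not part of any API:

def partition_max(iterator):
    # Materialize the partition so we can check for emptiness;
    # max() on an empty iterator would raise ValueError.
    items = list(iterator)
    return [max(items)] if items else []

stream.mapPartitions(partition_max).pprint()

Applied to the original single-batch stream, this should print (2, 'b'), (2, 'd'), (3, 'f') within one batch, since parallelize(..., 3) places two pairs in each of the three partitions.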