What happens when reduceByKey is in action?

Question

My mapToPair function produces the following output.

(a, 1) (a, 1) (b, 1)

I am reducing the values with the reduceByKey function, using the following code:

  // Merges two values that share the same key by adding them together.
  private static final Function2<Integer, Integer, Integer> WORDS_REDUCER =
      new Function2<Integer, Integer, Integer>() {
        @Override
        public Integer call(Integer a, Integer b) throws Exception {
          return a + b;
        }
      };

It works fine, but can someone explain how this code behaves when it processes the (b, 1) pair?

Answer 1

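I'm not sure exactly which part you don't understand, but maybe this will help...

The function passed to reduceByKey (here x + y) acts as an accumulator that sums the values for each key. If a key has only one value, as with (b, 1), there is nothing to combine it with, so the function is never called for that key and the single value is the result.

Here is an example using PySpark: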

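  from pyspark import SparkContext

  sc = SparkContext.getOrCreate()  # reuse the running context, or start one
  testrdd = sc.parallelize((('a', 1), ('a', 1), ('b', 1)))
  testrdd = testrdd.reduceByKey(lambda x, y: x + y)  # sum the values per key
  result = testrdd.collect()
  print("result: {}".format(result))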

  >result: [('a', 2), ('b', 1)]
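To make the single-value case concrete, here is a rough plain-Python sketch of the same semantics, with no Spark involved. The names `reduce_by_key`, `add`, and `calls` are hypothetical and exist only for illustration; real Spark combines values within each partition and then across partitions, so this is a model of the result, not of how Spark computes it.

```python
from functools import reduce


def reduce_by_key(pairs, func):
    """Hypothetical stand-in for RDD.reduceByKey, run locally without Spark."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    # functools.reduce returns a lone element unchanged, without calling func.
    return [(key, reduce(func, values)) for key, values in grouped.items()]


calls = []  # records every pair of arguments the reducer receives


def add(x, y):
    calls.append((x, y))
    return x + y


result = reduce_by_key([('a', 1), ('a', 1), ('b', 1)], add)
print("result: {}".format(result))
print("reducer calls: {}".format(calls))
```

The reducer is invoked once, for the two values of 'a'. The key 'b' has a single value, so it passes through without the function ever being called.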

apache-spark