Spark Scala: Understanding reduceByKey(_ + _)
I can't understand reduceByKey(_ + _) in this first Spark-with-Scala example:
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    val sc = new SparkContext()
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // <-- I can't understand this line
    wordCounts.saveAsTextFile(outputPath)
  }
}
Reduce takes two elements and produces a third by applying a function to those two arguments.

The code you show is equivalent to the following:

reduceByKey((x, y) => x + y)

Instead of making you define dummy variables and write out a lambda, Scala is smart enough to figure out that what you are trying to do is apply a function (a sum, in this case) to any two arguments it receives, hence the syntax:

reduceByKey(_ + _)

reduceByKey takes a two-argument function, applies it pairwise to all the values that share a key, and returns the reduced value for each key.

reduceByKey(_ + _) is equivalent to reduceByKey((x, y) => x + y)
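To see that the placeholder form is only shorthand for an anonymous function, here is a small sketch outside of Spark (the name add is just for illustration):

// Placeholder syntax expands to an anonymous function with one fresh
// parameter per underscore, read left to right.
val add: (Int, Int) => Int = _ + _   // same as (x, y) => x + y
add(2, 3)                            // 5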
Example:
val numbers = Array(1, 2, 3, 4, 5)
val sum = numbers.reduceLeft[Int](_+_)
println("The sum of the numbers one through five is " + sum)
Result:
The sum of the numbers one through five is 15
numbers: Array[Int] = Array(1, 2, 3, 4, 5)
sum: Int = 15
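The same idea applied to a pair RDD, which is what reduceByKey does in the word count. A minimal sketch, assuming a local SparkContext (the app name and the sample pairs are just for illustration):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("reduceByKeyDemo"))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

// Both forms fold the values of each key with the same function.
pairs.reduceByKey(_ + _).collect()           // e.g. Array((a,2), (b,1))
pairs.reduceByKey((x, y) => x + y).collect() // e.g. Array((a,2), (b,1))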
Similarly, reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y).
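For instance, when the values are collections, _ ++ _ concatenates the value collections per key. Continuing the sketch above (reusing the same sc; the sample data is illustrative):

val lists = sc.parallelize(Seq(("a", Seq(1)), ("b", Seq(2)), ("a", Seq(3))))

// _ ++ _ concatenates the two value collections for each key.
lists.reduceByKey(_ ++ _).collect()   // e.g. Array((a,List(1, 3)), (b,List(2)))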