Spark implicit RDD conversion doesn't work
I ran into a problem similar to Spark sorting of delimited data, but the accepted solution there does not solve my issue.
I am trying to apply combineByKey to a simple RDD:
package foo

import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object HelloTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(sparkConf)
    val input = sc.textFile("/path/to/test.txt")

    val result = input.combineByKey(
      (v) => (v, 1),
      (acc: (Int, Int), v) => (acc._1 + v, acc._2 + 1),
      (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
    ).map{ case (key, value) => (key, value._1 / value._2.toFloat) }

    result.collectAsMap().map(println(_))
    sc.stop()
  }
}
When compiling, I get the following error (and only this one):
$ scalac -cp /path/to/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar -sourcepath src/ -d bin src/foo/HelloTest.scala
error: value combineByKey is not a member of org.apache.spark.rdd.RDD[String]
Interestingly, the combineByKey function is not described here: https://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs, but it is covered in the working with key/value pairs section of the Learning Spark book.
So the problem seems to be that your input isn't keyed. When you read the input in from a text file it is an RDD of strings, and for combineByKey, or any of the similar functions, it needs to be an RDD of key-value pairs. Hope this helps, and it's nice to see a Learning Spark reader :)
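As a minimal sketch of what that might look like: the version below assumes your test.txt contains comma-delimited "key,value" lines (the delimiter and parsing step are placeholders for whatever your actual format is). Once each line is mapped to a (String, Int) pair, the implicit conversion to PairRDDFunctions brought in by import org.apache.spark.SparkContext._ (on Spark 1.x) applies, and combineByKey compiles.

package foo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object HelloTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(sparkConf)

    // Parse each line into a (key, value) pair first. The split below
    // assumes comma-delimited lines such as "foo,3"; adjust to your format.
    val input = sc.textFile("/path/to/test.txt").map { line =>
      val Array(k, v) = line.split(",")
      (k, v.trim.toInt)
    }

    // input is now an RDD[(String, Int)], so combineByKey is available.
    // This computes the per-key average: (sum, count) -> sum / count.
    val result = input.combineByKey(
      (v: Int) => (v, 1),
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
      (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
    ).map { case (key, value) => (key, value._1 / value._2.toFloat) }

    result.collectAsMap().foreach(println)
    sc.stop()
  }
}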