Spark error: missing parameter type in map()

I'm trying to learn Spark GraphX on Windows 10 by copying the code here. The code was developed with an older version of Spark, and I can't figure out how to create the vertices. Here is the code:

import scala.util.MurmurHash
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val path = "F:/Soft/spark/2008.csv"
val df_1 = spark.read.option("header", true).csv(path)

val flightsFromTo = df_1.select($"Origin",$"Dest")
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString))

// error caused by the following line
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))

The error message is as follows:

<console>:57: error: missing parameter type
       val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))
                                                                                  ^

I think this syntax is outdated. I tried to find the current syntax in the official documents, but it was of no help. The data set can be downloaded from here.

Update:

Basically, I'm trying to create vertices and edges in order to eventually build a graph as shown in the tutorial. I'm also new to the Map-Reduce paradigm.
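
For reference, the end result I'm aiming for looks roughly like this (adapted from the tutorial; the Edge[Long] attribute type is my assumption):

val airportVertices: RDD[(VertexId, String)] = ???  // (hashed airport code, airport code)
val airportEdges: RDD[Edge[Long]] = ???             // one edge per flight
val graph = Graph(airportVertices, airportEdges)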

You could try:

val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x(0)), x(1)))

The following lines of code worked for me. The problem seems to be calling map() on a Dataset rather than an RDD: Dataset.map is overloaded and requires an Encoder, which trips up the compiler's type inference here, while RDD.map has a single signature and infers the lambda's parameter type without trouble.

// import the current hashing library - the code also works with the deprecated scala.util.MurmurHash, just with a warning
import scala.util.hashing.MurmurHash3

// convert the Datasets to RDDs - calling map() on a Dataset was the cause of the error
val flightsFromTo = df_1.select($"Origin",$"Dest").rdd
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString)).rdd

val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash3.stringHash(x), x))
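
From there, the edges and the graph itself can be built the same way (a sketch along the lines of the tutorial; the 1L edge attribute, standing for one flight per row, is an assumption):

// one edge per flight, keyed by the same MurmurHash3 vertex ids as the vertices
val airportEdges: RDD[Edge[Long]] = flightsFromTo
  .map(row => Edge(
    MurmurHash3.stringHash(row(0).toString).toLong,  // src: Origin vertex id
    MurmurHash3.stringHash(row(1).toString).toLong,  // dst: Dest vertex id
    1L))                                             // edge attribute (assumed)

val graph = Graph(airportVertices, airportEdges)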

// Alternatively, to apply map(), just convert the variable to an RDD at the call site:

val airportVertices: RDD[(VertexId, String)] = airportCodes.rdd.distinct().map(x => (MurmurHash3.stringHash(x), x))
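
A quick sanity check that the vertices look right (output will vary with the data set; each element should be a (VertexId, String) pair):

airportVertices.take(3).foreach(println)
// prints (hashedId, airportCode) tuples, one per distinct airport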