Type mismatch on vertex RDD

How many properties (key:value pairs) can a GraphX vertex store?

  val vertexArray = Array(
    (1L, ("Name", "Alice"), ("age", 28), ("major", "ECE")),
    (2L, ("Name", "John"), ("age", 23), ("major", "History")),
    (3L, ("Name", "Mark"), ("age", 34), ("major", "Education"))
  )

  val edgeArray = Array(
    Edge(1L, 3L, "cousin"),
    Edge(1L, 2L, "spouse")
  )
  val vertexRDD = sc.parallelize(vertexArray)
  val edgeRDD = sc.parallelize(edgeArray)

  val graph = Graph(vertexRDD, edgeRDD)

The code above fails with the following error when creating the graph.

Error:(28, 21) type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (String, String), (String, Int), (String, String))]
 required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, ?)]
    (which expands to)  org.apache.spark.rdd.RDD[(Long, ?)]
Error occurred in an application involving default arguments.
  val graph = Graph(vertexRDD, edgeRDD)
                    ^

Also, does the vertexId always have to be a Long, or does GraphX also support String vertex IDs (for example, if I want to use Java UUIDs)?

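As the error says, vertexRDD needs to be of type RDD[(VertexId, ?)]. In other words, it must be an RDD of Tuple2 whose first element is of type VertexId. In your example you created an RDD of Tuple4, which is invalid. To make it valid, wrap the last three elements in a Tuple3, like this: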

 val vertexArray = Array(
  (1L, (("Name", "Alice"), ("age", 28), ("major", "ECE"))),
  (2L, (("Name", "John"), ("age", 23), ("major", "History"))),
  (3L, (("Name", "Mark"), ("age", 34), ("major", "Education"))))
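To make the difference in shape concrete, here is a minimal plain-Scala sketch that needs no Spark. The `VertexId` alias is redeclared locally to mirror the GraphX one, and the `Person` case class is a hypothetical alternative that is not part of the original code; since the vertex attribute can be any single type, a case class is one way to carry as many properties as you like.

```scala
object VertexShapeSketch {
  // GraphX defines VertexId as an alias for Long; mirrored here so the sketch runs without Spark.
  type VertexId = Long

  // Hypothetical case class (an assumption, not in the original post).
  final case class Person(name: String, age: Int, major: String)

  // Shape from the question: four top-level elements, so this is a Tuple4.
  val flat = (1L, ("Name", "Alice"), ("age", 28), ("major", "ECE"))

  // Shape from the fix: a Tuple2 of (id, attribute), where the attribute is a Tuple3.
  val nested: (VertexId, ((String, String), (String, Int), (String, String))) =
    (1L, (("Name", "Alice"), ("age", 28), ("major", "ECE")))

  // Alternative: still a Tuple2, with the attribute held in a case class.
  val typed: (VertexId, Person) = (1L, Person("Alice", 28, "ECE"))

  def main(args: Array[String]): Unit = {
    println(flat.productArity)   // prints 4
    println(nested.productArity) // prints 2
    println(typed._2.age)        // prints 28
  }
}
```

Only the two-element shapes fit RDD[(VertexId, ?)]; which attribute type you pick (nested tuples, a case class, or something else) is up to you.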

As for your second question: yes, VertexId must be a Long (GraphX defines VertexId as a type alias for Long) :)
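If you still want to work with UUIDs, one possible workaround is to assign each distinct UUID its own Long and keep the UUID itself as the vertex attribute. The sketch below is plain Scala on a local collection; `UuidVertexIdSketch` and `assignIds` are hypothetical names, and this is only one way to do the mapping. For a large distributed dataset, `RDD.zipWithUniqueId()` might be a better fit.

```scala
import java.util.UUID

object UuidVertexIdSketch {
  // Give every distinct UUID its own Long id (0, 1, 2, ...), in order of first appearance.
  def assignIds(uuids: Seq[UUID]): Map[UUID, Long] =
    uuids.distinct.zipWithIndex.map { case (uuid, i) => uuid -> i.toLong }.toMap

  def main(args: Array[String]): Unit = {
    val alice = UUID.randomUUID()
    val john = UUID.randomUUID()

    val ids = assignIds(Seq(alice, john, alice))

    // Vertices keyed by Long, with the original UUID kept as the attribute.
    val vertices: Seq[(Long, UUID)] = ids.toSeq.map { case (uuid, id) => (id, uuid) }

    // Edge endpoints are translated through the same map.
    val edge = (ids(alice), ids(john), "spouse")

    println(vertices.sortBy(_._1))
    println(edge)
  }
}
```

The resulting (Long, UUID) pairs have the shape that a vertex RDD expects, so they could be passed to `sc.parallelize` in the same way as the vertex array above.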

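You can also pass a third argument, "defaultVertexAttr" (with the vertex RDD in the Tuple2 form shown above). It is the attribute given to vertices that are referenced by an edge but missing from the vertex RDD: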

val graph = Graph(vertexRDD, edgeRDD, (("Name", ""), ("age", 0), ("major", "")))
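For completeness, here is a sketch of the whole thing as a standalone program, assuming Spark and GraphX are on the classpath and a local master is acceptable. The `Attr` alias, the object and method names, and the extra edge to vertex 4 are assumptions added for illustration; the extra edge is there only so that the default attribute actually gets used.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object DefaultVertexAttrSketch {
  // Hypothetical alias for the nested-tuple attribute type used in this thread.
  type Attr = ((String, String), (String, Int), (String, String))

  val defaultAttr: Attr = (("Name", ""), ("age", 0), ("major", ""))

  def buildGraph(sc: SparkContext): Graph[Attr, String] = {
    val vertexRDD: RDD[(VertexId, Attr)] = sc.parallelize(Seq[(VertexId, Attr)](
      (1L, (("Name", "Alice"), ("age", 28), ("major", "ECE"))),
      (2L, (("Name", "John"), ("age", 23), ("major", "History"))),
      (3L, (("Name", "Mark"), ("age", 34), ("major", "Education")))
    ))
    val edgeRDD: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 3L, "cousin"),
      Edge(1L, 2L, "spouse"),
      Edge(1L, 4L, "friend") // assumed extra edge: vertex 4 is not in vertexRDD
    ))
    // Vertex 4 only appears in an edge, so it receives defaultAttr.
    Graph(vertexRDD, edgeRDD, defaultAttr)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("default-vertex-attr-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      buildGraph(sc).vertices.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch the graph should end up with four vertices, with vertex 4 carrying the default attribute.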