Creating a Graph in GraphX using a pair RDD

I have a pair RDD and want to build a GraphX graph from it. I want weighted edges: if an edge appears 3 times in the pair RDD, its weight should be 3.

Calling take(1) on the RDD gives:

res2: Array[(String, String)] = Array((905067378709,905458844980))

Solution for a directed graph

Suppose you have the following pair RDD containing the edges:

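import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Sample edge list; the pair (905067378709, 905458844980) occurs three times
val data: RDD[(String, String)] = sc.parallelize(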
  Seq(
    ("905067378709", "905458844980"),
    ("905067378709", "905458844980"),
    ("905458844980", "905067378709"),
    ("905067378709", "905458844980"),
    ("905458844982", "905458844984"),
    ("905067378709", "905458844984"),
    ("905067378712", "905067378709")))

Convert it to an RDD[(VertexId, VertexId)] (VertexId is an alias for Long, so the string IDs must be numeric):

val edgesRDD: RDD[(VertexId, VertexId)] = data.map { case (a, b) => (a.toLong, b.toLong) }

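Then create the graph with Graph.fromEdgeTuples. This function builds a graph from nothing but an RDD of edge tuples: it assigns every edge the value 1, automatically creates each vertex mentioned by an edge, and gives those vertices the default value you pass in (here 1).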

val graph = Graph.fromEdgeTuples(edgesRDD, 1)
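// Print vertices and edges (on a cluster, foreach prints on the executors; collect() first to print on the driver)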
val vert: VertexRDD[Int] = graph.vertices
vert.foreach { println }

val edg: EdgeRDD[Int] = graph.edges
edg.foreach { println }
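As a quick sanity check of the step above, the sketch below rebuilds the same graph and counts what Graph.fromEdgeTuples produced. It assumes a local SparkContext obtained through SparkContext.getOrCreate (in spark-shell this simply returns the existing sc); the names numEdges, numVertices and allOnes are only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: local mode for experimentation; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("inspect-graph").setMaster("local[*]"))

// Same sample data as in the answer, already converted to Long vertex IDs.
val edgesRDD: RDD[(VertexId, VertexId)] = sc.parallelize(Seq(
  ("905067378709", "905458844980"),
  ("905067378709", "905458844980"),
  ("905458844980", "905067378709"),
  ("905067378709", "905458844980"),
  ("905458844982", "905458844984"),
  ("905067378709", "905458844984"),
  ("905067378712", "905067378709")
)).map { case (a, b) => (a.toLong, b.toLong) }

val graph = Graph.fromEdgeTuples(edgesRDD, 1)

// Duplicates are still separate edges at this point, each with attribute 1.
val numEdges: Long = graph.edges.count()
val numVertices: Long = graph.vertices.count()
val allOnes: Boolean = graph.edges.map(_.attr).collect().forall(_ == 1)

println(s"edges=$numEdges vertices=$numVertices allOnes=$allOnes")
```

With the sample data there are seven input pairs and five distinct IDs, so the graph should still contain one edge per input pair until the duplicates are merged.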

Now we can compute the weights by merging the duplicate edges:

// groupEdges only merges duplicates that sit in the same partition, so repartition first
val subgraph = graph.partitionBy(PartitionStrategy.CanonicalRandomVertexCut)
  .groupEdges((a, b) => a + b)

// Print the merged vertices and edges
val vert2: VertexRDD[Int] = subgraph.vertices
vert2.foreach { println }

val edg2: EdgeRDD[Int] = subgraph.edges
edg2.foreach { println }

The result is:

Edge(905067378712,905067378709,1)
Edge(905067378709,905458844984,1)
Edge(905067378709,905458844980,3)  // this edge appears 3 times
Edge(905458844980,905067378709,1)
Edge(905458844982,905458844984,1)
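The two steps above can also be collapsed. The sketch below shows two alternatives that should yield the same weights: passing a partition strategy as the uniqueEdges argument of Graph.fromEdgeTuples, and counting the pairs with reduceByKey before calling Graph.fromEdges. It assumes a local SparkContext obtained through SparkContext.getOrCreate; the helper weights and the names merged, weightedEdges and counted are hypothetical and only there to compare the results.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: local mode for experimentation; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("weighted-edges").setMaster("local[*]"))

val edgesRDD: RDD[(VertexId, VertexId)] = sc.parallelize(Seq(
  ("905067378709", "905458844980"),
  ("905067378709", "905458844980"),
  ("905458844980", "905067378709"),
  ("905067378709", "905458844980"),
  ("905458844982", "905458844984"),
  ("905067378709", "905458844984"),
  ("905067378712", "905067378709")
)).map { case (a, b) => (a.toLong, b.toLong) }

// Variant 1: let fromEdgeTuples merge identical edges and sum their attributes.
val merged: Graph[Int, Int] =
  Graph.fromEdgeTuples(edgesRDD, 1, Some(PartitionStrategy.CanonicalRandomVertexCut))

// Variant 2: count each directed pair first, then build the graph from weighted edges.
val weightedEdges: RDD[Edge[Int]] = edgesRDD
  .map(pair => (pair, 1))
  .reduceByKey(_ + _)
  .map { case ((src, dst), w) => Edge(src, dst, w) }
val counted: Graph[Int, Int] = Graph.fromEdges(weightedEdges, 1)

// Hypothetical helper: collect the edge weights into a driver-side map for comparison.
def weights(g: Graph[Int, Int]): Map[(VertexId, VertexId), Int] =
  g.edges.map(e => ((e.srcId, e.dstId), e.attr)).collect().toMap

weights(merged).foreach(println)
```

Both variants keep direction, so 905067378709 -> 905458844980 and 905458844980 -> 905067378709 remain separate edges. The reduceByKey variant may be preferable when the weights are wanted before any graph is built, while the uniqueEdges variant stays closest to the code in the answer.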