Creating a graph in GraphX from a pair RDD
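I have a pair RDD and want to build a GraphX graph from it. I want weighted edges: if an edge occurs 3 times in the pair RDD, its weight should be 3.
The output of take(1) on the RDD looks like this: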
res2: Array[(String, String)] = Array((905067378709,905458844980))
Solution for a directed graph
Suppose you have the following pair RDD containing the edges:
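import org.apache.spark.rdd.RDD
import org.apache.spark.graphx._

// Sample edges as (source, destination) pairs; `sc` is the SparkContext provided by spark-shell
val data: RDD[(String, String)] = sc.parallelize(
  Seq(
    ("905067378709", "905458844980"),
    ("905067378709", "905458844980"),
    ("905458844980", "905067378709"),
    ("905067378709", "905458844980"),
    ("905458844982", "905458844984"),
    ("905067378709", "905458844984"),
    ("905067378712", "905067378709")))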
Convert it to an RDD[(VertexId, VertexId)] (VertexId is an alias for Long):
val edgesRDD: RDD[(VertexId, VertexId)] = data.map { case (a, b) => (a.toLong, b.toLong) }
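Then create the graph with Graph.fromEdgeTuples.
This function builds a graph from nothing more than an RDD of edge tuples: it assigns every edge the value 1, automatically creates every vertex mentioned by an edge, and gives those vertices the default value you pass in.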
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
// To print (collect first so the output appears on the driver, not on the executors)
val vert: VertexRDD[Int] = graph.vertices
vert.collect().foreach(println)
val edg: EdgeRDD[Int] = graph.edges
edg.collect().foreach(println)
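Now we can compute the weights of the duplicate edges. groupEdges only merges identical edges that sit in the same partition, so the graph has to be repartitioned with partitionBy first: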
val subgraph = graph
  .partitionBy(PartitionStrategy.CanonicalRandomVertexCut)
  .groupEdges((a, b) => a + b) // sum the attributes of parallel edges
// To print
val vert2: VertexRDD[Int] = subgraph.vertices
vert2.collect().foreach(println)
val edg2: EdgeRDD[Int] = subgraph.edges
edg2.collect().foreach(println)
The result is:
Edge(905067378712,905067378709,1)
Edge(905067378709,905458844984,1)
Edge(905067378709,905458844980,3)  <- this edge occurs 3 times
Edge(905458844980,905067378709,1)
Edge(905458844982,905458844984,1)
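For comparison, here is a sketch of two other ways that should produce the same weights for the sample data. It is not part of the original answer. The first counts each directed (source, destination) pair with reduceByKey and passes the counted edges to Graph.fromEdges. The second uses the optional uniqueEdges parameter of Graph.fromEdgeTuples, which is documented to merge identical edges and sum their attributes. The names weightedEdges, weightedGraph, edgeTuples and mergedGraph are made up for this illustration, and the local SparkContext setup is an assumption so that the snippet is self-contained outside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: reuse an existing context (e.g. in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("weighted-edges"))

// Same sample pairs as in the answer above.
val data: RDD[(String, String)] = sc.parallelize(Seq(
  ("905067378709", "905458844980"),
  ("905067378709", "905458844980"),
  ("905458844980", "905067378709"),
  ("905067378709", "905458844980"),
  ("905458844982", "905458844984"),
  ("905067378709", "905458844984"),
  ("905067378712", "905067378709")))

// Variant 1: count every directed (src, dst) pair, then turn each count into an edge weight.
val weightedEdges: RDD[Edge[Int]] = data
  .map { case (a, b) => ((a.toLong, b.toLong), 1) }
  .reduceByKey(_ + _)
  .map { case ((src, dst), weight) => Edge(src, dst, weight) }

// The second argument is the default vertex attribute; 1 is an arbitrary choice here.
val weightedGraph: Graph[Int, Int] = Graph.fromEdges(weightedEdges, 1)

// Variant 2: let fromEdgeTuples merge duplicate edges via uniqueEdges.
val edgeTuples: RDD[(VertexId, VertexId)] =
  data.map { case (a, b) => (a.toLong, b.toLong) }
val mergedGraph: Graph[Int, Int] = Graph.fromEdgeTuples(
  edgeTuples, 1, uniqueEdges = Some(PartitionStrategy.CanonicalRandomVertexCut))

// Print the weighted edges on the driver.
weightedGraph.edges.collect().foreach(println)
```

Variant 1 aggregates before any graph is built, so it does not depend on how the graph is partitioned. Variant 2 stays closest to the answer above, since it still relies on a partition strategy to bring identical edges together.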