How can I get the number of common edges in Spark GraphX?
For example, suppose I have two graphs with vertices and edges like this:
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val vertexRdd1: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
(1L, ("a", 28)),
(2L, ("b", 27)),
(3L, ("c", 65))
))
val edgeRdd1: RDD[Edge[Int]] = sc.parallelize(Array(
Edge(1L, 2L, 1),
Edge(2L, 3L, 8)
))
val vertexRdd2: RDD[(VertexId, (String, Int))] = sc.parallelize(Array(
(1L, ("a", 28)),
(2L, ("b", 27)),
(3L, ("c", 28)),
(4L, ("d", 27)),
(5L, ("e", 65))
))
val edgeRdd2: RDD[Edge[Int]] = sc.parallelize(Array(
Edge(1L, 2L, 1),
Edge(2L, 3L, 4),
Edge(3L, 5L, 1),
Edge(2L, 4L, 1)
))
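How can I get the number of common edges between these two graphs without considering the edge attributes? In the example above, the number of common edges is 2, and the common edges are: Edge(1L, 2L, 1), which matches Edge(1L, 2L, 1), and Edge(2L, 3L, 8), which matches Edge(2L, 3L, 4).
I am programming in Scala.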
Assuming you have graph1 (Graph(vertexRdd1, edgeRdd1)) and graph2 (Graph(vertexRdd2, edgeRdd2)), you can map the edges of each graph to (srcId, dstId) pairs, which drops the edge attribute, and then use the intersection method:
val srcDst1 = graph1.edges.map(e => (e.srcId, e.dstId))
val srcDst2 = graph2.edges.map(e => (e.srcId, e.dstId))
srcDst1.intersection(srcDst2).count()
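For reference, here is a minimal end-to-end sketch of the same idea using the edges from the question. It is an illustration under a few assumptions, not the only way to do it: it obtains a SparkContext through SparkSession with a local master (in spark-shell the existing sc can be used instead), the application name is arbitrary, and it builds the graphs with Graph.fromEdges and a placeholder vertex attribute, since vertex attributes play no role in the count.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, reuse the provided `sc`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("common-edges-sketch") // arbitrary name
  .getOrCreate()
val sc = spark.sparkContext

// Edges copied from the question.
val edgeRdd1: RDD[Edge[Int]] = sc.parallelize(Seq(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 8)
))
val edgeRdd2: RDD[Edge[Int]] = sc.parallelize(Seq(
  Edge(1L, 2L, 1),
  Edge(2L, 3L, 4),
  Edge(3L, 5L, 1),
  Edge(2L, 4L, 1)
))

// Simplification: vertex attributes are irrelevant here, so every vertex
// gets the placeholder attribute 0 instead of the (String, Int) data.
val graph1 = Graph.fromEdges(edgeRdd1, 0)
val graph2 = Graph.fromEdges(edgeRdd2, 0)

// Keep only (srcId, dstId) so that edge attributes are ignored.
val srcDst1 = graph1.edges.map(e => (e.srcId, e.dstId))
val srcDst2 = graph2.edges.map(e => (e.srcId, e.dstId))

// Pairs present in both graphs: (1, 2) and (2, 3).
val commonCount = srcDst1.intersection(srcDst2).count()
println(commonCount) // prints 2
```

Two things to keep in mind: intersection returns distinct elements, so parallel edges between the same pair of vertices are counted once, and the pairs are directed, so (1, 2) and (2, 1) are treated as different edges.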