How to create a graph from a list with Spark GraphX

I have a list in Scala that looks like this:

  val log = List(
    List("a","b","c"),
    List("a","c","b","h","c"),
    List("a","d","e"),
    List("a","d","e","f","d","e")
  )

I want to create a graph from it, like the one I currently build by hand from these two RDDs:

  val vertexName: RDD[(VertexId, (String))] =
    sc.parallelize(Array((1L, ("a")), (2L, ("b")),
                         (3L, ("c")), (4L, ("d")),
                         (5L, ("e")), (6L, ("f")),
                         (7L, ("h"))))

  val edgeName: RDD[Edge[String]] =
    sc.parallelize(Array(Edge(1L, 2L, "1"), Edge(2L, 3L, "1"),
                         Edge(1L, 3L, "1"), Edge(3L, 2L, "1"),
                         Edge(2L, 7L, "1"), Edge(7L, 3L, "1"),
                         Edge(1L, 4L, "1"), Edge(4L, 5L, "1"),
                         Edge(5L, 6L, "1"), Edge(6L, 4L, "1")))

  val graph = Graph(vertexName, edgeName)

Is this possible? Is there a way to build the graph directly from the list?

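I assume each inner list of vertex names is a path that should exist in the graph.

I would start by building a mapping between vertex names and their VertexId values: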

// Distinct vertex names, numbered 0, 1, 2, ...
val vertices = log.flatten.toSet.toSeq
val vertexMap = vertices.zipWithIndex
    .map { case (name, i) => name -> i.toLong }
    .toMap
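
One thing to be aware of: a Scala `Set` does not iterate in the order the names first appear, so the ids above will not necessarily follow the a, b, c, ... numbering used in the question. GraphX only needs the ids to be unique, so this is harmless, but if a predictable numbering is wanted, here is a minimal sketch that numbers names by first appearance. The names `names` and `nameToId` are mine, chosen for illustration.

```scala
// Sample paths from the question.
val log = List(
  List("a", "b", "c"),
  List("a", "c", "b", "h", "c"),
  List("a", "d", "e"),
  List("a", "d", "e", "f", "d", "e")
)

// distinct keeps the first occurrence of each name, in order of appearance.
val names: List[String] = log.flatten.distinct

// Pair each name with its position; ids start at 0.
val nameToId: Map[String, Long] =
  names.zipWithIndex.map { case (name, i) => name -> i.toLong }.toMap
```

With the sample log this yields a=0, b=1, c=2, h=3, d=4, e=5, f=6.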

Then I would use the vertex map to generate the edges, collecting them into a set to avoid duplicates:

import org.apache.spark.graphx.{Edge, Graph}

val edgeSet = log
    .filter(_.size > 1) // a single vertex is not a path
    .flatMap(list => list.indices.tail.map(i => list(i - 1) -> list(i))) // consecutive pairs
    .map { case (src, dst) => Edge(vertexMap(src), vertexMap(dst), "1") }
    .toSet
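
To see what the pairing step produces without involving Spark, here is a small plain-Scala sketch of the same idea. It uses `zip` instead of index arithmetic, which is an alternative formulation rather than the code above, and `consecutivePairs` and `edgePairs` are illustrative names.

```scala
// Sample paths from the question.
val log = List(
  List("a", "b", "c"),
  List("a", "c", "b", "h", "c"),
  List("a", "d", "e"),
  List("a", "d", "e", "f", "d", "e")
)

// Consecutive (source, destination) name pairs of one path.
// zip stops at the shorter list, so a one-element path yields no pairs.
def consecutivePairs(path: List[String]): List[(String, String)] =
  path.zip(path.drop(1))

// toSet collapses pairs that occur in several paths, or twice in one path.
val edgePairs: Set[(String, String)] = log.flatMap(consecutivePairs).toSet
```

For the sample log this leaves 10 distinct pairs, matching the 10 edges written out by hand in the question. The pair d -> e occurs three times in the input but is kept once.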

Finally, create the graph:

val edges = sc.parallelize(edgeSet.toSeq)
val vertexNames = sc.parallelize(vertexMap.toSeq.map(_.swap))
val graph = Graph(vertexNames, edges)
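
If you need this in more than one place, the steps can be wrapped in a single function. The following is only a sketch under a few assumptions: spark-core and spark-graphx are on the classpath, a `SparkContext` is passed in by the caller, and `buildGraph` is a hypothetical helper name, not part of GraphX.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper: turns a list of paths into a GraphX graph whose
// vertex attribute is the name and whose edge attribute is the constant "1".
def buildGraph(sc: SparkContext, paths: List[List[String]]): Graph[String, String] = {
  // Unique Long id per vertex name, numbered by first appearance.
  val nameToId: Map[String, Long] =
    paths.flatten.distinct.zipWithIndex.map { case (n, i) => n -> i.toLong }.toMap

  // One edge per distinct pair of consecutive names; Edge is a case class,
  // so toSet removes duplicates by value.
  val edgeSet: Set[Edge[String]] = paths
    .flatMap(p => p.zip(p.drop(1)))
    .map { case (src, dst) => Edge(nameToId(src), nameToId(dst), "1") }
    .toSet

  val vertices = sc.parallelize(nameToId.toSeq.map(_.swap)) // (id, name)
  val edges = sc.parallelize(edgeSet.toSeq)
  Graph(vertices, edges)
}
```

In spark-shell, `val graph = buildGraph(sc, log)` followed by `graph.triplets.collect().foreach(t => println(s"${t.srcAttr} -> ${t.dstAttr}"))` prints each edge by name, which makes it easy to compare the result against the edges you expected.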