Scala: Write the adjacency list of each node of a graph to a text file

I am trying to reverse a directed graph and write the adjacency list of each vertex to a text file in the following format:

NodeId \t NeighbourId1,NeighbourId2,...,NeighbourIdn

So far I have only tried printing the result, and the output looks like this:

(4,[J@13ad83aa)
(0,[J@338ff780)
(1,[J@6737f62b)
(3,[J@1250d788)
(2,[J@6d1fa6bb)

whereas the expected format is:

4   2
0   4
1   0,2
3   1,2,3
2   0,1

The code I am currently using is:

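import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object Problem2 {
  def main(args: Array[String]): Unit = {
    val inputFile: String = args(0)
    val outputFolder = args(1)
    val conf = new SparkConf().setAppName("Problem2").setMaster("local")
    val sc = new SparkContext(conf)

    // Load the edge list, then rebuild the graph with every edge reversed
    val graph = GraphLoader.edgeListFile(sc, inputFile)
    val edges = graph.reverse.edges
    val vertices = graph.vertices
    val newGraph = Graph(vertices, edges)

    // For each vertex, collect the ids of its out-neighbours in the reversed graph
    val verticesWithSuccessors: VertexRDD[Array[VertexId]] =
      newGraph.ops.collectNeighborIds(EdgeDirection.Out)

    val successorGraph = Graph(verticesWithSuccessors, edges)
    val res = successorGraph.vertices.collect()

    // Prints each (id, Array) pair, which produces the (4,[J@13ad83aa) lines above
    val adjList = successorGraph.vertices.foreach(println)
  }
}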

I don't think mkString() can be used with graph objects. Is there a similar method on graph objects for getting a string?

Let's take this example:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val vertices: RDD[(VertexId, String)] =
    sc.parallelize(Array((1L,""), (2L,""), (4L,""), (6L,"")))

val edges: RDD[Edge[String]] =
    sc.parallelize(Array(
        Edge(1L, 2L, ""),
        Edge(1L, 4L, ""),
        Edge(1L, 6L, "")))
val inputGraph = Graph(vertices, edges)

// For each vertex, collect the ids of the vertices its outgoing edges point to
val verticesWithSuccessors: VertexRDD[Array[VertexId]] =
    inputGraph.ops.collectNeighborIds(EdgeDirection.Out)
val successorGraph = Graph(verticesWithSuccessors, edges)

Once you have this:

val adjList = successorGraph.vertices

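you can easily convert it to a DataFrame (outside spark-shell, toDF needs import spark.implicits._ on your SparkSession):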

val df = adjList.toDF(Seq("node", "adjacents"): _*)
df.show()
+----+---------+
|node|adjacents|
+----+---------+
|   1|[2, 4, 6]|
|   2|       []|
|   4|       []|
|   6|       []|
+----+---------+

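Now the columns are easy to transform. Here is a not-so-pretty example (it joins the neighbours with spaces; pass "," to mkString to get the comma-separated format from the question):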

val result = df.rdd.collect().map(l=> l(0).asInstanceOf[Long] + "\t"  + l(1).asInstanceOf[Seq[Long]].mkString(" "))
result.foreach(println(_))

1   2 4 6
2   
4   
6   
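As for the (4,[J@13ad83aa) lines in the question: that is what the JVM prints for an array of longs, because Array does not override toString. Calling mkString on the array itself fixes it, with or without a DataFrame in between. Below is a minimal plain-Scala sketch of the formatting step; the object name AdjacencyFormat and the helper formatLine are made up for illustration and are not part of GraphX.

```scala
object AdjacencyFormat {
  // Hypothetical helper: renders one vertex as "NodeId<TAB>n1,n2,...,nk".
  def formatLine(node: Long, neighbours: Seq[Long]): String =
    node.toString + "\t" + neighbours.mkString(",")

  def main(args: Array[String]): Unit = {
    val neighbours = Array(2L, 4L, 6L)

    // Printing the tuple directly uses the array's default toString,
    // which gives something like (1,[J@1b2c3d4e) with a varying hash.
    println((1L, neighbours))

    // mkString joins the elements instead of printing the array reference.
    println(formatLine(1L, neighbours.toSeq))

    // A vertex without neighbours yields only the id followed by a tab.
    println(formatLine(2L, Seq.empty))
  }
}
```

The same function can be applied to each (id, neighbours) pair of successorGraph.vertices after converting the array with toSeq.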

Alternatively, you can try a UDF, or process the columns in whatever way you need.
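If the DataFrame is not needed, one possible shortcut is to format the VertexRDD directly and write it out with saveAsTextFile. The following is only a sketch under some assumptions: Spark with GraphX is on the classpath, the input is a whitespace-separated edge list as expected by GraphLoader.edgeListFile, and the names ReversedAdjacencyList and run are invented for this example. Sorting the neighbour ids is also my choice, not something the question requires.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{EdgeDirection, GraphLoader}

object ReversedAdjacencyList {
  // Hypothetical entry point: reverse the graph and write one line per vertex.
  def run(sc: SparkContext, inputFile: String, outputFolder: String): Unit = {
    val graph = GraphLoader.edgeListFile(sc, inputFile)

    graph.reverse
      // Out-neighbours in the reversed graph; vertices without any get an empty array.
      .collectNeighborIds(EdgeDirection.Out)
      // Build "NodeId<TAB>n1,n2,..." by calling mkString on the array itself.
      .map { case (id, neighbours) => s"$id\t${neighbours.sorted.mkString(",")}" }
      // Writes part-* files into outputFolder; fails if the folder already exists.
      .saveAsTextFile(outputFolder)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReversedAdjacencyList").setMaster("local")
    val sc = new SparkContext(conf)
    try run(sc, args(0), args(1))
    finally sc.stop()
  }
}
```

The lines end up spread over several part files in no particular order, so sort afterwards if the order of vertices matters.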

Hope this helps!