Apache Spark: create vertices from a String
Given a string

val s = "My-Spark-App"

how can I use Spark to create vertices from it, like this?

"My-", "y-S", "-Sp", "Spa", "par", "ark", "rk-", "k-A", "-Ap", "App"

Can that problem be parallelized?
This is just a matter of a simple sliding window over the string:
import org.apache.spark.graphx.VertexId

val n: Int = 3
val vertices: Seq[(VertexId, String)] = s.sliding(n)
  .zipWithIndex
  .map { case (s, i) => (i.toLong, s) }
  .toSeq

sc.parallelize(vertices)
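For reference, the windowing itself is plain Scala and needs no Spark at all; `StringOps.sliding` already yields exactly the substrings listed in the question:

```scala
object SlidingDemo {
  def main(args: Array[String]): Unit = {
    val s = "My-Spark-App"

    // sliding(3) walks the string one character at a time,
    // emitting every length-3 window.
    val windows: List[String] = s.sliding(3).toList
    // → List("My-", "y-S", "-Sp", "Spa", "par", "ark", "rk-", "k-A", "-Ap", "App")

    // Pair each window with its position to mimic the vertex IDs.
    val vertices: List[(Long, String)] =
      windows.zipWithIndex.map { case (w, i) => (i.toLong, w) }

    println(vertices)
  }
}
```

A 12-character string produces 12 - 3 + 1 = 10 windows, which is why ten vertices come out.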
Can that problem be parallelized?
Yes, it can, but for a single string it is most likely not worth it. Still, if you want to:
import org.apache.spark.graphx.VertexId
import org.apache.spark.mllib.rdd.RDDFunctions._ // adds sliding to RDDs
import org.apache.spark.rdd.RDD

val vertices: RDD[(VertexId, String)] = sc.parallelize(s.toSeq)
  .sliding(n)
  .zipWithIndex
  .map { case (cs, i) => (i, cs.mkString) }
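Since the goal is presumably a GraphX graph, here is a sketch of one way to wire the vertices up; the question only specifies the vertices, so connecting each window to the next one is my assumption, not part of the original problem:

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumes `sc` and the `vertices: RDD[(VertexId, String)]` from above are in scope.
// Hypothetical edge layout: window i points to window i + 1.
val numVertices: Long = vertices.count()
val edges: RDD[Edge[Int]] = sc.parallelize(
  (0L until numVertices - 1).map(i => Edge(i, i + 1, 1))
)

// Graph.apply combines the vertex and edge RDDs into a property graph.
val graph: Graph[String, Int] = Graph(vertices, edges)
```

With the example string this gives 10 vertices and 9 edges, forming a simple chain of overlapping trigrams.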