How to sort each line of an RDD in Spark using Scala?
My text file contains the following data:
10,14,16,19,52
08,09,12,20,45
55,56,70,78,53
I want to sort the values in each line in descending order. I tried the following code:
val file = sc.textFile("Maximum values").map(x=>x.split(","))
val sorted = file.sortBy(x=> -x(2).toInt)
sorted.collect()
I get the following output:
[[55, 56, 70, 78, 53], [10, 14, 16, 19, 52], [08, 09, 12, 20, 45]]
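This sorts the rows themselves in descending order (keyed on the third value), but I want the values within each row sorted in descending order.
For example,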
[10,14,16,19,52],[08,09,12,20,45],[55,56,70,78,53]
should become
[52,19,16,14,10],[45,20,12,09,08],[78,70,56,55,53]
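The distinction here is that RDD.sortBy reorders whole rows by a key, while sorting the values inside a row has to happen in the function passed to map. A minimal plain-Scala sketch of the two behaviours, assuming a local Seq[String] as a stand-in for the RDD (Spark's map applies the same per-element function); the object and method names are hypothetical:

```scala
object RowVsValueSort {
  // Local stand-in for the lines of the text file (a Seq instead of an RDD)
  val lines: Seq[String] = Seq("10,14,16,19,52", "08,09,12,20,45", "55,56,70,78,53")

  // What the question's code does: reorder whole rows, keyed on the third field, descending
  def sortRows(rows: Seq[String]): Seq[Array[String]] =
    rows.map(_.split(",")).sortBy(r => -r(2).toInt)

  // What is wanted: keep the row order and sort the fields inside each row, descending
  def sortWithinRows(rows: Seq[String]): Seq[Array[String]] =
    rows.map(_.split(",").sorted(Ordering[String].reverse))

  def main(args: Array[String]): Unit = {
    sortRows(lines).foreach(r => println(r.mkString(",")))
    // 55,56,70,78,53
    // 10,14,16,19,52
    // 08,09,12,20,45
    sortWithinRows(lines).foreach(r => println(r.mkString(",")))
    // 52,19,16,14,10
    // 45,20,12,09,08
    // 78,70,56,55,53
  }
}
```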
Thanks in advance for taking the time to answer.
Here is one approach (untested):
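// Descending ordering for strings (lexicographic; matches numeric order here because every value has two digits)
val reverseStringOrdering = Ordering[String].reverse
// Split each line and sort its fields in descending order; the row order is left unchanged
val file = sc.textFile("Maximum values").map(x => x.split(",").sorted(reverseStringOrdering))
// Print each sorted row as comma-separated text
file.collect().foreach(row => println(row.mkString(",")))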
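Sorting the split fields as strings compares them character by character, which agrees with numeric order here only because every value in the file has exactly two digits. A small plain-Scala check, using a hypothetical line "9,100,52" that is not part of the question's data:

```scala
object StringVsNumericOrder {
  // Hypothetical line whose values have different widths (not from the question's file)
  val line = "9,100,52"

  // Lexicographic, descending: "9" > "52" > "100" because '9' > '5' > '1'
  def descendingAsStrings(s: String): Array[String] =
    s.split(",").sorted(Ordering[String].reverse)

  // Numeric, descending: parse each field to Int first, then sort
  def descendingAsInts(s: String): Array[Int] =
    s.split(",").map(_.toInt).sorted(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(descendingAsStrings(line).mkString(",")) // 9,52,100
    println(descendingAsInts(line).mkString(","))    // 100,52,9
  }
}
```

If the real data may contain values of different widths, converting to a numeric type before sorting avoids this.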
The Spark SQL way:
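import org.apache.spark.sql.functions._
import spark.implicits._ // needed for .toDF on a local Seq (auto-imported in spark-shell)
val df = Seq(
("10","14","16","19","52"),
("08","09","12","20","45"),
("55","56","70","78","53")).toDF("C1", "C2","C3","C4","C5")
// Combine the five columns into one array column and sort it in descending order (asc = false)
df.withColumn("sortedCol", sort_array(array("C1", "C2","C3","C4","C5"), false))
.select("sortedCol")
.show()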
Output:
+--------------------+
|           sortedCol|
+--------------------+
|[52, 19, 16, 14, 10]|
|[45, 20, 12, 09, 08]|
|[78, 70, 56, 55, 53]|
+--------------------+
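Try this:
// For each line: split on commas, sort the fields, reverse for descending order, and join back into one string
val file = spark.sparkContext.textFile("in/sort.dat").map( x=> { val y = x.split(','); y.sorted.reverse.mkString(",") } )
file.collect.foreach(println)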
Edit 1: here is how each method used in the code above behaves:
scala> val a = "10,14,16,19,52"
a: String = 10,14,16,19,52
scala> val b = a.split(',')
b: Array[String] = Array(10, 14, 16, 19, 52)
scala> b.sorted
res0: Array[String] = Array(10, 14, 16, 19, 52)
scala> b.sorted.reverse
res1: Array[String] = Array(52, 19, 16, 14, 10)
scala> b.sorted.reverse.mkString(",")
res2: String = 52,19,16,14,10
scala> b.sorted.reverse.mkString("*")
res3: String = 52*19*16*14*10
scala>
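Edit 2: to compare the values numerically rather than as strings, convert each one to Int before sorting (note that this prints 08 as 8):
// Convert the fields to Int so they are compared numerically, then sort in descending order
val file = spark.sparkContext.textFile("in/sort.dat").map( x=> { val y = x.split(',').map(_.toInt); y.sorted.reverse.mkString(",") } )
file.collect.foreach(println)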
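Converting with map(_.toInt) sorts numerically but loses the leading zeros. If each value's original text should be kept, one option is to sort the strings by their numeric value with sortBy. A plain-Scala sketch; the helper name sortLineDescending is hypothetical, and the comment shows how it would be passed to the RDD's map:

```scala
object SortLine {
  // Sort the comma-separated values of one line in descending numeric order,
  // keeping each value's original text (so "08" stays "08").
  // Surrounding whitespace is trimmed so " 52" still parses as an Int.
  def sortLineDescending(line: String): String =
    line.split(",")
      .map(_.trim)
      .sortBy(_.toInt)(Ordering[Int].reverse)
      .mkString(",")

  def main(args: Array[String]): Unit = {
    val lines = Seq("10,14,16,19,52", "08,09,12,20,45", "55,56,70,78,53")
    // With Spark: spark.sparkContext.textFile("in/sort.dat").map(sortLineDescending)
    lines.map(sortLineDescending).foreach(println)
  }
}
```

Because the sort key is the parsed Int while the elements stay strings, this handles values of different widths and still prints 45,20,12,09,08 for the second line.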