Map and reduce an array

I want to map an array in groups of 3 elements, emitting a [k, v] pair for each group, for example:

input: array(1,2,3,4,5,6,7,8,9,7,12,11)   
output: (1 => 2,3) (4 => 5,6) (7 => 8,9) (7 => 12,11)

I also want to reduce these pairs by key. For example, if I collect the data for key = 7, the output should be (7 => 8,9,12,11).

Thanks a lot.

Try this:

 // list is the input as a local List[Int]; sc is the SparkContext
 val res0 = list.grouped(3).map { x => (x(0), List(x(1), x(2))) }.toList
 // Ideally, store the data (e.g. in HDFS) already converted to (key, value)
 // form instead of as one flat array. That will save a lot of computation.

 val res1 = sc.parallelize(res0)
 val res2 = res1.reduceByKey(_ ++ _).collect

But I'm not sure how well this solution scales.

Edit

val res1 = sc.parallelize(arr)
// arr: (1,2,3,4,5,6,7,8,9,7,12,11)
val res2 = res1.zipWithIndex.map(x => (x._2 / 3, List(x._1)))
// (1,0),(2,1),...,(12,10),(11,11) -> (0,List(1)),(0,List(2)),(0,List(3)),(1,List(4)),(1,List(5)),(1,List(6)),...
// Caveat: ++ is not commutative, so reduceByKey may not keep the original element order across partitions.
val res3 = res2.reduceByKey(_ ++ _).map(_._2)
// (0,List(1,2,3)),(1,List(4,5,6)),... -> List(1,2,3),List(4,5,6),...
val res4 = res3.map {
  case x1 :: xs => (x1, xs)
}.reduceByKey(_ ++ _)

// List(1,2,3) -> (1,List(2,3)) -> reduceByKey
// (1,List(2,3)),(4,List(5,6)),(7,List(8,9,12,11))
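
To trace the steps above without a cluster, here is a rough local sketch using plain Scala collections instead of an RDD. The object `ChunkByIndex` and the helper `chunkPairs` are names I made up for illustration; they are not part of Spark. It keys each element by index / 3 as above, but sorts by index before rebuilding each chunk, so it does not rely on any merge order.

```scala
object ChunkByIndex {
  // Hypothetical helper: mirrors the zipWithIndex + (index / 3) keying
  // with local collections rather than an RDD.
  def chunkPairs(xs: Seq[Int], size: Int = 3): Map[Int, List[Int]] = {
    // Step 1: key every element by its chunk number, then rebuild the
    // chunks in index order so the element order is explicit.
    val chunks: List[List[Int]] = xs.zipWithIndex
      .groupBy { case (_, idx) => idx / size }
      .toList
      .sortBy(_._1)
      .map { case (_, members) => members.sortBy(_._2).map(_._1).toList }

    // Step 2: the first element of each chunk becomes the key, the rest
    // become the value; values of equal keys are concatenated.
    chunks
      .collect { case head :: tail => (head, tail) }
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.flatMap(_._2) }
  }
}
```

For the input in the question, `ChunkByIndex.chunkPairs(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11))` should give a map with 1 -> List(2, 3), 4 -> List(5, 6) and 7 -> List(8, 9, 12, 11).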

I think what you need is the following:

val input = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)
val output = input.toSeq.grouped(3)
  .map(g => (g.head, g.tail)).toList
  .groupBy(_._1)
  .mapValues(l => l.flatMap(_._2))

The result will be:

Map(4 -> List(5, 6), 7 -> List(8, 9, 12, 11), 1 -> List(2, 3))
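
One thing to watch: on Scala 2.13 `mapValues` on a Map is deprecated and returns a lazy view rather than a strict Map. A possible variant that avoids it is sketched below; the object `GroupTriples` and the function `groupTriples` are hypothetical names used only for this example.

```scala
object GroupTriples {
  // Hypothetical helper: split the input into groups of 3, use the first
  // element of each group as the key, and concatenate the remaining
  // elements of all groups that share a key.
  def groupTriples(input: Seq[Int]): Map[Int, List[Int]] =
    input.grouped(3).toList
      .map(g => (g.head, g.tail.toList))
      .groupBy(_._1)
      // a plain map builds a strict Map, unlike mapValues on 2.13
      .map { case (key, pairs) => key -> pairs.flatMap(_._2) }
}
```

To collect only the data for key = 7, as asked in the question, a lookup such as `GroupTriples.groupTriples(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)).get(7)` should return `Some(List(8, 9, 12, 11))`.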