Map and reduce an array
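I want to map an array in groups of 3 elements and emit a series of [k, v] pairs, for example:
input: array(1,2,3,4,5,6,7,8,9,7,12,11)
output: (1 => 2,3) (4 => 5,6) (7 => 8,9) (7 => 12,11)
I also want to reduce these pairs by key. For example, if I collect the data for key = 7, the output should be (7 => 8,9,12,11).
Thanks a lot.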
Try this:
// `list` is the input collection; its length is assumed to be a multiple of 3,
// otherwise x(1) or x(2) fails on the last group.
val res0 = list.grouped(3).map { x => (x(0), List(x(1), x(2))) }.toList
// Ideally, write the data to your storage layer (e.g. HDFS) already converted
// to (key, value) form instead of as one flat array.
// That will save a lot of computation.
val res1 = sc.parallelize(res0)
val res2 = res1.reduceByKey(_ ++ _).collect
However, I'm not sure how well this solution scales.
Edit:
val res1 = sc.parallelize(arr)
// (1,2,3,4,5,6,7,8,9,7,12,11)
val res2 = res1.zipWithIndex.map(x => (x._2 / 3, List(x._1)))
// (1,0),(2,1),...,(12,10),(11,11) -> (0,List(1)),(0,List(2)),(0,List(3)),(1,List(4)),(1,List(5)),(1,List(6)),...
val res3 = res2.reduceByKey(_++_).map(_._2)
//(0,List(1,2,3)),(1,List(4,5,6)) -> List(1,2,3),List(4,5,6)
val res4 = res3.map {
  case x1 :: xs => (x1, xs)
}.reduceByKey(_ ++ _)
// List(1,2,3) -> (1,List(2,3)) -> reduceByKey
//(1,List(2,3)),(4,List(5,6)),(7,List(8,9,12,11))
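One caveat about the edit above: it rebuilds each group by concatenating single-element lists with `reduceByKey`, and as far as I know Spark does not promise the order in which values for a key are combined, so the first element of a rebuilt group is not guaranteed to be the intended key. The sketch below shows the same index-based chunking idea on plain Scala collections (no Spark), keeping the index around and sorting by it so the order is explicit. The names `IndexChunks` and `chunksByIndex` are made up for illustration.

```scala
object IndexChunks {
  // Pair every element with its position, bucket by position / size,
  // then sort buckets and bucket contents by position to restore input order.
  def chunksByIndex(xs: Seq[Int], size: Int): Seq[List[Int]] =
    xs.zipWithIndex
      .groupBy { case (_, i) => i / size }
      .toSeq
      .sortBy(_._1)
      .map { case (_, bucket) => bucket.sortBy(_._2).map(_._1).toList }

  def main(args: Array[String]): Unit = {
    val arr = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)
    chunksByIndex(arr, 3).foreach(println)
  }
}
```

In Spark, the analogous fix would be to carry the index along with each value until the group is assembled and sort on it before taking the head as the key.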
I think what you need is the following:
val input = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)
val output = input.toSeq.grouped(3)
  .map(g => (g.head, g.tail)).toList
  .groupBy(_._1)
  .mapValues(l => l.flatMap(_._2))
The result will be:
Map(4 -> List(5, 6), 7 -> List(8, 9, 12, 11), 1 -> List(2, 3))
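To answer the "collect the data for key = 7" part directly, here is a minimal self-contained sketch of the same approach wrapped in a function, followed by a lookup. `GroupByFirst` and `groupByFirst` are hypothetical names, and it uses `map` over the grouped entries instead of `mapValues` so it should compile without deprecation warnings on newer Scala versions.

```scala
object GroupByFirst {
  // Split the input into chunks of `size`, use each chunk's first element as
  // the key, and concatenate the remaining elements of chunks sharing a key.
  def groupByFirst(xs: Seq[Int], size: Int = 3): Map[Int, List[Int]] =
    xs.grouped(size)
      .map(g => (g.head, g.tail.toList))
      .toList
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.flatMap(_._2) }

  def main(args: Array[String]): Unit = {
    val input = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 12, 11)
    val grouped = groupByFirst(input)
    // Look up one key; fall back to an empty list if the key is absent.
    println(grouped.getOrElse(7, Nil)) // List(8, 9, 12, 11)
  }
}
```

Because `groupBy` on a `List` walks the elements in order, the values under each key stay in input order. A trailing chunk shorter than `size` simply contributes fewer values.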