Expand a sequence grouped by keys into a list of ungrouped sequences with keys attached in Scala

I have the following object in Scala:

List[(String,Map[String, Seq[(Int, Double)]])]

I want to convert it into a single sequence of rows, where each row has 4 items: (String, String, Int, Double)

For example, if I have the following data:

List(
  ("SuperGroup1", Map("SubGroup1" -> Seq((17,24.1),(38,39.2)))),
  ("SuperGroup1", Map("SubGroup2" -> Seq((135,302.3),(938,887.4))))
)

I want to turn it into:

Seq(
  ("SuperGroup1","SubGroup1",17,24.1),
  ("SuperGroup1","SubGroup1",38,39.2),
  ("SuperGroup1","SubGroup2",135,302.3),
  ("SuperGroup1","SubGroup2",938,887.4)
)

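I figure I could use flatMap or something similar, but I'm not sure how it would work. I see that RDDs have a function called flatMapValues, but what about a standard List/Map combination like mine?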

Given the type you have and the following input:

val input = Seq(
  ("SuperGroup1", Map("SubGroup1" -> Seq((17, 24.1), (38, 39.2)))),
  ("SuperGroup1", Map("SubGroup2" -> Seq((135, 302.3), (938, 887.4))))
)

This transforms your input into the form you expect:

input.flatMap { superGroupBox =>
  // Walk every (subgroup key, values) entry of the inner Map
  superGroupBox._2.toSeq.flatMap { subGroupBox =>
    // Attach both keys to each (Int, Double) pair
    subGroupBox._2.map(numericTuple => (superGroupBox._1, subGroupBox._1, numericTuple._1, numericTuple._2))
  }
}
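
On the flatMapValues part of the question: as far as I know the standard collections have no method by that name, but the same idea is easy to sketch on a plain Seq of pairs. The helper below is hypothetical (the name `flatMapValues` and the wrapper object `FlatMapValuesSketch` are mine, not a library API); it keeps each key and pairs it with every element produced from its value.

```scala
object FlatMapValuesSketch {
  // Hypothetical helper, NOT part of the Scala standard library:
  // for each (key, value) pair, expand the value with f and re-attach the key.
  def flatMapValues[K, V, W](pairs: Seq[(K, V)])(f: V => Iterable[W]): Seq[(K, W)] =
    pairs.flatMap { case (k, v) => f(v).map(w => (k, w)) }

  val input: Seq[(String, Map[String, Seq[(Int, Double)]])] = Seq(
    ("SuperGroup1", Map("SubGroup1" -> Seq((17, 24.1), (38, 39.2)))),
    ("SuperGroup1", Map("SubGroup2" -> Seq((135, 302.3), (938, 887.4))))
  )

  // Step 1: unpack the inner Map, giving (superGroup, (subGroup, values)) pairs.
  val bySubGroup: Seq[(String, (String, Seq[(Int, Double)]))] =
    flatMapValues(input)(_.toSeq)

  // Step 2: unpack each values sequence and flatten the nested tuple into one row.
  val rows: Seq[(String, String, Int, Double)] =
    bySubGroup.flatMap { case (s, (k, vs)) => vs.map { case (i, f) => (s, k, i, f) } }
}
```

For a one-off conversion the nested flatMap above is shorter; a helper like this probably only pays off if you flatten keyed collections in several places.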

Given

val x = List(
  ("SuperGroup1",Map("SubGroup1" -> Seq((17,24.1),(38,39.2)))),
  ("SuperGroup1",Map("SubGroup2" -> Seq((135,302.3),(938,887.4))))
)

you get the list you want from

for ((s, m) <- x; (k, vs) <- m; (i, f) <- vs) yield (s, k, i, f)

Result:

List(
  (SuperGroup1,SubGroup1,17,24.1),
  (SuperGroup1,SubGroup1,38,39.2),
  (SuperGroup1,SubGroup2,135,302.3),
  (SuperGroup1,SubGroup2,938,887.4)
)
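
Since the question asks how flatMap fits in: a for comprehension with several generators is roughly sugar for nested flatMap calls with a final map. The sketch below spells that out by hand. It is an approximation rather than the compiler's exact output (I convert the Map with toList to keep the result type obvious), and the wrapper object `ForDesugared` is just an illustrative name.

```scala
object ForDesugared {
  val x: List[(String, Map[String, Seq[(Int, Double)]])] = List(
    ("SuperGroup1", Map("SubGroup1" -> Seq((17, 24.1), (38, 39.2)))),
    ("SuperGroup1", Map("SubGroup2" -> Seq((135, 302.3), (938, 887.4))))
  )

  // Every generator except the last becomes a flatMap; the last one becomes a map.
  val rows: List[(String, String, Int, Double)] =
    x.flatMap { case (s, m) =>          // (s, m)  <- x
      m.toList.flatMap { case (k, vs) => // (k, vs) <- m
        vs.map { case (i, f) =>          // (i, f)  <- vs
          (s, k, i, f)                   // yield
        }
      }
    }
}
```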