How do I do a word count in Scala by reading a file line by line?

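I have a source file containing words and want to do a typical word count. What I am using at the moment converts everything to an array and holds it in memory: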

def freqMap(lines: Iterator[String]): Map[String, Int] = {

   val mappedWords: Array[(String, Int)] = lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))

   val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) => elements.reduce((x, y) => (y._1, x._2 + y._2)) }

   frequencies
}

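But I want to evaluate line by line and show output as every line is processed. How can this be done lazily, without putting everything into memory?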

I think you are looking for the scanLeft method. A sample solution could look like this:

val iter = List("this is line number one", "this is line number two", "this this this").toIterator

val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty) {
  case (acc, word) =>
    println(word)
    acc.updated(word, acc.getOrElse(word, 0) + 1)
}

If you execute println(solution.take(5).toList), this will print to the console:

this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
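
The snapshots above are produced once per word. If one snapshot per line is wanted instead, the same scanLeft idea can be applied to the lines themselves. This is only a minimal sketch: runningCounts is a hypothetical name that does not appear in the answer, and splitting on whitespace is an assumption about the delimiter.

```scala
// Hypothetical helper: lazily yields the cumulative word counts after each line.
def runningCounts(lines: Iterator[String]): Iterator[Map[String, Int]] =
  lines
    .scanLeft(Map.empty[String, Int]) { (acc, line) =>
      // Fold the words of this one line into the counts carried over so far.
      // Assumption: words are separated by whitespace.
      line.split("\\s+").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
    }
    .drop(1) // skip the initial empty Map that scanLeft emits first

val sample = Iterator("this is line number one", "this is line number two", "this this this")

// Each snapshot is computed only when the iterator is advanced.
runningCounts(sample).foreach(println)
```

Because the result is itself an Iterator, nothing is read until a snapshot is requested, and only the current line plus the accumulated Map are held at any time.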

You say you don't want to put everything in memory, but you want to "show output as every line is processed." It sounds like you just want to println the intermediate results.

lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
  println(mp)  // output intermediate results
  line.split(" ").foldLeft(mp){ case (m,word) =>
      m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
  }
}

The iterator (lines) is consumed one line at a time. The Map result is built word by word and carried forward line by line as the foldLeft accumulator.
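
To tie this to an actual file, the fold can be fed from scala.io.Source, whose getLines() returns an Iterator[String] that reads lazily. The following is a rough sketch rather than a definitive version: countFile is a hypothetical name, scala.util.Using assumes Scala 2.13 or later, and whitespace splitting is again an assumption. Unlike the snippet above, it prints the counts after each line has been folded in.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: counts the words in the file at `path`, printing the
// running totals after each line. Only the current line and the Map are kept.
def countFile(path: String): Map[String, Int] =
  Using.resource(Source.fromFile(path)) { src =>
    src.getLines().foldLeft(Map.empty[String, Int]) { (counts, line) =>
      // Assumption: words are separated by whitespace.
      val updated = line.split("\\s+").filter(_.nonEmpty).foldLeft(counts) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
      println(updated) // output intermediate results once per line
      updated
    }
  }
```

Calling countFile with the path of a text file returns the final Map. Using.resource closes the underlying file handle even if an exception is thrown while reading.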