How to do a word count in Scala by taking input from a file line by line?
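I have a source file containing words and want to do a typical word count. I am currently using an approach that converts everything to an array and loads it into memory: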
def freqMap(lines: Iterator[String]): Map[String, Int] = {
  val mappedWords: Array[(String, Int)] = lines.toArray.flatMap((l: String) => l.split(delimiter).map((word: String) => (word, 1)))
  val frequencies = mappedWords.groupBy((e) => e._1).map { case (key, elements) => elements.reduce((x, y) => (y._1, x._2 + y._2)) }
  frequencies
}
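But I want to evaluate line by line and show output as each line is processed. How can this be done lazily, without putting everything into memory?
I think you are looking for the scanLeft method. A sample solution might look like this: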
val iter = List("this is line number one", "this is line number two", "this this this").toIterator
val solution = iter.flatMap(_.split(" ")).scanLeft[Map[String, Int]](Map.empty){
  case (acc, word) =>
    println(word)
    acc.updated(word, acc.getOrElse(word, 0) + 1)
}
If you execute
println(solution.take(5).toList)
this will print to the console:
this
is
line
number
one
List(Map(), Map(this -> 1), Map(this -> 1, is -> 1), Map(this -> 1, is -> 1, line -> 1), Map(this -> 1, is -> 1, line -> 1, number -> 1))
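The example above produces one snapshot per word. Since the question asks for output per line, the same idea can be applied at line granularity. The following is a minimal sketch, not the answerer's code: the helper name `runningCounts` and the whitespace delimiter are assumptions made for illustration.

```scala
// Hypothetical helper: lazily yields the cumulative word counts after each line.
// Assumes words are separated by whitespace.
def runningCounts(lines: Iterator[String]): Iterator[Map[String, Int]] =
  lines
    .scanLeft(Map.empty[String, Int]) { (acc, line) =>
      // Fold the words of this line into the counts carried over so far.
      line.split("\\s+").filter(_.nonEmpty).foldLeft(acc) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + 1)
      }
    }
    .drop(1) // skip the initial empty map emitted by scanLeft
```

Calling `runningCounts(lines).foreach(println)` would print one map per input line. Only the current line and the accumulated map are held at any time, not the whole file.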
You say you don't want to put everything in memory, but you want to "show output as every line is processed." It sounds like you just want to println the intermediate results.
lines.foldLeft(Map[String,Int]()){ case (mp,line) =>
  println(mp)  // output intermediate results
  line.split(" ").foldLeft(mp){ case (m,word) =>
    m.lift(word).fold(m + (word -> 1))(c => m + (word -> (c+1)))
  }
}
The iterator (lines) is consumed one line at a time. The Map result is built word by word and carried forward line by line as the foldLeft accumulator.
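To connect this to an actual file, the fold can be fed from `scala.io.Source`, whose `getLines()` returns a lazy `Iterator[String]`. The sketch below is one possible arrangement under stated assumptions: the names `countWords` and `countWordsInFile` are hypothetical, words are assumed to be whitespace-separated, and the map is printed after each line rather than before it.

```scala
import scala.io.Source

// Hypothetical helper: folds lines into a word-count map,
// printing the cumulative counts after each line is processed.
def countWords(lines: Iterator[String]): Map[String, Int] =
  lines.foldLeft(Map.empty[String, Int]) { (counts, line) =>
    val updated = line.split("\\s+").filter(_.nonEmpty).foldLeft(counts) { (m, w) =>
      m.updated(w, m.getOrElse(w, 0) + 1)
    }
    println(updated) // intermediate result for this line
    updated
  }

// Hypothetical wrapper: reads the file lazily and closes it afterwards.
def countWordsInFile(path: String): Map[String, Int] = {
  val source = Source.fromFile(path)
  try countWords(source.getLines())
  finally source.close()
}
```

The fold completes inside the `try`, so the file is fully consumed before it is closed. Memory use grows with the number of distinct words, not with the size of the file.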