Combine delimiters to create a stream of words from a text file using Akka Streams
I have the following code that counts word frequencies in a text file:
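import java.nio.file.Paths
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString
import scala.concurrent.ExecutionContextExecutor

implicit val system: ActorSystem = ActorSystem("words-count")
implicit val mat: ActorMaterializer = ActorMaterializer()
implicit val ec: ExecutionContextExecutor = system.dispatcher

// Fold the incoming words into a word -> count map
val sink = Sink.fold[Map[String, Int], String](Map.empty)({
  case (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
})

FileIO.fromPath(Paths.get("/file.txt"))
  .via(Framing.delimiter(ByteString(" "), 256, true).map(_.utf8String))
  .toMat(sink)((_, right) => right)
  .run()
  .map(println(_))
  .onComplete(_ => system.terminate())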
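Currently it uses a space as the delimiter but ignores the newline character ("\n"). Can I use both space and newline as delimiters in the same stream, i.e. is there a way to combine them?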
You can set the delimiter to \n and then use flatMapConcat to split each line by spaces:
FileIO
  .fromPath(Paths.get("file.txt"))
  .via(Framing.delimiter(ByteString("\n"), 256, true).map(_.utf8String))
  .flatMapConcat(s => Source(s.split(" ").toList)) // split line by space
  .toMat(sink)((_, right) => right)
  .run()
  .map(println(_))
  .onComplete(_ => system.terminate())
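A possible variation on the approach above, offered as a sketch rather than the definitive way to do it: frame on "\n" as before, but split each line on any run of whitespace and drop empty tokens, so repeated spaces, tabs, or a trailing "\r" from Windows line endings do not end up counted as words. It uses mapConcat in place of flatMapConcat, which should be enough here since each line maps to a plain in-memory list. The names WordCountSketch and splitWords and the in-memory input are illustrative assumptions, not part of the original code; with a real file, Source.single(input) would be replaced by FileIO.fromPath(...).

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Framing, Sink, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object WordCountSketch extends App {
  implicit val system: ActorSystem = ActorSystem("words-count")
  // Same materializer setup as the question (deprecated in Akka 2.6, still available there).
  implicit val mat: ActorMaterializer = ActorMaterializer()

  // Hypothetical helper: split on any whitespace run and drop empty tokens.
  def splitWords(line: String): List[String] =
    line.split("\\s+").filter(_.nonEmpty).toList

  // Same fold as in the question: accumulate a word -> count map.
  val sink = Sink.fold[Map[String, Int], String](Map.empty) {
    case (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
  }

  // In-memory stand-in for the file contents (assumption, for a self-contained demo).
  val input = ByteString("a b  a\nc\tb\r\na")

  val counts = Source.single(input)
    .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
    .map(_.utf8String)
    .mapConcat(splitWords) // one line in, zero or more words out
    .runWith(sink)

  // Blocking is only for the demo; the counts should be a = 3, b = 2, c = 1.
  println(Await.result(counts, 5.seconds))
  system.terminate()
}
```

Note that maximumFrameLength now limits the length of a whole line rather than a single word, so 256 bytes may be too small for files with long lines.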