How to merge adjacent lines with scalaz-stream without losing the splitting line
Suppose my input file myInput.txt looks like this:
~~~ text1
bla bla
some more text
~~~ text2
lorem ipsum
~~~ othertext
the wikipedia
entry is not
up to date
That is, the file contains several documents, each introduced by a line starting with ~~~. The desired output is:
text1: bla bla some more text
text2: lorem ipsum
othertext: the wikipedia entry is not up to date
How should I do this? The following feels unnatural, and I lose the titles:
val converter: Task[Unit] =
  io.linesR("myInput.txt")
    .split(line => line.startsWith("~~~"))
    .intersperse(Vector("\nNew document: "))
    .map(vec => vec.mkString(" "))
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("flawedOutput.txt"))
    .run
converter.run
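The following works correctly, but it is very slow once I run it on more than a toy example (about 5 minutes to process 70 MB). Is that because I am creating a Process all over the place? Also, it seems to use only one core.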
val converter2: Task[Unit] = {
  val docSep = "~~~"
  io.linesR("myInput.txt")
    .flatMap(line => {
      val words = line.split(" ")
      if (words.length == 0 || words(0) != docSep) Process(line)
      else Process(docSep, words.tail.mkString(" "))
    })
    .split(_ == docSep)
    .filter(_ != Vector())
    .map(lines => lines.head + ": " + lines.tail.mkString(" "))
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("correctButSlowOutput.txt"))
    .run
}
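To pin down the transformation I am after, independent of scalaz-stream, here is a minimal sketch of the grouping logic in plain Scala over an Iterator[String]. It is not a scalaz-stream solution, only a reference for the expected behaviour. The names MergeSections and mergeSections are made up for illustration, and it assumes that every header line starts with ~~~ and that any lines before the first header can be dropped.

```scala
object MergeSections {
  // Separator marker taken from the question.
  val DocSep = "~~~"

  /** Lazily turns header/body lines into one "title: body" line per section. */
  def mergeSections(lines: Iterator[String]): Iterator[String] = {
    val buffered = lines.buffered
    // Assumption: lines before the first header belong to no document, so skip them.
    while (buffered.hasNext && !buffered.head.startsWith(DocSep)) buffered.next()

    new Iterator[String] {
      def hasNext: Boolean = buffered.hasNext

      def next(): String = {
        // The header line keeps its title: strip the marker and surrounding spaces.
        val title = buffered.next().stripPrefix(DocSep).trim
        val body = new StringBuilder
        // Consume body lines until the next header (peeked, not consumed) or end of input.
        while (buffered.hasNext && !buffered.head.startsWith(DocSep)) {
          if (body.length > 0) body.append(' ')
          body.append(buffered.next())
        }
        title + ": " + body.toString
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Iterator(
      "~~~ text1", "bla bla", "some more text",
      "~~~ text2", "lorem ipsum",
      "~~~ othertext", "the wikipedia", "entry is not", "up to date"
    )
    mergeSections(input).foreach(println)
  }
}
```

Run on the sample input, this prints the three lines of the desired output shown above. It makes a single pass and holds only one document in memory at a time, which is the behaviour I would like to get from the stream version as well.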