Shell script for splitting sentences into a new format in an output text file?

I converted a Bible to a plain text file, and the result looks like this:

$$  Genesis 40:1 It came to pass after these things that the butler and the baker of the king of Egypt ..

$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$  Genesis 40:3 So he put them in custody in the house of the captain of the guard, in the prison, the ..

I would like to run a shell script over the text file and have it write a new file that looks like this:

$$ Genesis 40:1

It came to pass after these things that the butler and the baker of the king of Egypt ..

$$ Genesis 40:2

And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$ Genesis 40:3

So he put them in custody in the house of the captain of the guard, in the prison, the ..

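I think I need to have it parse the first X characters of each line and then split the line at that point, but I'm new to shell scripting and can't figure out the best way to process the file to do this.

Any ideas?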

Since you only need to replace the space after the number with two newlines, you can use this command:

sed 's/\([0-9]\) /\1\n\n/' <textfile >newfile

- replaces the first digit that is followed by a space with the same digit followed by two \n
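To see what the substitution does, here is a minimal sketch that runs it on one of the sample lines. The function name `split_verse` is made up for illustration, and the sketch assumes GNU sed, which treats `\n` in the replacement as a newline (other sed implementations may not).

```shell
# Hypothetical helper: wraps the substitution so it can be used in a pipeline.
split_verse() {
    # \([0-9]\) captures the first digit that is followed by a space;
    # \1 writes that digit back, and \n\n adds the two newlines (GNU sed).
    sed 's/\([0-9]\) /\1\n\n/'
}

# Feed one sample line through the helper.
printf '%s\n' '$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.' | split_verse
```

The `40` in the reference is followed by `:`, not a space, so the first digit that matches is the verse number, and the line is split right after it.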

This worked really well until it got to a line that read “1 John 1:1 something written here”; then it split the line in the wrong spot. How can I account for this?

To account for lines that have a number and a space before the book name, we can include a letter and everything up to the last digit in the pattern:

sed 's/\([a-z].*[0-9]\) /\1\n\n/' <textfile >newfile
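One thing to watch for: `.*` is greedy, so the pattern above extends to the last digit followed by a space on the line. If the verse text itself contains such a digit, the split moves there. A possible alternative, assuming every reference has the form chapter:verse, is to match that shape directly. This is only a sketch: the function name `split_ref` and the second sample line are made up for illustration, and GNU sed is assumed for `\n` in the replacement.

```shell
# Hypothetical helper: split after the first "chapter:verse " on each line.
split_ref() {
    # [0-9][0-9]*:[0-9][0-9]* matches one or more digits, a colon, and one
    # or more digits; a leading "1 " in "1 John" has no colon, so it is skipped.
    sed 's/\([0-9][0-9]*:[0-9][0-9]*\) /\1\n\n/'
}

# The line from the comment, plus a made-up line with a number in the text.
printf '%s\n' '$$  1 John 1:1 something written here' | split_ref
printf '%s\n' '$$  Genesis 5:3 And Adam lived 130 years' | split_ref
```

Because sed replaces only the first match on each line, later numbers in the verse text are left alone.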