Shell script for splitting sentences into a new format in an output text file?

Question

I converted a Bible to a plain text file, and the result looks like this:

$$  Genesis 40:1 It came to pass after these things that the butler and the baker of the king of Egypt ..

$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$  Genesis 40:3 So he put them in custody in the house of the captain of the guard, in the prison, the ..

I would like to run a shell script over the text file and have it write a new file that looks like this:

$$ Genesis 40:1

It came to pass after these things that the butler and the baker of the king of Egypt ..

$$ Genesis 40:2

And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$ Genesis 40:3

So he put them in custody in the house of the captain of the guard, in the prison, the ..

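I think I need to have it parse the first X characters of each line and then split the line at that point, but I am new to shell scripting and can't seem to figure out the best way to process the file to accomplish this.

Any ideas?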

Answer 1

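Since you only need to replace the space after the number with two newlines, you can use this command:

sed 's/\([0-9]\) /\1\n\n/' <textfile >newfile

- This replaces the first digit that is followed by a space with the same digit (\1) followed by two \n.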
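The substitution described above can be tried on a single sample line before running it over the whole file. This is only a minimal sketch: it assumes GNU sed (which interprets \n in the replacement text as a newline), and split_verse is a hypothetical helper name used just for this illustration.

```shell
# Assumption: GNU sed, which turns \n in the replacement into a newline.
# split_verse is a hypothetical helper name, not part of the original answer.
split_verse() {
  # Keep the first digit that is followed by a space (\1),
  # then replace that space with two newlines.
  sed 's/\([0-9]\) /\1\n\n/'
}

# Feed one line from the question through the helper.
printf '%s\n' '$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.' | split_verse
```

With GNU sed, the reference should end up on its own line, followed by an empty line and then the verse text.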

this worked really well until it got to a line that read “1 John 1:1 something written here” then it split the line in the wrong spot. How can I account for this?

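To account for lines that have a number and a space before the book name, we can include a letter and everything up to the last digit in the pattern:

sed 's/\([a-z].*[0-9]\) /\1\n\n/' <textfile >newfile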
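One thing to keep in mind is that .* is greedy, so a pattern built on it splits at the last digit followed by a space on the line, which could be inside the verse text if a verse contains a numeral. A possible alternative, sketched below, anchors on the chapter:verse part of the reference instead. It assumes every reference has the form digits, colon, digits, then a space, and that GNU sed is available; split_ref is a hypothetical helper name.

```shell
# Assumptions: GNU sed (for \n in the replacement), and every reference
# contains "chapter:verse" followed by a space.
# split_ref is a hypothetical helper name used only for illustration.
split_ref() {
  # -E enables extended regular expressions, so + and ( ) need no backslashes.
  # Only the first "digits:digits " on each line is rewritten.
  sed -E 's/([0-9]+:[0-9]+) /\1\n\n/'
}

# The problem line quoted in the comment above.
printf '%s\n' '$$  1 John 1:1 something written here' | split_ref
```

Because the leading "1 " in "1 John" is not followed by a colon, it is left alone, and the split happens after "1:1".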

