Shell script for splitting sentences into a new format in an output text file?
I converted a Bible to a plain-text file, and the result looks like this:
$$ Genesis 40:1 It came to pass after these things that the butler and the baker of the king of Egypt ..
$$ Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.
$$ Genesis 40:3 So he put them in custody in the house of the captain of the guard, in the prison, the ..
I would like to run a shell script on the text file that goes through it and writes a new file that looks like this:
$$ Genesis 40:1
It came to pass after these things that the butler and the baker of
the king of Egypt ..
$$ Genesis 40:2
And Pharaoh was angry with his two officers, the chief butler and the
chief baker.
$$ Genesis 40:3
So he put them in custody in the house of the captain of the guard, in
the prison, the ..
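I think I need to have it parse the first X characters of each line and then split the line at that point, but I am new to shell scripting and can't figure out the best way to process the file to do this.
Any ideas?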
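Since you only need to replace the space after the number with two newlines, you can use this command:
sed 's/\([0-9]\) /\1\n\n/' <textfile >newfile
- Replaces the first digit that is followed by a space with the same digit followed by two \n.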
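To see what that substitution does, here is a small sketch run against one of the sample lines. It assumes GNU sed, which treats \n in the replacement as a newline; the function name split_after_digit is made up for illustration.

```shell
# Hypothetical helper: break each line after the first digit
# that is immediately followed by a space.
# Assumes GNU sed (\n in the replacement is a GNU extension).
split_after_digit() {
    # \1 puts the captured digit back; the two \n replace the space.
    sed 's/\([0-9]\) /\1\n\n/'
}

# Feed one sample verse through the helper.
printf '%s\n' \
    '$$ Genesis 40:2 And Pharaoh was angry with his two officers.' |
    split_after_digit
```

The two newlines leave a blank line between the reference and the verse text; use a single \n if you want the text directly under the reference. On a sed that does not understand \n in the replacement, a backslash followed by a literal line break should do the same job.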
This worked really well until it got to a line that read "1 John 1:1 something written here"; then it split the line in the wrong spot. How can I account for this?
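To account for lines that have a number and a space before the book name, we can make the pattern start at a letter and include everything up to the last digit:
sed 's/\([a-z].*[0-9]\) /\1\n\n/' <textfile >newfile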
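One thing to watch: .* is greedy, so that pattern runs to the last digit-plus-space on the line, and a verse whose text itself contains a number would be split too late. A possible alternative, sketched here under the assumption that every reference ends in chapter:verse, is to anchor on that pattern instead. The function name split_after_ref is hypothetical, and GNU sed is assumed again for \n.

```shell
# Hypothetical helper: break each line after the first
# chapter:verse reference (digits, colon, digits, then a space).
# Assumes GNU sed for \n in the replacement; -E enables extended regex.
split_after_ref() {
    sed -E 's/([0-9]+:[0-9]+) /\1\n\n/'
}

# The leading "1 " of "1 John" is skipped because it has no colon.
printf '%s\n' \
    '$$ Genesis 40:1 It came to pass after these things' \
    '$$ 1 John 1:1 something written here' |
    split_after_ref
```

This would not handle references written without a colon, so check your file for those before relying on it.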