Grep grammatical clauses

I'm trying to find a way to extract grammatical clauses from an ebook sample. The input looks like this:

This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.

Desired output:

This is a test my friend
this is just a test
I'm going to do some shopping
what do you need
Nothing
he said

Any ideas on how to achieve this?

Thanks a lot!

Pipe it through tr. Note that this only splits at commas:

tr ',' '\n' < input
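The tr command only breaks at commas, so the other delimiters in the sample stay in the text and every clause after a comma keeps its leading space. A minimal sketch that extends the same idea, assuming the clause delimiters are exactly , ; : ` ? . (the ones in the sample); split_clauses_tr is a hypothetical name:

```shell
#!/bin/sh
# Sketch: split text on stdin into clauses with tr and sed.
# Assumption: clauses end at , ; : ` ? . or a newline, as in the sample.
split_clauses_tr() {
  # tr maps each of the six delimiters to a newline (SET2 has one \n per
  # delimiter, so both sets have the same length) and -s squeezes runs of
  # newlines, e.g. those produced by ":`"; sed drops the leading spaces
  # left after a delimiter.
  tr -s ',;:`?.' '\n\n\n\n\n\n' | sed 's/^ *//'
}

split_clauses_tr <<'EOF'
This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.
EOF
```

Squeezing matters here because pairs such as :` and ?` followed by the end of the line would otherwise leave blank lines in the output.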

With GNU awk (gawk) you can do it like this:

awk -v RS='[\n.,;:`?]+' -v ORS='\n' '{$1=$1} 1' file
This is a test my friend
this is just a test
I'm going to do some shopping
what do you need
Nothing
he said
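The one-liner relies on a regular-expression RS, which gawk supports; POSIX leaves the behavior of a multi-character RS unspecified, so other awks may not split the records the same way. For those, a different technique can be sketched: read the input line by line and cut each line with split(), whose third argument is treated as an ERE when it is longer than one character. The function name split_clauses_awk is hypothetical:

```shell
#!/bin/sh
# Sketch for awks without regex RS support: process one line at a time
# and split it on the delimiter set with split().
split_clauses_awk() {
  awk '{
    n = split($0, parts, "[.,;:`?]+")
    for (i = 1; i <= n; i++) {
      clause = parts[i]
      sub(/^[ \t]+/, "", clause)        # trim leading blanks
      sub(/[ \t]+$/, "", clause)        # trim trailing blanks
      if (clause != "") print clause    # skip empty pieces, e.g. after "?`"
    }
  }'
}

split_clauses_awk <<'EOF'
This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.
EOF
```

The empty-piece check is needed because a line ending in a delimiter (such as ?` or .) yields an empty last element from split().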

This comes close:

grep -o '[[:alpha:][:space:]]\+' file

But it turns the apostrophe in "I'm" into a line break. Given the punctuation in your sample, this works:

grep -o '[^,;:`?.]\+' file

This keeps the space that follows each punctuation mark. To remove it, pipe the output to

| sed 's/^ //'
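The grep and sed steps can be wrapped in one function; split_clauses_grep is a hypothetical name. This sketch uses grep -E so that + needs no backslash (the \+ form in a basic regex is a GNU extension), and sed 's/^ *//' so that several spaces after a delimiter are also removed:

```shell
#!/bin/sh
# Sketch: grep prints each maximal run of non-delimiter characters on its
# own line; sed strips the spaces left at the start of each clause.
# Note: -o is supported by GNU and BSD grep but is not required by POSIX.
split_clauses_grep() {
  grep -oE '[^,;:`?.]+' | sed 's/^ *//'
}

split_clauses_grep <<'EOF'
This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.
EOF
```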