Split single line with multiple words into many lines with x words on each

I have a large text file that contains only one line. It looks like this:

blaalibababla.ru text text text text what's the weather like tooday? blaazzabla.zu some_text blabewdwefla.au it is important not to be afraid of sed blabkrlqbla.ru wjenfkn lkwnef lkwnefl blarthrthbla.net 1234 e12edq 42wsdfg blablabla.com this should finally end

I need a way to make it look like this:

blaalibababla.ru text text text text what's the weather like tooday?
blaazzabla.zu some_text
blabewdwefla.au it is important not to be afraid of sed
blabkrlqbla.ru wjenfkn lkwnef lkwnefl
blarthrthbla.net 1234 e12edq 42wsdfg 
blablabla.com this should finally end

I know how to do this with sed for a single domain name:

sed -i 's/blablabla.ru/\n&/g' file.txt

But that does not handle the additional text afterwards, so it is not what I need.

If sed is not the best tool for this, please let me know.

Update: this is my actual text file:

wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes   zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32   tech-lor notify-by-telegram CRITICAL - load average: 42.91,   49.91, 53.88   glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes    

I need it to look like this:

wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes   
zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32   tech-lor notify-by-telegram CRITICAL - load average: 42.91,   49.91, 53.88   
glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes    

A simpler way is to use xargs to process n words at a time, which in your case is just 2:

xargs -n2 <file
blablabla.ru some_text
blablabla.zu some_text
blablabla.au some_text
blablabla.ru some_text
blablabla.net some_text
blablabla.com some_text

The man xargs page describes the -n flag as follows:

-n max-args, --max-args=max-args
      Use at most max-args arguments per command line.  Fewer than max-args arguments 
      will be used if the size (see the -s option) is exceeded, unless the
      -x option is given, in which case xargs will exit.

To write the result back to the original file, do:

xargs -n2 <file >tmpfile && mv tmpfile file
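
To see how -n groups words, here is a minimal sketch. The helper name group_words is made up for illustration and is not part of xargs. It relies on xargs running echo when no command is given. Note that xargs treats quotes and backslashes specially, so input containing an apostrophe (such as "what's") would need different handling.

```shell
# group_words is a hypothetical helper, not part of xargs itself.
# With no command given, xargs runs echo, so each batch of N words
# comes out as one space-separated line.
group_words() {
  xargs -n "$1"
}

printf 'a b c d e\n' | group_words 2
# a b
# c d
# e
```

The last line is shorter because only one word was left over.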

Awk:

$ awk 'gsub(/([^ ]+ ){2}/,"&\n")' file
blablabla.ru some_text 
blablabla.zu some_text 
blablabla.au some_text 
blablabla.ru some_text 
blablabla.net some_text 
blablabla.com some_text

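Explanation:

Each run of two repetitions of [^ ]+ followed by a space (a string of non-space characters plus one space) is replaced with itself (&) followed by a newline \n. Words left over after the last match stay at the end of the output unchanged. A line on which nothing matches is not printed at all, because gsub(...) then returns 0 and is used here as the pattern; to print such lines too, write { gsub(...) } 1 instead.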
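
The difference between the two forms can be checked with a small sketch. The function names split_pairs and split_pairs_keep are made up for illustration. The regex spells out the two repetitions as [^ ]+ [^ ]+ instead of using {2}, on the assumption that some awk implementations lack interval expressions; otherwise the behavior is the same.

```shell
# Hypothetical helpers for comparing the two awk forms.

# gsub(...) used as the pattern: a line is printed only if at least
# one substitution was made (gsub returns the substitution count).
split_pairs() {
  awk 'gsub(/[^ ]+ [^ ]+ /, "&\n")'
}

# gsub(...) inside an action, followed by the always-true pattern 1:
# every line is printed, whether or not anything was replaced.
split_pairs_keep() {
  awk '{ gsub(/[^ ]+ [^ ]+ /, "&\n") } 1'
}

printf 'lonely\n' | split_pairs        # prints nothing
printf 'lonely\n' | split_pairs_keep   # prints: lonely
printf 'a b c d e\n' | split_pairs     # prints "a b ", "c d ", "e" on three lines
```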

Try splitting on this domain-name pattern: ([-a-z0-9]+\.[a-z]+){1,}

Using GNU sed:

sed -r 's/ +(([-a-z0-9]+\.[a-z]+){1,}) */\n\1 /g' file

Note that any string consisting of a space followed by characters matching [-a-z0-9]+\.[a-z]+ will be treated as a domain name.
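
A sketch of this idea on a shortened version of the sample from the question, assuming GNU sed (for -r and for \n in the replacement). The function name split_on_domains is made up, and the regex is a slightly restructured form, ([-a-z0-9]+\.)+[a-z]+, chosen here so that multi-label names such as zm-goas-04.asdg.net match as a whole. The matched name is put back with \1.

```shell
# Hypothetical wrapper: start a new line before every space-preceded
# token that looks like a domain name (labels ending in dots, then letters).
split_on_domains() {
  sed -r 's/ +(([-a-z0-9]+\.)+[a-z]+)/\n\1/g' "$@"
}

printf '%s\n' 'wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL   zm-goas-04.asdg.net LOAD CRITICAL - load average: 42.91, 49.91, 53.88   glas07.kvm.ext.asdg.ru PUPPETD CRITICAL' |
  split_on_domains
# wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL
# zm-goas-04.asdg.net LOAD CRITICAL - load average: 42.91, 49.91, 53.88
# glas07.kvm.ext.asdg.ru PUPPETD CRITICAL
```

Numbers such as 42.91 are left alone because the part after the last dot must consist of letters, and /var/lib/puppet/state/state.yaml is left alone because state.yaml is preceded by a slash rather than a space.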