Split single line with multiple words into many lines with x words on each
I have a large text file that consists of a single line. It looks like this:
blaalibababla.ru text text text text what's the weather like tooday? blaazzabla.zu some_text blabewdwefla.au it is important not to be afraid of sed blabkrlqbla.ru wjenfkn lkwnef lkwnefl blarthrthbla.net 1234 e12edq 42wsdfg blablabla.com this should finally end
I need a way to make it look like this:
blaalibababla.ru text text text text what's the weather like tooday?
blaazzabla.zu some_text
blabewdwefla.au it is important not to be afraid of sed
blabkrlqbla.ru wjenfkn lkwnef lkwnefl
blarthrthbla.net 1234 e12edq 42wsdfg
blablabla.com this should finally end
I know how to do it for a single domain name with sed:
sed -i 's/blablabla\.ru/\n&/g' file.txt
"But not with the additional text afterwards." - that is not what I meant.
If sed is not the best way to do this, please let me know.
Update:
Here is my text file:
wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52 lor notify-by-sms FILE_AGE CRITICAL: /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32 tech-lor notify-by-telegram CRITICAL - load average: 42.91, 49.91, 53.88 glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02 lor notify-by-sms FILE_AGE CRITICAL: /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes
I need it to look like this:
wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52 lor notify-by-sms FILE_AGE CRITICAL: /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes
zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32 tech-lor notify-by-telegram CRITICAL - load average: 42.91, 49.91, 53.88
glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02 lor notify-by-sms FILE_AGE CRITICAL: /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes
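A simpler way is to use xargs to process n words at a time, in your case just 2. This assumes every domain is followed by exactly one word, as in the output below. Note also that xargs treats quotes and backslashes specially, so input containing a word such as what's makes it fail with an unmatched single quote error: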
xargs -n2 <file
blablabla.ru some_text
blablabla.zu some_text
blablabla.au some_text
blablabla.ru some_text
blablabla.net some_text
blablabla.com some_text
From the man xargs page, where the -n flag is described:
-n max-args, --max-args=max-args
Use at most max-args arguments per command line. Fewer than max-args arguments
will be used if the size (see the -s option) is exceeded, unless the
-x option is given, in which case xargs will exit.
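To write the result back to the original file, run the following (the && ensures the original is only replaced if xargs succeeds):
xargs -n2 <file >tmpfile && mv tmpfile file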
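A minimal sketch of the xargs approach wrapped in two shell functions; the names pair_words and pair_words_in_place are hypothetical, chosen only for this example. It shows that xargs -n2 runs echo (its default command) with at most two words at a time, so a leftover single word ends up on a line of its own, and that the file is only replaced when xargs succeeds.

```shell
#!/bin/sh
# pair_words (hypothetical name): print the words read from stdin two per line.
# xargs runs echo by default, so each group of at most 2 arguments becomes one line.
# Caveat: xargs treats quotes and backslashes specially, so a word like what's
# makes it fail with an "unmatched single quote" error.
pair_words() {
    xargs -n2
}

# pair_words_in_place (hypothetical name): rewrite file "$1" in place.
# The temporary file only replaces the original if xargs succeeded.
pair_words_in_place() {
    tmp="$1.tmp"
    if pair_words <"$1" >"$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: five words give two full pairs plus one leftover word.
printf 'blablabla.ru some_text blablabla.zu some_text extra\n' | pair_words
```

The example prints blablabla.ru some_text and blablabla.zu some_text on two lines, followed by extra alone on a third line.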
Awk:
$ awk 'gsub(/([^ ]+ ){2}/,"&\n")' file
blablabla.ru some_text
blablabla.zu some_text
blablabla.au some_text
blablabla.ru some_text
blablabla.net some_text
blablabla.com some_text
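Explanation:
Every two repetitions of ([^ ]+ ) (a run of non-space characters followed by a space) are replaced with themselves (&) plus a newline (\n). Any remainder at the end of the line that does not complete a pair stays in place and is printed as the last line; each earlier line keeps its trailing space. If a line contains no match at all, gsub returns 0 and that line is not printed, unless you wrap the call as {gsub(...)}1.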
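A variant of the awk one-liner as a shell function, offered as a sketch rather than the answer's exact command; the name split_pairs_awk is hypothetical. Writing the call as { gsub(...) } 1 prints every line, including lines with no match, and the second gsub is an addition that removes the space left before each inserted newline. The pattern [^ ]+ [^ ]+ is spelled out instead of ([^ ]+ ){2} because some older awk implementations (for example older mawk releases) do not support interval expressions.

```shell
#!/bin/sh
# split_pairs_awk (hypothetical name): put every two space-separated words
# on a line of their own. Reads the files given as arguments, or stdin.
split_pairs_awk() {
    awk '{
        # Append a newline after each "word word " pair (same as ([^ ]+ ){2}).
        gsub(/[^ ]+ [^ ]+ /, "&\n")
        # Drop the trailing space each pair leaves before its inserted newline.
        gsub(/ \n/, "\n")
    }
    1   # always print the (modified) record, even when nothing matched
    ' "$@"
}

printf 'blablabla.ru some_text blablabla.zu some_text\n' | split_pairs_awk
```

The unmatched tail (here blablabla.zu some_text) is still printed, as the last output line.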
Try splitting on domain names with this pattern: ([-a-z0-9]+\.[a-z]+){1,}
With GNU sed (\1 puts each matched domain back after the inserted newline):
sed -r 's/ +(([-a-z0-9]+\.[a-z]+){1,}) */\n\1 /g' file
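Note that anything after a space that matches this pattern (a run of [-a-z0-9] characters, a dot and [a-z] letters, possibly repeated) is treated as a domain name.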
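If GNU sed is not available, the same idea can be sketched in POSIX awk: walk the whitespace-separated fields and start a new output line whenever a field other than the first looks like a domain. This is a swapped-in technique, not part of the original answer. The function name split_on_domains_awk is hypothetical, and the pattern ^([-a-z0-9]+\.)+[a-z]+$ is a simpler variant of the one above, anchored to the whole field, so fields such as /var/lib/puppet/state/state.yaml or 42.91, are not treated as domains. Because awk splits on runs of blanks, repeated spaces collapse to one.

```shell
#!/bin/sh
# split_on_domains_awk (hypothetical name): start a new line at every field
# that looks like a domain name. Works with any POSIX awk; reads files or stdin.
split_on_domains_awk() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (i > 1) {
                # Domain-like field: begin a new line; otherwise keep one space.
                if ($i ~ /^([-a-z0-9]+\.)+[a-z]+$/)
                    printf "\n"
                else
                    printf " "
            }
            printf "%s", $i
        }
        printf "\n"
    }' "$@"
}

printf '%s\n' 'blaazzabla.zu some_text blabkrlqbla.ru wjenfkn lkwnef lkwnefl' |
    split_on_domains_awk
```

Unlike the xargs and pair-based awk versions, this does not depend on how many words follow each domain, which matches the layout of the Update sample.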