Insert a carriage return before a regular expression in a text file using the unix shell

Question

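I have a messy text file (about 30 KB) containing data that I have to reorganize with a shell script. The file follows a simple pattern: a "parameter number" (a value between 10001 and 10999) followed by several other values (floats). Values are separated by a space. I want my file to have, on each line, one "parameter number" followed by its values (only one "parameter number" per line). Values are separated by a space.

My problem is easy to state:

The "messy" file looks like this: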

10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n)

where the x(i,j) are floats.

I want it to be:

10001 x(1,1) x(1,2) ... x(1,n) 
10002 x(2,1) x(2,2) ... x(2,n) 
10003 x(3,1) x(3,2) ... x(3,n) 
...
10999 x(999,1) x(999,2) ... x(999,n)

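I would like to write a bash script (or a simple command) that replaces the "space" before the pattern 10[0-9][0-9][0-9] (regex) with a carriage return.

Bash scripting and regular expressions are new to me, and I could not find a simple solution.

I was thinking of using the bash ${string//substring/newsubstring} parameter expansion, but I still don't know how to say "the space that precedes the pattern 10[0-9][0-9][0-9]" in a regex.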

Answer 1

I would like to write a bash script (or a simple command) that replaces the "space" before the pattern 10[0-9][0-9][0-9] (regex) with a carriage return.

You can use sed.

sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file

or

sed 's/ \(10[0-9][0-9][0-9]\)/\n\1/g' file

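In basic sed, capturing groups are written as \(..\), and \1 in the replacement refers back to the text matched by the first group.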
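As a minimal sketch of how the capturing group and the back-reference work together, the snippet below wraps the substitution in a function. The name split_params and the sample numbers are made up for illustration. It writes the line break as a backslash followed by a real newline, which POSIX sed accepts; \n in the replacement is understood by GNU sed but not necessarily by other implementations.

```shell
#!/bin/sh
# split_params is a hypothetical helper name, not part of the original answer.
# It reads text on stdin and starts a new line before every " 10ddd" token.
split_params() {
    # \( ... \) captures the five-digit parameter number; \1 puts it back.
    # A backslash followed by a literal newline inserts the line break.
    # The second line of the sed script must start in column 0, otherwise
    # the leading blanks would become part of the replacement.
    sed 's/ \(10[0-9][0-9][0-9]\)/\
\1/g'
}

# Sample data invented for this sketch.
printf '%s\n' '10001 0.5 1.25 10002 2.0 3.75 10003 4.5' | split_params
```

With this sample input, each of 10001, 10002 and 10003 ends up at the start of its own line, followed by its values.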

Example:

$ cat file
10001 x(1,1) x(1,2) ... x(1,n) 10002 x(2,1) x(2,2) ... x(2,n) 10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..] 10999 x(999,1) x(999,2) ... x(999,n) 
$ sed 's/[[:space:]]\(10[0-9][0-9][0-9]\)/\n\1/g' file
10001 x(1,1) x(1,2) ... x(1,n)
10002 x(2,1) x(2,2) ... x(2,n)
10003 x(3,1) x(3,2) ... x(3,n) [..and so on to..]
10999 x(999,1) x(999,2) ... x(999,n)
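One caveat with the sed pattern: it matches a space followed by 10ddd anywhere, so a float such as 10002.5 would also trigger a line break. If that could happen in the data, a possible alternative (a sketch, not part of the original answer) is to let awk compare whole fields. The function name split_fields and the sample values are hypothetical.

```shell
#!/bin/sh
# split_fields is a hypothetical helper name used only for this sketch.
# It reads text on stdin and starts a new output line whenever a field is
# exactly "10" followed by three digits.
split_fields() {
    awk '{
        line = ""
        for (i = 1; i <= NF; i++) {
            if (line != "" && $i ~ /^10[0-9][0-9][0-9]$/) {
                # A new parameter number: flush the line built so far.
                print line
                line = $i
            } else if (line == "") {
                line = $i
            } else {
                line = line " " $i
            }
        }
        if (line != "") print line
    }'
}

# Sample data invented for this sketch; 10002.5 is a value, not a parameter number.
printf '%s\n' '10001 0.5 10002.5 10002 2.0 3.75' | split_fields
```

Here 10002.5 stays on the first line as an ordinary value, and only the field 10002 starts a new line. Note that this sketch processes each input line independently and collapses runs of whitespace into single spaces.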
