Text processing - Split the first column into two columns based on a character

Question

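Split the first column of a file into two columns based on a character. The data inside the parentheses () should move into a new column, and the parentheses should be removed.

Given CSV file: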

Col1(col2),col3,col4,col5
a(23),12,test(1),test2
b(30),15,test1(2),test3

Expected file:

Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

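I tried the code below. It does not extract the data between the parentheses, and it acts on every occurrence of "(" rather than only the first.

awk -F"(" '$1=$1' OFS="," filename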

Answer 1

Take your pick:

$ sed 's/(\([^)]*\))/,\1/' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ sed 's/(/,/; s/)//' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ awk '{sub(/\(/,","); sub(/\)/,"")} 1' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ awk 'match($0,/\([^)]*\)/){$0= substr($0,1,RSTART-1) "," substr($0,RSTART+1,RLENGTH-2) substr($0,RSTART+RLENGTH) } 1' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ awk 'BEGIN{FS=OFS=","} split($1,a,/[()]/) > 1{$1=a[1] "," a[2]} 1' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ gawk '{$0=gensub(/\(([^)]*)\)/,",\\1",1)} 1' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

$ gawk 'match($0,/([^(]*)\(([^)]*)\)(.*)/,a){$0=a[1] "," a[2] a[3]} 1' file
Col1,col2,col3,col4,col5
a,23,12,test(1),test2
b,30,15,test1(2),test3

The last two require GNU awk: the first for gensub(), the second for the third argument to match(). There are other options too.
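The commands above act on the first "(...)" pair found on each line, wherever it sits. If some rows might have no parentheses in the first column, a variant anchored to the start of the line leaves those rows alone. A minimal sketch, assuming the first column never contains a comma; the helper name split_first_col is made up for illustration:

```shell
# Hypothetical helper: split "name(value)" in the first column only.
# Assumption: the first column contains no comma.
split_first_col() {
    # ^\([^,(]*\)   first-column text before the "("
    # (\([^,)]*\))  the value between the literal parentheses
    sed 's/^\([^,(]*\)(\([^,)]*\))/\1,\2/' "$@"
}

printf '%s\n' 'a(23),12,test(1),test2' 'c,7,test(3),x' | split_first_col
# a,23,12,test(1),test2
# c,7,test(3),x
```

The second row shows the difference: an unanchored `s/(/,/; s/)//` would turn `test(3)` into `test,3`, while the anchored pattern does not match and the row passes through unchanged.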

Answer 2

For completeness...

POSIX shell

while IFS= read -r line; do
    car=${line%%)*}
    caar=${car%%(*}
    cdar=${car##*(}
    cdr=${line#*)}
    printf '%s\n' "$caar,$cdar$cdr"
done < file

I don't think you can solve this with cut alone.
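The variable names are borrowed from Lisp; the work is done by four parameter expansions. The sketch below annotates what each one holds for the sample line `a(23),12,test(1),test2`. It also adds a guard, which is my addition and not part of the loop above, so that lines without a "(...)" pair are printed unchanged. The function name split_paren is hypothetical:

```shell
# Hypothetical wrapper around the loop; reads lines on stdin.
split_paren() {
    while IFS= read -r line; do
        case $line in
            *'('*')'*) ;;                          # has a "(...)" pair: split it
            *) printf '%s\n' "$line"; continue ;;  # otherwise print unchanged
        esac
        car=${line%%)*}    # drop from the first ")" onward:  a(23
        caar=${car%%(*}    # drop from the first "(" onward:  a
        cdar=${car##*(}    # keep what follows the last "(":  23
        cdr=${line#*)}     # keep what follows the first ")": ,12,test(1),test2
        printf '%s\n' "$caar,$cdar$cdr"
    done
}

printf '%s\n' 'a(23),12,test(1),test2' 'plain,1,2' | split_paren
# a,23,12,test(1),test2
# plain,1,2
```

Without the guard, a line with no parentheses leaves all four variables equal to the whole line, so it would be printed twice with a comma in between.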

Answer 3

Could you please try one more sed solution; it may help you.

sed 's/\([^(]*\)(\([^)]*\))\(.*\)/\1,\2\3/'  Input_file
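The intent of this substitution is to capture three pieces (the text before the first "(", the text inside the parentheses, and the rest of the line) and write them back with a comma between the first two. The same idea may be easier to read with extended regular expressions. A sketch, assuming a sed that supports -E (GNU and BSD sed both do); the function name split_ere is made up for illustration:

```shell
# Hypothetical helper. With -E, bare ( ) create groups and \( \) are literal,
# the reverse of the basic-regex syntax used above.
split_ere() {
    sed -E 's/([^(]*)\(([^)]*)\)(.*)/\1,\2\3/' "$@"
}

printf '%s\n' 'b(30),15,test1(2),test3' | split_ere
# b,30,15,test1(2),test3
```

Because the first group cannot contain "(", the match always lands on the first pair of parentheses, so `test1(2)` falls into the third group and is left as it is.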
