How to delete multiple matches in bash
I have multiple files with the following structure:
(Genome1_Sample4A_protein_Genome1_Sample4A_132_2:0.0060449,(Genome1_Sample5A_protein_Genome1_Sample5A_30_12:1e-06,(Genome1_Sample1B_protein_Genome1_Sample1B_99_2:1e-06,Genome1_Sample6A_protein_Genome1_Sample6A_295_2:0.00366292)n2:0.00370314)n1:0.0060449)n0;
In each label, I want to delete everything between "_protein" and the following ":", so that the output looks like this:
(Genome1_Sample4A:0.0060449,(Genome1_Sample5A:1e-06,(Genome1_Sample1B:1e-06,Genome1_Sample6A:0.00366292)n2:0.00370314)n1:0.0060449)n0;
I have tried using sed and awk:
sed -i 's/_protein.*:/:/g' tree1.txt
sed -i 's/_protein.*_[[:digit:]]*:/:/g' tree1.txt
awk '{gsub(/\_protein*:/,":");}1' tree1.txt
But none of these commands gives me the output I want.
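.* is greedy: it matches as far as the last ":" on the line, so the first match swallows every label after it. Use a negated bracket expression instead: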
sed 's/_protein[^:]*:/:/g' tree1.txt
Output:
(Genome1_Sample4A:0.0060449,(Genome1_Sample5A:1e-06,(Genome1_Sample1B:1e-06,Genome1_Sample6A:0.00366292)n2:0.00370314)n1:0.0060449)n0;
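Since the question mentions multiple files, here is a minimal sketch that contrasts the greedy pattern with the [^:]* version and then applies the fix to several files in place. The helper name strip_protein_suffix, the shortened sample line, and the scratch directory with tree1.txt and tree2.txt are assumptions made for illustration only. The -i.bak form is used because it keeps a backup of each file and should be accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: strip everything from "_protein" up to the next ":".
# [^:]* cannot cross a colon, so each match ends at the nearest one.
strip_protein_suffix() {
    sed 's/_protein[^:]*:/:/g'
}

# Shortened sample line using two labels from the question.
sample='(Genome1_Sample4A_protein_Genome1_Sample4A_132_2:0.0060449,Genome1_Sample5A_protein_Genome1_Sample5A_30_12:1e-06)n0;'

# Greedy attempt from the question: .* runs to the LAST colon on the line,
# so the second label and the first branch length are lost.
printf '%s\n' "$sample" | sed 's/_protein.*:/:/g'
# prints (Genome1_Sample4A:1e-06)n0;

# Negated bracket expression: every label is trimmed separately.
printf '%s\n' "$sample" | strip_protein_suffix
# prints (Genome1_Sample4A:0.0060449,Genome1_Sample5A:1e-06)n0;

# Apply the same substitution to several files (created here in a scratch
# directory so the demo never touches real data).
workdir=$(mktemp -d)
printf '%s\n' "$sample" > "$workdir/tree1.txt"
printf '%s\n' "$sample" > "$workdir/tree2.txt"

for f in "$workdir"/tree*.txt; do
    # Edit in place; the original is kept as <file>.bak.
    sed -i.bak 's/_protein[^:]*:/:/g' "$f"
done

cat "$workdir"/tree*.txt
```

On real data, the loop would point at the directory holding the tree files; the .bak copies can be deleted once the results have been checked.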