Replace multiple columns with data from another file if the first two columns match
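I need to match columns 1 and 2 of ori_file2.pdb against columns 1 and 2 of new_file1.pdb. Where they match, columns 3, 4, 5 and 6 of new_file1.pdb should be replaced with the data in the same columns of ori_file2.pdb, without changing the spacing between the columns of new_file1.pdb.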
ori_file2.pdb
HELIX 1 1 PHE A 2 ALA A 7 1 6
ATOM 1 N PHE A 1 -3.631 -3.776 -2.910 1.00 0.00 N
ATOM 2 CA PHE A 1 -2.182 -3.776 -2.910 1.00 0.00 C
ATOM 3 C PHE A 1 -1.659 -2.347 -2.910 1.00 0.00 C
ATOM 4 O PHE A 1 -0.766 -2.011 -2.135 1.00 0.00 O
ATOM 5 CB PHE A 1 -1.630 -4.477 -4.142 1.00 0.00 C
ATOM 6 CG PHE A 1 -1.888 -5.964 -4.196 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -1.053 -6.844 -3.498 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -2.962 -6.461 -4.943 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -3.201 -7.840 -4.993 1.00 0.00 C
ATOM 10 CZ PHE A 1 -2.366 -8.721 -4.295 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -1.292 -8.223 -3.548 1.00 0.00 C
ATOM 12 N PHE A 2 -2.218 -1.506 -3.783 1.00 0.00 N
ATOM 13 CA PHE A 2 -1.808 -0.119 -3.881 1.00 0.00 C
ATOM 14 C PHE A 2 -1.962 0.568 -2.532 1.00 0.00 C
new_file1.pdb
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N LIG L 2 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 C LIG L 2 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C LIG L 2 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O LIG L 2 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 C LIG L 2 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 C LIG L 2 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 C LIG L 2 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 C LIG L 2 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 C LIG L 2 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 C LIG L 2 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 C LIG L 2 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N LIG L 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 C LIG L 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C LIG L 2 -26.824 -23.222 -2.975 1.00 0.00 C
I ran the code below:
awk 'FNR==NR {
    split($0,a,/[[:space:]]*/)
    b[a[2]]=a[1]
    next
}
{
    n=split($0,d,/[^[:space:]]*/)
    if(b[$2])
        $3=b[$2]
    for(i=1;i<=n;i++)
        printf("%s%s",d[i],$i)
    print ""
}' ori_file2.pdb new_file1.pdb
and got this result:
ATOM 1 ATOM LIG L 2 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 ATOM LIG L 2 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 ATOM LIG L 2 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 ATOM LIG L 2 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 ATOM LIG L 2 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 ATOM LIG L 2 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 ATOM LIG L 2 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 ATOM LIG L 2 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 ATOM LIG L 2 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 ATOM LIG L 2 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 ATOM LIG L 2 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 ATOM LIG L 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 ATOM LIG L 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 ATOM LIG L 2 -26.824 -23.222 -2.975 1.00 0.00 C
However, this is the desired result:
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N PHE A 1 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 CA PHE A 1 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C PHE A 1 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O PHE A 1 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 CB PHE A 1 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 CG PHE A 1 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 CZ PHE A 1 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N PHE A 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 CA PHE A 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C PHE A 2 -26.824 -23.222 -2.975 1.00 0.00 C
I would like to keep the file structure of file2 for downstream analysis.
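If you don't mind losing the original spacing, then the following works (in these commands ori_file2.pub and ori_file1.pub stand for ori_file2.pdb and new_file1.pdb):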
One-liner:
awk 'FNR==NR{a[$1,$2]=$3 FS $4 FS $5 FS $6;next}(($1,$2) in a){split(a[$1,$2],t);$3=t[1];$4=t[2];$5=t[3];$6=t[4]}1' ori_file2.pub ori_file1.pub
And, to line the columns up again, pipe the result through column -t:
[akshay@db1 tmp]$ awk 'FNR==NR{a[$1,$2]=$3 FS $4 FS $5 FS $6;next}(($1,$2) in a){split(a[$1,$2],t);$3=t[1];$4=t[2];$5=t[3];$6=t[4]}1' ori_file2.pub ori_file1.pub | column -t
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N PHE A 1 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 CA PHE A 1 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C PHE A 1 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O PHE A 1 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 CB PHE A 1 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 CG PHE A 1 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 CZ PHE A 1 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N PHE A 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 CA PHE A 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C PHE A 2 -26.824 -23.222 -2.975 1.00 0.00 C
More readable:
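awk 'FNR==NR{                              # first file (ori_file2): build the lookup
        a[$1,$2]=$3 FS $4 FS $5 FS $6;     # key: columns 1-2, value: columns 3-6
        next
     }
     (($1,$2) in a){                       # second file: were columns 1-2 seen before?
        split(a[$1,$2],t);
        $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4] # rebuilds $0 with OFS (one space)
     }1
    ' ori_file2.pub ori_file1.pub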
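The lookup in these commands relies on awk's multi-subscript arrays: a[$1,$2] is stored under a single key made of $1, the built-in SUBSEP character and $2, and (($1,$2) in a) rebuilds that same key. The minimal sketch below checks this on a hard-coded entry instead of the real files; the function name subsep_demo is hypothetical, and the code should run with any POSIX awk.

```shell
# Hypothetical helper name: subsep_demo. Uses only POSIX awk features.
subsep_demo() {
  awk 'BEGIN {
    a["ATOM", "1"] = "N PHE A 1"         # same shape as a[$1,$2]=... in the answer
    joined = (("ATOM" SUBSEP "1") in a)  # 1: the pair is stored as one joined key
    pair   = (("ATOM", "1") in a)        # 1: (i,j) in a builds that same key
    header = (("MODEL", "1") in a)       # 0: a MODEL 1 header line finds no entry
    print joined, pair, header
  }'
}
subsep_demo
```

It should print 1 1 0. The final 0 is why header records such as MODEL 1 pass through the commands above unchanged: their key is never in a, so only the trailing 1 (print) applies.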
Preserving the spacing (the four-argument form of split() used here is a GNU awk extension):
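awk 'FNR==NR{
        a[$1,$2]=$3 FS $4 FS $5 FS $6;
        next
     }
     (($1,$2) in a){
        n=split($0,arr,FS,d);                # gawk: d[i] = whitespace after field i
        split(a[$1,$2],t);
        $3=t[1];$4=t[2];$5=t[3];$6=t[4];
        for(i=1;i<=n;i++)                    # reprint each field with its original gap
            printf "%s%s", $(i),(i<n? d[i] : ORS);
        next
     }1
    ' ori_file2.pub ori_file1.pub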
Or even, with GNU awk (tested on GNU Awk 4.2.1):
awk 'FNR==NR{
        a[$1,$2]=$3 FS $4 FS $5 FS $6;
        next
     }
     (($1,$2) in a){
        n=patsplit($0, arr, FPAT, d);        # d[i] = text between field i and i+1
        split(a[$1,$2],t);
        $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4];
        for(i=1;i<=n;i++)
            printf "%s%s", $(i),(i<n? d[i] : ORS);
        next
     }1
    ' ori_file2.pub ori_file1.pub
Test results:
[akshay@db1 tmp]$ cat ori_file1.pub
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N LIG L 2 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 C LIG L 2 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C LIG L 2 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O LIG L 2 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 C LIG L 2 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 C LIG L 2 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 C LIG L 2 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 C LIG L 2 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 C LIG L 2 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 C LIG L 2 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 C LIG L 2 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N LIG L 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 C LIG L 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C LIG L 2 -26.824 -23.222 -2.975 1.00 0.00 C
[akshay@db1 tmp]$ cat ori_file2.pub
HELIX 1 1 PHE A 2 ALA A 7 1 6
ATOM 1 N PHE A 1 -3.631 -3.776 -2.910 1.00 0.00 N
ATOM 2 CA PHE A 1 -2.182 -3.776 -2.910 1.00 0.00 C
ATOM 3 C PHE A 1 -1.659 -2.347 -2.910 1.00 0.00 C
ATOM 4 O PHE A 1 -0.766 -2.011 -2.135 1.00 0.00 O
ATOM 5 CB PHE A 1 -1.630 -4.477 -4.142 1.00 0.00 C
ATOM 6 CG PHE A 1 -1.888 -5.964 -4.196 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -1.053 -6.844 -3.498 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -2.962 -6.461 -4.943 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -3.201 -7.840 -4.993 1.00 0.00 C
ATOM 10 CZ PHE A 1 -2.366 -8.721 -4.295 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -1.292 -8.223 -3.548 1.00 0.00 C
ATOM 12 N PHE A 2 -2.218 -1.506 -3.783 1.00 0.00 N
ATOM 13 CA PHE A 2 -1.808 -0.119 -3.881 1.00 0.00 C
ATOM 14 C PHE A 2 -1.962 0.568 -2.532 1.00 0.00 C
[akshay@db1 tmp]$ awk 'FNR==NR{
        a[$1,$2]=$3 FS $4 FS $5 FS $6;
        next
     }
     (($1,$2) in a){
        n=split($0,arr,FS,d);
        split(a[$1,$2],t);
        $3=t[1];$4=t[2];$5=t[3];$6=t[4];
        for(i=1;i<=n;i++)
            printf "%s%s", $(i),(i<n? d[i] : ORS);
        next
     }1
    ' ori_file2.pub ori_file1.pub
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N PHE A 1 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 CA PHE A 1 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C PHE A 1 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O PHE A 1 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 CB PHE A 1 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 CG PHE A 1 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 CZ PHE A 1 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N PHE A 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 CA PHE A 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C PHE A 2 -26.824 -23.222 -2.975 1.00 0.00 C
[akshay@db1 tmp]$ awk 'FNR==NR{
        a[$1,$2]=$3 FS $4 FS $5 FS $6;
        next
     }
     (($1,$2) in a){
        n=patsplit($0, arr, FPAT, d);
        split(a[$1,$2],t);
        $3=t[1]; $4=t[2]; $5=t[3]; $6=t[4];
        for(i=1;i<=n;i++)
            printf "%s%s", $(i),(i<n? d[i] : ORS);
        next
     }1
    ' ori_file2.pub ori_file1.pub
MODEL 1
COMPND UNNAMED
AUTHOR GENERATED BY OPEN BABEL 2.3.90
ATOM 1 N PHE A 1 -28.497 -21.375 1.835 1.00 0.00 N
ATOM 2 CA PHE A 1 -27.282 -21.191 1.068 1.00 0.00 C
ATOM 3 C PHE A 1 -27.048 -22.391 0.162 1.00 0.00 C
ATOM 4 O PHE A 1 -26.148 -23.191 0.408 1.00 0.00 O
ATOM 5 CB PHE A 1 -26.071 -21.047 1.977 1.00 0.00 C
ATOM 6 CG PHE A 1 -26.119 -19.866 2.917 1.00 0.00 C
ATOM 7 CD2 PHE A 1 -26.393 -20.064 4.275 1.00 0.00 C
ATOM 8 CD1 PHE A 1 -25.887 -18.575 2.430 1.00 0.00 C
ATOM 9 CE1 PHE A 1 -25.932 -17.479 3.301 1.00 0.00 C
ATOM 10 CZ PHE A 1 -26.206 -17.677 4.660 1.00 0.00 C
ATOM 11 CE2 PHE A 1 -26.438 -18.969 5.147 1.00 0.00 C
ATOM 12 N PHE A 2 -27.862 -22.514 -0.889 1.00 0.00 N
ATOM 13 CA PHE A 2 -27.742 -23.613 -1.826 1.00 0.00 C
ATOM 14 C PHE A 2 -26.824 -23.222 -2.975 1.00 0.00 C
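If both inputs are genuine fixed-width PDB files, another option is to work with the format's fixed columns instead of whitespace-separated fields. This is a different technique from the answer, sketched under the assumption that both files follow the standard ATOM record layout, in which columns 13-26 hold the atom name, alternate location, residue name, chain ID and residue number. The sketch keeps the answer's $1,$2 key but uses substr() to splice that 14-character block from ori_file2.pdb into new_file1.pdb, so every other character stays in place. copy_residue_cols is a hypothetical name.

```shell
# Sketch, assuming the fixed-column PDB layout for ATOM records in both files:
# columns 13-26 = atom name, altLoc, residue name, chain ID, residue number.
# Hypothetical helper name: copy_residue_cols. Uses only POSIX awk features.
copy_residue_cols() {
  # usage: copy_residue_cols ori_file2.pdb new_file1.pdb > merged.pdb
  awk 'FNR == NR {
         if ($1 == "ATOM") a[$1, $2] = substr($0, 13, 14)  # cols 13-26 of file 1
         next
       }
       $1 == "ATOM" && (($1, $2) in a) {
         $0 = substr($0, 1, 12) a[$1, $2] substr($0, 27)   # splice, keep the rest
       }
       1' "$1" "$2"
}
```

Because the spliced block always has the same width, nothing after column 26 can shift. This only applies to files that really are fixed-width; it will not line up on single-space samples like the ones shown above.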