Using awk/sed to change columns
Here is my text file:
@<TRIPOS>MOLECULE
*****
22 22 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C4 24.2940 -24.1240 -0.0710 C.3 167 JZ4167 -0.0650
2 C7 21.5530 -27.2140 -4.1120 C.ar 167 JZ4167 -0.0613
3 C8 22.0680 -26.7470 -5.3310 C.ar 167 JZ4167 -0.0583
4 C9 22.6710 -25.5120 -5.4480 C.ar 167 JZ4167 -0.0199
5 C10 22.7690 -24.7300 -4.2950 C.ar 167 JZ4167 0.1200
6 C11 21.6930 -26.4590 -2.9540 C.ar 167 JZ4167 -0.0551
7 C12 22.2940 -25.1870 -3.0750 C.ar 167 JZ4167 -0.0060
8 C13 22.4630 -24.4140 -1.8080 C.3 167 JZ4167 -0.0245
9 C14 23.9250 -24.7040 -1.3940 C.3 167 JZ4167 -0.0518
10 OAB 23.4120 -23.5360 -4.3420 O.3 167 JZ4167 -0.5065
11 H 25.3133 -24.3619 0.1509 H 1 UNL1 0.0230
12 H 23.6591 -24.5327 0.6872 H 1 UNL1 0.0230
13 H 24.1744 -23.0611 -0.1016 H 1 UNL1 0.0230
14 H 21.0673 -28.1238 -4.0754 H 1 UNL1 0.0618
15 H 21.9931 -27.3472 -6.1672 H 1 UNL1 0.0619
16 H 23.0361 -25.1783 -6.3537 H 1 UNL1 0.0654
17 H 21.3701 -26.8143 -2.0405 H 1 UNL1 0.0621
18 H 21.7794 -24.7551 -1.0588 H 1 UNL1 0.0314
19 H 22.2659 -23.3694 -1.9301 H 1 UNL1 0.0314
20 H 24.5755 -24.2929 -2.1375 H 1 UNL1 0.0266
21 H 24.0241 -25.7662 -1.3110 H 1 UNL1 0.0266
22 H 23.7394 -23.2120 -5.1580 H 1 UNL1 0.2921
@<TRIPOS>BOND
1 4 3 ar
2 4 5 ar
3 3 2 ar
4 10 5 1
5 5 7 ar
6 2 6 ar
7 7 6 ar
8 7 8 1
9 8 9 1
10 9 1 1
11 1 11 1
12 1 12 1
13 1 13 1
14 2 14 1
15 3 15 1
16 4 16 1
17 6 17 1
18 8 18 1
19 8 19 1
20 9 20 1
21 9 21 1
22 10 22 1
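What I want to do is change the second-to-last column, which contains JZ4167 and UNL1, to JZ4 everywhere, and set the third-to-last column to 1 everywhere.
So my expected output is: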
@<TRIPOS>MOLECULE
*****
22 22 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C4 24.2940 -24.1240 -0.0710 C.3 1 JZ4 -0.0650
2 C7 21.5530 -27.2140 -4.1120 C.ar 1 JZ4 -0.0613
3 C8 22.0680 -26.7470 -5.3310 C.ar 1 JZ4 -0.0583
4 C9 22.6710 -25.5120 -5.4480 C.ar 1 JZ4 -0.0199
5 C10 22.7690 -24.7300 -4.2950 C.ar 1 JZ4 0.1200
6 C11 21.6930 -26.4590 -2.9540 C.ar 1 JZ4 -0.0551
7 C12 22.2940 -25.1870 -3.0750 C.ar 1 JZ4 -0.0060
8 C13 22.4630 -24.4140 -1.8080 C.3 1 JZ4 -0.0245
9 C14 23.9250 -24.7040 -1.3940 C.3 1 JZ4 -0.0518
10 OAB 23.4120 -23.5360 -4.3420 O.3 1 JZ4 -0.5065
11 H 25.3133 -24.3619 0.1509 H 1 JZ4 0.0230
12 H 23.6591 -24.5327 0.6872 H 1 JZ4 0.0230
13 H 24.1744 -23.0611 -0.1016 H 1 JZ4 0.0230
14 H 21.0673 -28.1238 -4.0754 H 1 JZ4 0.0618
15 H 21.9931 -27.3472 -6.1672 H 1 JZ4 0.0619
16 H 23.0361 -25.1783 -6.3537 H 1 JZ4 0.0654
17 H 21.3701 -26.8143 -2.0405 H 1 JZ4 0.0621
18 H 21.7794 -24.7551 -1.0588 H 1 JZ4 0.0314
19 H 22.2659 -23.3694 -1.9301 H 1 JZ4 0.0314
20 H 24.5755 -24.2929 -2.1375 H 1 JZ4 0.0266
21 H 24.0241 -25.7662 -1.3110 H 1 JZ4 0.0266
22 H 23.7394 -23.2120 -5.1580 H 1 JZ4 0.2921
@<TRIPOS>BOND
1 4 3 ar
2 4 5 ar
3 3 2 ar
4 10 5 1
5 5 7 ar
6 2 6 ar
7 7 6 ar
8 7 8 1
9 8 9 1
10 9 1 1
11 1 11 1
12 1 12 1
13 1 13 1
14 2 14 1
15 3 15 1
16 4 16 1
17 6 17 1
18 8 18 1
19 8 19 1
20 9 20 1
21 9 21 1
22 10 22 1
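I have been using sed to replace JZ4167 with JZ4 and UNL1 with JZ4, using
sed 's/JZ4167/JZ4/g' myfile
and
sed 's/UNL1/JZ4/g' myfile
but I cannot safely run
sed 's/167/1/g' myfile
because one of my coordinates might contain 167, and I don't want to corrupt my coordinates. Is there a way to do this with awk or something similar?
Any suggestions would be appreciated.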
Assuming you want to preserve the column widths, this may be what you want:
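$ cat tst.awk
# Section headers start with "@"; track whether we are inside the ATOM block.
/^@/ {
    inBlock = ($1 == "@<TRIPOS>ATOM" ? 1 : 0)
    print
    next
}
# Inside the ATOM block: keep the first 53 characters and rebuild the
# last three columns at fixed widths, reusing the original last field.
inBlock {
    $0 = substr($0,1,53) sprintf("%3s %-10s %7s",1,"JZ4",$NF)
}
{ print }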
$ awk -f tst.awk file
@<TRIPOS>MOLECULE
*****
22 22 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C4 24.2940 -24.1240 -0.0710 C.3 1 JZ4 -0.0650
2 C7 21.5530 -27.2140 -4.1120 C.ar 1 JZ4 -0.0613
3 C8 22.0680 -26.7470 -5.3310 C.ar 1 JZ4 -0.0583
4 C9 22.6710 -25.5120 -5.4480 C.ar 1 JZ4 -0.0199
5 C10 22.7690 -24.7300 -4.2950 C.ar 1 JZ4 0.1200
6 C11 21.6930 -26.4590 -2.9540 C.ar 1 JZ4 -0.0551
7 C12 22.2940 -25.1870 -3.0750 C.ar 1 JZ4 -0.0060
8 C13 22.4630 -24.4140 -1.8080 C.3 1 JZ4 -0.0245
9 C14 23.9250 -24.7040 -1.3940 C.3 1 JZ4 -0.0518
10 OAB 23.4120 -23.5360 -4.3420 O.3 1 JZ4 -0.5065
11 H 25.3133 -24.3619 0.1509 H 1 JZ4 0.0230
12 H 23.6591 -24.5327 0.6872 H 1 JZ4 0.0230
13 H 24.1744 -23.0611 -0.1016 H 1 JZ4 0.0230
14 H 21.0673 -28.1238 -4.0754 H 1 JZ4 0.0618
15 H 21.9931 -27.3472 -6.1672 H 1 JZ4 0.0619
16 H 23.0361 -25.1783 -6.3537 H 1 JZ4 0.0654
17 H 21.3701 -26.8143 -2.0405 H 1 JZ4 0.0621
18 H 21.7794 -24.7551 -1.0588 H 1 JZ4 0.0314
19 H 22.2659 -23.3694 -1.9301 H 1 JZ4 0.0314
20 H 24.5755 -24.2929 -2.1375 H 1 JZ4 0.0266
21 H 24.0241 -25.7662 -1.3110 H 1 JZ4 0.0266
22 H 23.7394 -23.2120 -5.1580 H 1 JZ4 0.2921
@<TRIPOS>BOND
1 4 3 ar
2 4 5 ar
3 3 2 ar
4 10 5 1
5 5 7 ar
6 2 6 ar
7 7 6 ar
8 7 8 1
9 8 9 1
10 9 1 1
11 1 11 1
12 1 12 1
13 1 13 1
14 2 14 1
15 3 15 1
16 4 16 1
17 6 17 1
18 8 18 1
19 8 19 1
20 9 20 1
21 9 21 1
22 10 22 1
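If exact column alignment does not matter, a field-based variant avoids hard-coding character positions. The following is only a sketch under that assumption: it treats each ATOM line as whitespace-separated fields and overwrites the third-to-last and second-to-last fields. The file name sample.mol2 and its contents are hypothetical, cut down from the question's data, with one coordinate changed to contain 167 to show that coordinates are left alone.

```shell
# Hypothetical sample: a shortened file with a coordinate that contains "167".
cat > sample.mol2 <<'EOF'
@<TRIPOS>MOLECULE
*****
@<TRIPOS>ATOM
1 C4 24.1670 -24.1240 -0.0710 C.3 167 JZ4167 -0.0650
11 H 25.3133 -24.3619 0.1509 H 1 UNL1 0.0230
@<TRIPOS>BOND
1 4 3 ar
EOF

result=$(awk '
# Header lines start with "@": note whether the ATOM block begins here.
/^@/ { inBlock = ($0 == "@<TRIPOS>ATOM"); print; next }
# Inside the ATOM block, overwrite only the two target fields.
inBlock && NF >= 3 { $(NF-2) = 1; $(NF-1) = "JZ4" }
{ print }
' sample.mol2)

printf '%s\n' "$result"
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so any original padding is lost. That is the trade-off against the substr/sprintf approach above.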
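A sed one-liner:
sed '/^@<TRIPOS>ATOM$/,/^@/{/^@/!s/.\{16\}\(.\{7\}\)$/   1 JZ4        \1/;}' file
It operates on an address range, from the line consisting of @<TRIPOS>ATOM to the next line starting with @. Because the lines of interest are fixed-width, it rewrites the last 23 characters, replacing the first 16 of them with text padded to the same width and keeping the final 7, regardless of the line's content.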
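To see why the fixed-width substitution keeps the layout intact, here is a small sketch. The file fixed.mol2, the printf format, and the placeholder text standing in for the coordinates are all assumptions made for illustration; the real file's widths may differ. The 16-character replacement is built with printf so that its padding is explicit.

```shell
# Hypothetical fixed-width layout: 30-char prefix, 16-char id/name, 7-char charge.
fmt='%-30s%4s %-11s%7s\n'
{
  printf '%s\n' '@<TRIPOS>ATOM'
  printf "$fmt" '1 C4 (coordinates, type)' 167 JZ4167 -0.0650
  printf "$fmt" '11 H (coordinates, type)' 1 UNL1 0.0230
  printf '%s\n' '@<TRIPOS>BOND' '1 4 3 ar'
} > fixed.mol2

# Replacement for the 16 characters before the charge: 4 + 1 + 11 = 16.
repl=$(printf '%4s %-11s' 1 JZ4)

# Within the ATOM block, skip "@" lines, swap 16 characters, keep the last 7.
out=$(sed '/^@<TRIPOS>ATOM$/,/^@/{/^@/!s/.\{16\}\(.\{7\}\)$/'"$repl"'\1/;}' fixed.mol2)

printf '%s\n' "$out"
```

Every output line has the same length as its input line, because the substitution swaps exactly 16 characters for 16 characters and reinserts the captured 7.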