sed: removing duplicated patterns in the log file
I am post-processing a log file arranged in the following format:
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 20 degrees
Models used:
1.1 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.6 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.10 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.8 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.2 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.3 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.4 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.7 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.5 SarsCov2_structure31R_nsp5holo_rep1.pdb
1.9 SarsCov2_structure31R_nsp5holo_rep1.pdb
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/? ASN 142 ND2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/A UNL 1 N SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/? ASN 142 2HD2 3.419 2.541
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/? GLN 189 NE2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/A UNL 1 O SarsCov2_structure31R_nsp5holo_rep1.pdb #1.5/? GLN 189 1HE2 2.883 2.159
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.6/? HIS 163 NE2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.6/A UNL 1 O no hydrogen
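From this log I need to take all lines from the 3rd line onward and then remove every occurrence of the repeated pattern "SarsCov2_structure31R_nsp5holo_rep1.pdb". Could I use sed with a regular expression so that any phrase matching this kind of pattern (ending in .pdb) is detected and removed automatically from each processed log?
So the expected output should be: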
Models used:
1.1
1.6
1.10
1.8
1.2
1.3
1.4
1.7
1.5
1.9
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.3/? ASN 142 ND2 #1.3/A UNL 1 N #1.3/? ASN 142 2HD2 3.419 2.541
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 1HE2 2.883 2.159
#1.6/? HIS 163 NE2 #1.6/A UNL 1 O no hydrogen 3.299 N/A
#1.7/? GLN 189 NE2 #1.7/A UNL 1 O #1.7/? GLN 189 1HE2 3.109 2.147
#1.9/? ASN 142 ND2 #1.9/A UNL 1 O #1.9/? ASN 142 1HD2 3.032 2.319
#1.10/? GLN 189 NE2 #1.10/A UNL 1 O #1.10/? GLN 189 1HE2 3.054 2.125
Here is an attempt without a regular expression; it does not work yet :-)
cat test.log | tail -n +2 | sed -e "/SarsCov2_structure31R_nsp5holo_rep1.pdb/d" >> ./test2.log
You may use this sed:
sed -E '1,2d; s/[[:blank:]]*[^[:blank:]]+\.pdb//g' file
Models used:
1.1
1.6
1.10
1.8
1.2
1.3
1.4
1.7
1.5
1.9
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.3/? ASN 142 ND2 #1.3/A UNL 1 N #1.3/? ASN 142 2HD2 3.419 2.541
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 1HE2 2.883 2.159
#1.6/? HIS 163 NE2 #1.6/A UNL 1 O no hydrogen
Details:
1,2d: delete the first two lines.
s/[[:blank:]]*[^[:blank:]]+\.pdb//g: on every line, globally remove 0 or more blanks followed by 1+ non-blank characters followed by .pdb.
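To try the command on a small scale, here is a minimal sketch. The file names sample.log and cleaned.log and the shortened log content are assumptions for illustration. The final s/^[[:blank:]]+// expression is an addition that is not part of the command above: it trims the blank left at the start of lines that begin with a .pdb token.

```shell
# Hypothetical sample file; sample.log is an assumed name and the content is shortened.
cat > sample.log <<'EOF'
Finding intramodel H-bonds
Constraints relaxed by 0.55 angstroms and 20 degrees
Models used:
1.1 SarsCov2_structure31R_nsp5holo_rep1.pdb
6 H-bonds
SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/? ASN 142 ND2 SarsCov2_structure31R_nsp5holo_rep1.pdb #1.3/A UNL 1 N
EOF

# 1,2d                drop the two header lines
# s/...\.pdb//g       remove every token ending in .pdb, plus the blanks before it
# s/^[[:blank:]]+//   (added here) trim a blank left at the start of the line
sed -E '1,2d; s/[[:blank:]]*[^[:blank:]]+\.pdb//g; s/^[[:blank:]]+//' sample.log > cleaned.log

cat cleaned.log
```

Writing with > rather than >> keeps cleaned.log from accumulating output across repeated runs.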
Using sed
$ sed 's/[[:alnum:]_]*\.pdb//g;1,2d' input_file
Models used:
1.1
1.6
1.10
1.8
1.2
1.3
1.4
1.7
1.5
1.9
6 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
#1.3/? ASN 142 ND2 #1.3/A UNL 1 N #1.3/? ASN 142 2HD2 3.419 2.541
#1.5/? GLN 189 NE2 #1.5/A UNL 1 O #1.5/? GLN 189 1HE2 2.883 2.159
#1.6/? HIS 163 NE2 #1.6/A UNL 1 O no hydrogen
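With your shown samples, please try the following awk code. A simple explanation: the condition FNR>2 is checked first, and only when it is true are the commands inside the condition block run. Inside the block, gsub globally substitutes [[:space:]]*SarsCov2_structure31R_nsp5holo_rep1\.pdb with an empty string, as per the shown samples, and then the current line is printed.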
awk '
FNR>2{
gsub(/[[:space:]]*SarsCov2_structure31R_nsp5holo_rep1\.pdb/,"")
print
}
' Input_file
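The awk code above hard-codes one file name in the pattern. Since the question asks for any token ending in .pdb to be removed from each processed log, a generalized variant might look like the sketch below. The file names run1.log, run2.log and merged.log, the model names and the shortened content are all assumptions for illustration. The extra sub call, which trims a leftover leading blank, is also an addition.

```shell
# Hypothetical inputs: two short logs, each with two header lines to skip.
printf '%s\n' 'header 1' 'header 2' 'Models used:' '1.1 modelA.pdb' > run1.log
printf '%s\n' 'header 1' 'header 2' 'modelB.pdb #1.3/? ASN 142 ND2 modelB.pdb #1.3/A UNL 1 N' > run2.log

# FNR restarts at 1 for every input file, so FNR > 2 skips the first
# two lines of each log, not only of the first one.
awk '
FNR > 2 {
    gsub(/[ \t]*[^ \t]+\.pdb/, "")   # remove any token ending in .pdb and the blanks before it
    sub(/^[ \t]+/, "")               # trim a blank left at the start of the line
    print
}
' run1.log run2.log > merged.log

cat merged.log
```

The plain bracket expression [ \t] is used here instead of [[:space:]] only to keep the sketch simple; either form should match the blanks in the shown samples.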