awk next and pattern matching
Given the following CSV file, I only want $9 from the "DELTA Energy Terms" section, excluding the lines that start with "Frame":
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
1,0.0,0.0,-33.1958,2.71419624,80.6403,0.0,-30.48160376,50.15869624
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
2,-42.5618,0.0,44.0748,-5.2738956,-26.6719,-42.5618,38.8009044,-30.4327956
3,-43.1034,0.0,41.3681,-5.25029544,-27.1501,-43.1034,36.11780456,-34.13569544
Desired output:
-31.6726
-34.9402
-30.4327
-34.1356
The following attempts print every $9, including the $9 values from the "Ligand Energy Terms" section.
awk -F, '$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'
awk -F, '$1 ~ /DELTA Energy Terms/ {next} {printf("%24.4f\n",$9)}'
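To see what the first attempt actually prints, here is a minimal sketch that runs it (with $1 and $9 in place) on a trimmed copy of the question's data. The shell functions sample and first_attempt are illustrative names, not part of the question.

```shell
#!/bin/sh
# Trimmed copy of the question's data: one Ligand row, two DELTA rows.
sample() {
  cat <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF
}

# The first attempt from the question, reading standard input.
first_attempt() {
  awk -F, '$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'
}

sample | first_attempt
```

With the leading spaces removed, this prints 0.0000, 64.3477, -31.6726 and -34.9403: the "Ligand Energy Terms" title has no field 9 and comes out as 0.0000, the Ligand data row slips through, and %24.4f rounds -34.9402544 to -34.9403 rather than giving the -34.9402 shown in the desired output.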
Can anyone shed some light on this?
You could try the following awk command.
$ awk -v RS="\n\n" -v FS="\n" '/^DELTA Energy Terms/{for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}}' RS= file
-31.67263392
-34.9402544
-30.4327956
-34.13569544
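RS="\n\n": the record separator is set to a blank line, so each blank-line-separated block becomes one record. (The trailing RS= before file then replaces it with an empty RS, which is awk's portable paragraph mode and has the same effect.)
FS="\n": the newline is set as the field separator, so each line of a record is one field.
/^DELTA Energy Terms/: if a record starts with DELTA Energy Terms, run the following action on that record.
{for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}}: loop over every field except fields 1 and 2 (the section title and the Frame # header), split each one on commas, and store the pieces in an array named a.
print a[9]: print the element at index 9 of the (associative) array a.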
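This answer only works if the real file separates its sections with blank lines. The sample as posted has none, in which case the whole file is one record starting with "Ligand Energy Terms" and nothing is printed. Below is a minimal sketch assuming that blank-line layout; sample and delta_paragraphs are illustrative names, and the empty RS is set with -v instead of the trailing RS= operand, which has the same effect.

```shell
#!/bin/sh
# Trimmed copy of the question's data, with a blank line between the
# sections (the layout this answer assumes).
sample() {
  cat <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF
}

# Empty RS = paragraph mode: one record per blank-line-separated block.
# FS="\n" makes each line of the block a field; field 1 is the title and
# field 2 the "Frame #" header, so the data lines start at field 3.
delta_paragraphs() {
  awk -v RS= -v FS='\n' '/^DELTA Energy Terms/ {
    for (i = 3; i <= NF; i++) { split($i, a, ","); print a[9] }
  }'
}

sample | delta_paragraphs
```

Removing the blank line (for example with grep -v '^$') makes the output empty, which is the situation with the data exactly as posted.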
The following should do the trick:
awk -F, '/^DELTA/ {capture=1} /Energy Terms$/ {next} /^Frame/ {next} (capture) {print $9}'
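I use a capture flag to control whether an individual record should be captured. By default capture is zero. When the DELTA Energy Terms line is parsed, I start capturing. I skip any line that ends with Energy Terms or starts with Frame. Otherwise, if we are "capturing", I print the ninth field.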
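A quick way to try the one-liner is to feed it a trimmed copy of the question's data on standard input. This is a minimal sketch; sample and delta_capture are illustrative names.

```shell
#!/bin/sh
# Trimmed copy of the question's data (no blank lines needed here).
sample() {
  cat <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF
}

# The capture-flag one-liner, reading standard input.
delta_capture() {
  awk -F, '/^DELTA/ {capture=1} /Energy Terms$/ {next} /^Frame/ {next} (capture) {print $9}'
}

sample | delta_capture
```

The one-liner never resets capture, so it prints every data row after the DELTA title. That is fine here because DELTA is the last section; the script below clears the flag at the next title for the general case.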
If you use this often, I suggest a script along these lines:
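#!/usr/bin/awk -f
BEGIN {
    FS = ","
}
# Start capturing at the DELTA section title.
/^DELTA Energy Terms/ {
    capture = 1;
    next
}
# Any other section title stops capturing.
/Energy Terms$/ {
    capture = 0;
    next
}
# Skip the column-header rows.
/^Frame/ { next }
# Inside the DELTA section: print the 9th field (DELTA TOTAL).
(capture) { print $9 }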
Save the script as extract-delta and make it executable; you can then use it like any other shell command:
$ cat input-file | tr -d '\r' | ./extract-delta
-31.67263392
-34.9402544
-30.4327956
-34.13569544
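The difference from the one-liner is that the script clears capture at any later "... Energy Terms" title. The sketch below runs the same rules with awk -f (so it does not depend on where awk is installed) on trimmed data followed by a hypothetical "Other Energy Terms" section with placeholder values; that extra section is an assumption for illustration, not part of the question's file.

```shell
#!/bin/sh
# Save the extract-delta rules to a temporary file.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
BEGIN { FS = "," }
/^DELTA Energy Terms/ { capture = 1; next }
/Energy Terms$/       { capture = 0; next }
/^Frame/              { next }
capture               { print $9 }
EOF

# Trimmed data plus a HYPOTHETICAL trailing section (placeholder values)
# to show that capturing stops at the next title.
sample() {
  cat <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
Other Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0,0,0,0,0,0,0,999
EOF
}

sample | awk -f "$script"
```

Only the two DELTA values are printed; the placeholder 999 in the trailing section is not.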
You can also do this from bash, as follows:
tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) input.txt | cut -d"," -f9
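The tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) part prints the input file starting two lines below the line containing DELTA Energy Terms, and cut then gives you the 9th field you are looking for.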
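The same pipeline can be split into steps so the line arithmetic is visible and a missing title fails clearly. This is a minimal sketch; delta_col9 is an illustrative name, and only the first matching title is used.

```shell
#!/bin/sh
# Write a trimmed copy of the question's data to a temporary file.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
cat > "$input" <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF

delta_col9() {
  # In this sample grep -n prints "4:DELTA Energy Terms", so title=4.
  title=$(grep -n "DELTA Energy Terms" "$1" | head -n 1 | cut -d: -f1)
  # Without this check, $((2 + )) in the one-liner is a syntax error.
  if [ -z "$title" ]; then
    echo "no DELTA Energy Terms line in $1" >&2
    return 1
  fi
  # Data starts two lines below the title (title, then the Frame # header).
  tail -n +"$((title + 2))" "$1" | cut -d, -f9
}

delta_col9 "$input"
```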
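All of these solutions work and solve the immediate problem, but none of them answers the implied question: looking at the command in the question, why doesn't it work?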
'$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'
Let's break it down.
# Skip every line whose first field matches "DELTA Energy Terms".
$1 ~ /DELTA Energy Terms/ {next}
# With -F, (as in the question) $1 is the whole title line, so this matches and the title is skipped.
# Without -F, the field separator defaults to whitespace: $1 on that line is just "DELTA",
# not "DELTA Energy Terms", so no line matches and this has no effect.
# Skip every line whose first field matches "Frame".
$1 ~ /Frame/ {next}
# Both "Frame #" header lines match and get skipped.
# Print field 9 of every line that didn't get skipped.
{printf("%24.4f\n",$9)}
# Nothing here limits printing to the DELTA section, so the Ligand data rows are printed too.
# Title lines that reach this rule ("Ligand Energy Terms", plus "DELTA Energy Terms" when -F, is missing)
# have no field 9, so printf prints 0.0000 for them.
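Putting the breakdown together, one possible fix keeps the question's -F, and printf approach but only prints while inside the DELTA section. The desired output (-34.9402, not -34.9403) looks truncated rather than rounded, so this sketch assumes truncation and cuts each value with int() before formatting; drop the int() step if rounding is acceptable. sample and delta_4dp are illustrative names.

```shell
#!/bin/sh
# The question's data, with the Ligand section trimmed to one row.
sample() {
  cat <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
2,-42.5618,0.0,44.0748,-5.2738956,-26.6719,-42.5618,38.8009044,-30.4327956
3,-43.1034,0.0,41.3681,-5.25029544,-27.1501,-43.1034,36.11780456,-34.13569544
EOF
}

delta_4dp() {
  awk -F, '
    # Every "... Energy Terms" title starts a section; remember whether it is DELTA.
    /Energy Terms$/ { in_delta = ($0 ~ /^DELTA/); next }
    # Skip the "Frame #" header rows.
    $1 ~ /^Frame/   { next }
    # int() truncates toward zero, so -34.9402544 becomes -34.9402, not -34.9403.
    in_delta        { printf("%.4f\n", int($9 * 10000) / 10000) }
  '
}

sample | delta_4dp
```

Swap %.4f for the question's %24.4f if a right-aligned column is wanted.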