Bash: how to filter an input file
I have a very large file with a lot of content I don't need, which I want to filter out.
The part I'm interested in looks like this:
iteration # 20 ecut= 36.00 Ry beta=0.10
Davidson diagonalization with overlap
ethr = 4.07E-13, avg # of iterations = 3.8
total cpu time spent up to now is 351441.3 secs
End of self-consistent calculation
Number of k-points >= 100: set verbosity='high' to print the bands.
highest occupied, lowest unoccupied level (ev): 2.2896 4.1062
I need to compute Bg = E_LUMO − E_HOMO, where E_HOMO and E_LUMO are the highest occupied and lowest unoccupied levels.
I would like output like this:
Iteration #<number>
Bg=xxx
My two questions:
1. I can grep for the string 'highest', which gives me lines like:
highest occupied, lowest unoccupied level (ev): 2.3005 4.0791
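But how do I set variables to the highest occupied and lowest unoccupied levels?
2. Not every iteration gives me a value for the unoccupied level (I want to skip those). How should I grep/find so that I always get the iteration number together with the unoccupied level?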
Use awk:
awk '/^iteration|highest/{if ($0 ~ "iteration") gsub(/ecut.*/ , "", $0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}'
Demo:
$ cat file.txt
iteration # 20 ecut= 36.00 Ry beta=0.10
Davidson diagonalization with overlap
ethr = 4.07E-13, avg # of iterations = 3.8
total cpu time spent up to now is 351441.3 secs
End of self-consistent calculation
Number of k-points >= 100: set verbosity='high' to print the bands.
highest occupied, lowest unoccupied level (ev): 2.2896 4.1062
$ awk '/^iteration|highest/{if ($0 ~ "iteration") gsub(/ecut.*/,"",$0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}' < file.txt
iteration # 20
-1.8166
$
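On the first question (getting the two energies into shell variables), one possible sketch is to let awk print the last two fields and split them in the shell. The file name `levels_demo.txt` and the variable names `values`, `homo`, `lumo`, and `bg` are made up for illustration, and only the last matching line is used. The subtraction is done in awk because plain shell arithmetic is integer-only.

```shell
# Hypothetical sample file holding the one relevant line from the question.
cat > levels_demo.txt <<'EOF'
highest occupied, lowest unoccupied level (ev): 2.2896 4.1062
EOF

# Print the last two fields of the last matching line, e.g. "2.2896 4.1062".
values=$(awk '/highest occupied/ { print $(NF-1), $NF }' levels_demo.txt | tail -n 1)

homo=${values% *}   # text before the space: highest occupied level
lumo=${values#* }   # text after the space: lowest unoccupied level

# Shell arithmetic cannot handle decimals, so compute the difference in awk.
bg=$(awk -v h="$homo" -v l="$lumo" 'BEGIN { printf "%.4f", l - h }')

echo "HOMO=$homo LUMO=$lumo Bg=$bg"
```

For a file with many iterations it is probably simpler to stay inside awk for the whole job, as the one-liner in this answer does.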
Explanation:
/^iteration|highest/ -- select only rows that start with "iteration" or contain "highest"
($0 ~ "iteration") -- $0 is the entire row; check whether the row matches "iteration"
gsub(/ecut.*/ , "", $0) -- delete everything from "ecut" to the end of the row
NF --> number of fields in the row
$NF --> last field
$(NF-1) --> second-to-last field
$0=($(NF-1)-$NF) -- replace the current row with the second-to-last field minus the last field
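Note that the one-liner subtracts the last field from the second-to-last one, i.e. it prints E_HOMO − E_LUMO, which is why the demo shows -1.8166. A possible variant that prints the gap as E_LUMO − E_HOMO in the format requested in the question, and that skips iterations without a "highest occupied" line, is sketched below. The file name `scf_demo.txt`, the variable names `iter` and `result`, and the truncated "iteration # 19" block are invented for illustration and are not taken from the original output.

```shell
# Hypothetical sample: iteration 19 has no "highest occupied" line, iteration 20 does.
cat > scf_demo.txt <<'EOF'
iteration # 19 ecut= 36.00 Ry beta=0.10
Davidson diagonalization with overlap
iteration # 20 ecut= 36.00 Ry beta=0.10
Davidson diagonalization with overlap
ethr = 4.07E-13, avg # of iterations = 3.8
End of self-consistent calculation
highest occupied, lowest unoccupied level (ev): 2.2896 4.1062
EOF

result=$(awk '
  # Remember the most recent iteration number (third field of the header line).
  /^ *iteration #/ { iter = $3 }

  # Print only when a gap line follows; iterations without one are overwritten.
  /highest occupied, lowest unoccupied/ && iter != "" {
      printf "Iteration #%s\nBg=%.4f\n", iter, $NF - $(NF-1)
      iter = ""
  }
' scf_demo.txt)

printf '%s\n' "$result"
```

Because `iter` is simply overwritten by each new header line, an iteration is printed only if a gap line appears before the next header, which covers the second question.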