Bash: how to filter an input file

I have a very large file containing a lot of output I don't need, and I have to filter it. The part I'm interested in looks like this:

 iteration # 20     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap
 ethr =  4.07E-13,  avg # of iterations =  3.8

 total cpu time spent up to now is   351441.3 secs

 End of self-consistent calculation

 Number of k-points >= 100: set verbosity='high' to print the bands.

 highest occupied, lowest unoccupied level (ev):     2.2896    4.1062

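I need to compute Bg = ELUMO − EHOMO, where EHOMO is the highest occupied level and ELUMO is the lowest unoccupied level. The problem is that I want output like this: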

Iteration #<number>
Bg=xxx

My two questions: 1. I can grep for the string 'highest', which gives me lines like this:

 highest occupied, lowest unoccupied level (ev):     2.3005    4.0791

But how do I store the highest occupied and lowest unoccupied levels in variables?

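2. Not every iteration reports the unoccupied level (I want to skip those that don't), so how should I grep/find so that I always get the iteration number together with the unoccupied level?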

Use awk:

awk '/^iteration|highest/{if ($0 ~ "iteration") gsub(/ecut.*/ , "", $0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}' 

Demo:

$cat file.txt 
iteration # 20     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap
 ethr =  4.07E-13,  avg # of iterations =  3.8

 total cpu time spent up to now is   351441.3 secs

 End of self-consistent calculation

 Number of k-points >= 100: set verbosity='high' to print the bands.

 highest occupied, lowest unoccupied level (ev):     2.2896    4.1062
$awk '/^iteration|highest/{if ($0~"iteration") gsub(/ecut.*/,"",$0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}'  < file.txt 
iteration # 20     
-1.8166
$

Explanation:

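/^iteration|highest/ -- select only lines that start with "iteration" or contain "highest" (the ^ anchor means a line with leading blanks, as in the question's sample, needs /^ *iteration/ instead)
($0 ~ "iteration") -- $0 is the entire line; check whether it contains "iteration"
gsub(/ecut.*/, "", $0) -- delete everything from "ecut" to the end of the line
NF -- number of fields in the line
$NF -- last field
$(NF-1) -- second-to-last field
$0=($(NF-1)-$NF) -- replace the current line with second-to-last field minus last field; that is highest occupied minus lowest unoccupied, hence the negative number in the demo. Use $NF-$(NF-1) to get Bg = ELUMO − EHOMO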
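
To get the exact layout the question asks for (an "Iteration #<number>" line followed by "Bg=xxx"), and to skip iterations that never report the two levels, one option is to remember the last iteration number and print only when a "highest occupied, lowest unoccupied" line appears. The following is a minimal sketch, not the answer's original command. The function name `bandgap` is made up for illustration, and iteration 19 in the sample input is invented to show the skipping; only the iteration 20 values come from the question.

```shell
#!/bin/sh
# bandgap: hypothetical helper; reads the named files, or stdin if none are given.
bandgap() {
    awk '
        # Remember the most recent iteration number (leading blanks allowed).
        /^ *iteration #/ { iter = $3; next }

        # Only lines that report both levels produce output,
        # so iterations without them are skipped automatically.
        /highest occupied, lowest unoccupied/ {
            homo = $(NF-1)   # highest occupied level
            lumo = $NF       # lowest unoccupied level
            printf "Iteration #%s\nBg=%.4f\n", iter, lumo - homo
        }
    ' "$@"
}

# Illustrative input: iteration 19 is invented and has no level line.
result=$(bandgap <<'EOF'
 iteration # 19     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap

 iteration # 20     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap
 ethr =  4.07E-13,  avg # of iterations =  3.8

 highest occupied, lowest unoccupied level (ev):     2.2896    4.1062
EOF
)

printf '%s\n' "$result"
```

Inside awk the two levels are ordinary variables (`homo` and `lumo` here), which also covers the first question. On a real output file the call would be `bandgap file.txt`.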