Bash: how to filter an input file

Question

I have a very large file with a lot of content I don't need and want to filter out. The part I'm interested in looks like this:

 iteration # 20     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap
 ethr =  4.07E-13,  avg # of iterations =  3.8

 total cpu time spent up to now is   351441.3 secs

 End of self-consistent calculation

 Number of k-points >= 100: set verbosity='high' to print the bands.

 highest occupied, lowest unoccupied level (ev):     2.2896    4.1062

I need to compute Bg = ELUMO − EHOMO, where EHOMO and ELUMO are the highest occupied and lowest unoccupied levels. I want output like this:

Iteration #<number>
Bg=xxx

My two questions: 1. I can grep for the string 'highest', which gives me one line per match like:

 highest occupied, lowest unoccupied level (ev):     2.3005    4.0791

But how do I assign the highest occupied and lowest unoccupied levels to variables?

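2. Since not every iteration reports the unoccupied level (I want to skip those), how should I grep/find so that I always get the iteration number together with the unoccupied level?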

Answer 1

Use awk:

awk '/^iteration|highest/{if ($0 ~ "iteration") gsub(/ecut.*/ , "", $0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}'

Demo:

$ cat file.txt
iteration # 20     ecut=    36.00 Ry     beta=0.10
 Davidson diagonalization with overlap
 ethr =  4.07E-13,  avg # of iterations =  3.8

 total cpu time spent up to now is   351441.3 secs

 End of self-consistent calculation

 Number of k-points >= 100: set verbosity='high' to print the bands.

 highest occupied, lowest unoccupied level (ev):     2.2896    4.1062
$ awk '/^iteration|highest/{if ($0 ~ "iteration") gsub(/ecut.*/,"",$0); if ($0 ~ "highest") $0=($(NF-1)-$NF); print}' < file.txt
iteration # 20     
-1.8166
$

Explanation:

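/^iteration|highest/ -- select only rows that start with "iteration" or contain "highest"
($0 ~ "iteration") -- $0 is the entire row; check whether the row matches the pattern "iteration"
gsub(/ecut.*/ , "", $0) -- delete everything from "ecut" to the end of the row
NF --> number of fields in the row
$NF --> last field
$(NF-1) --> second-to-last field
$0=($(NF-1)-$NF) -- replace the current row with second-to-last field minus last field

Two caveats. First, $(NF-1)-$NF is EHOMO − ELUMO, which is why the demo prints a negative number; use $NF-$(NF-1) to get Bg = ELUMO − EHOMO directly. Second, ^ anchors the match at the start of the row, so a row with leading blanks (as in the excerpt in the question) does not match ^iteration; the demo file has no leading blank on that row.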
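For the second question (skipping iterations that do not report the unoccupied level) and the requested output format, one possible variation is to remember the iteration number and print it only when a level row follows. This is a minimal sketch, not part of the original answer: the function name band_gaps is made up for illustration, and it assumes the rows look like the excerpt in the question (an "iteration #" header, then optionally a "highest occupied, lowest unoccupied" row whose last two fields are EHOMO and ELUMO).

```shell
#!/usr/bin/env bash
# Sketch: print "Iteration #<n>" and "Bg=<gap>" only for iterations that
# report both levels. band_gaps is a hypothetical helper name.
band_gaps() {
  awk '
    # Iteration header, with or without leading blanks: keep only its number.
    /^[ \t]*iteration #/ {
      iter = $0
      sub(/^[ \t]*iteration #[ \t]*/, "", iter)  # drop everything before the number
      sub(/[^0-9].*$/, "", iter)                 # drop everything after the number
      next
    }
    # Level row: last two fields are assumed to be EHOMO and ELUMO, in that order.
    /highest occupied, lowest unoccupied/ && iter != "" {
      printf "Iteration #%s\nBg=%.4f\n", iter, $NF - $(NF-1)
      iter = ""   # do not reuse a stale iteration number
    }
  ' "$@"
}
```

Calling band_gaps file.txt on the demo file above should print "Iteration #20" followed by "Bg=1.8166". Iterations without a level row produce no output, because the stored number is simply overwritten by the next header.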

linux

bash

physics