How to extract a pattern but fill missing values in bash?

I have a large tab-delimited file (dummy.vcf) in which one column holds ";"-delimited variables. For example:

AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00;control_AF_female=0.00000e+00
control_AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00
AF_female=0.00008e+00;non_topmed_AF_female=0.00000e+00

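For each line, I want to extract the "AF_female=X" string and fill in missing values, so that the new file has the same number of lines as the original. For example: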

AF_female=0.00000e+00  
NA  
AF_female=0.00008e+00 

I tried:

grep -o ';AF_female=[0-9].[0-9]*..[0-9]*' dummy.vcf

However, this does not output a line when the pattern does not match.

Any help is much appreciated!

If you are OK with awk, please try the following.

awk -F';' '
{
  val=""
  for(i=1;i<=NF;i++){
     if($i ~ /^AF_female=[0-9]+/){
         val=(val?val OFS $i:$i)
     }
  }
  if(val!=""){
     print val
  }
  else{
     print "NA"
  }
}'  Input_file

It checks every ";"-separated field on a line for AF_female= followed by digits, and prints NA for any line that has no match.

The output is as follows.

AF_female=0.00000e+00
NA
AF_female=0.00008e+00

Explanation: an annotated version of the command above.

awk -F';' '                           ##Start the awk program and set the field separator to a semicolon.
{
  val=""                              ##Reset the variable val for each line.
  for(i=1;i<=NF;i++){                 ##Loop from i=1 to NF, where NF is the number of fields in the current line.
     if($i ~ /^AF_female=[0-9]+/){    ##If the field starts with AF_female= followed by a digit, do the following.
         val=(val?val OFS $i:$i)      ##Append the field to val, separated by OFS when val already holds a match.
     }
  }
  if(val!=""){                        ##After the loop, if val is not empty, do the following.
     print val                        ##Print the collected match.
  }
  else{                               ##Otherwise no field matched on this line.
     print "NA"                       ##Print NA as the placeholder.
  }
}' Input_file                         ##Name of the input file.
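
One caveat worth checking: the question says dummy.vcf is tab-delimited with other columns before the ";"-delimited one. With -F';' alone, the first item of that column stays glued to the preceding tab-separated columns, so the ^ anchor would likely miss an AF_female= that comes first in the column. A minimal sketch of one way around this, assuming it is acceptable to split on both tabs and semicolons, is below. The sample file contents (chr1, 100, and so on) and the function name extract_af_female are made up for illustration and are not from the question.

```shell
# Hypothetical sample: two leading tab-separated columns (invented for illustration),
# followed by the ";"-delimited column from the question.
sample=$(mktemp)
printf 'chr1\t100\tAF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00;control_AF_female=0.00000e+00\n' > "$sample"
printf 'chr1\t200\tcontrol_AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00\n' >> "$sample"
printf 'chr1\t300\tAF_female=0.00008e+00;non_topmed_AF_female=0.00000e+00\n' >> "$sample"

# Hypothetical helper: print the AF_female=... item of each line, or NA if absent.
extract_af_female() {
  # Split on either a semicolon or a tab, so every item becomes its own field.
  awk -v FS='[;\t]' '
  {
    val = ""
    for (i = 1; i <= NF; i++) {
      # Anchored at the field start, so non_topmed_AF_female and
      # control_AF_female are not picked up.
      if ($i ~ /^AF_female=[0-9]/) {
        val = (val != "" ? val OFS $i : $i)
      }
    }
    # Always print exactly one line per input line.
    print (val != "" ? val : "NA")
  }' "$1"
}

extract_af_female "$sample"
```

To write the result to a file instead of the terminal, redirect the output, for example extract_af_female dummy.vcf > af_female.txt. If the other columns of the real file could themselves contain an item starting with AF_female=, restricting the search to the one relevant column would be safer than splitting the whole line.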