How to extract a pattern but fill missing values in bash?
I have a large tab-delimited file (dummy.vcf) in which one column contains ";"-delimited variables. For example:
AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00;control_AF_female=0.00000e+00
control_AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00
AF_female=0.00008e+00;non_topmed_AF_female=0.00000e+00
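I want to extract the "AF_female=X" string from each line and fill in missing values with NA, so that the new file has the same number of lines as the original. For example: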
AF_female=0.00000e+00
NA
AF_female=0.00008e+00
I tried:
grep -o ';AF_female=[0-9].[0-9]*..[0-9]*' dummy.vcf
However, this does not output a line when the pattern does not match.
Any help is greatly appreciated!
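Because grep -o prints only the matched text, a line without a match produces no output at all; the leading ; in the pattern above also cannot match when AF_female= is the first key on the line, as it is in the sample. A tool that emits something for every line is needed instead. As a minimal sketch using sed (a swapped-in alternative, not the awk approach from the answer below; the function name extract_af_female is hypothetical), a substitution keeps only the AF_female= entry, and the t command skips the NA fallback on lines where that substitution succeeded. It assumes each line holds only the ;-delimited column, as in the sample.

```shell
#!/usr/bin/env bash
# extract_af_female (hypothetical name): print the AF_female=... entry of each
# line, or NA when the line has none. Assumes each line contains only the
# ';'-delimited column, as in the sample above.
extract_af_female() {
  # 1st expression: keep only the AF_female=... key (at line start or after ';').
  # 't' with no label jumps to the end of the script if that substitution succeeded.
  # 3rd expression: reached only on lines without a match, so replace them with NA.
  sed -E \
    -e 's/^(.*;)?(AF_female=[^;]*).*$/\2/' \
    -e 't' \
    -e 's/.*/NA/' \
    "$@"
}

printf '%s\n' \
  'AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00;control_AF_female=0.00000e+00' \
  'control_AF_female=0.00000e+00;non_topmed_AF_female=0.00000e+00' \
  'AF_female=0.00008e+00;non_topmed_AF_female=0.00000e+00' \
  | extract_af_female
```

On the three sample lines this prints AF_female=0.00000e+00, NA and AF_female=0.00008e+00, one per line. Keys such as non_topmed_AF_female are not picked up because AF_female= must be at the start of the line or directly after a ;.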
If you are OK with awk, please try the following.
awk -F';' '
{
val=""
for(i=1;i<=NF;i++){
if($i ~ /^AF_female=[0-9]+/){
val=(val?val OFS $i:$i)
}
}
if(val){
print val
}
else{
print "NA"
}
}' Input_file
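It checks every ;-separated field of a line for AF_female= followed by digits, and prints NA when no field on that line matches. If a line contains several matching fields, they are printed together on one line, separated by OFS (a space by default).
Output: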
AF_female=0.00000e+00
NA
AF_female=0.00008e+00
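The loop keeps every matching field on a line, not just the first one, and joins them with OFS. A quick way to check that behavior, using made-up input lines (af_female_fields is a hypothetical wrapper around the awk program from this answer):

```shell
#!/usr/bin/env bash
# af_female_fields (hypothetical name): the awk program from this answer,
# wrapped in a function so it can read stdin or the files given as arguments.
af_female_fields() {
  awk -F';' '
  {
    val=""
    for(i=1;i<=NF;i++){
      if($i ~ /^AF_female=[0-9]+/){
        val=(val?val OFS $i:$i)   # join repeated matches with OFS (a space by default)
      }
    }
    if(val!=""){
      print val
    }
    else{
      print "NA"
    }
  }' "$@"
}

# Made-up input: one line with two AF_female keys, one line with none.
printf '%s\n' 'AF_female=1e-05;x=1;AF_female=2e-05' 'x=1;y=2' | af_female_fields
```

This prints AF_female=1e-05 AF_female=2e-05 on the first line and NA on the second, so the output still has exactly one line per input line.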
Explanation: here is the awk command from the start of this answer again, with a comment on each step.
awk -F';' ' ##Start the awk program and set the field separator to a semicolon.
{
val="" ##Reset val for every input line.
for(i=1;i<=NF;i++){ ##Loop over every field of the current line; NF is the number of fields on that line.
if($i ~ /^AF_female=[0-9]+/){ ##Check whether the field starts with AF_female= followed by at least one digit.
val=(val?val OFS $i:$i) ##Append the matching field to val, separated by OFS if val already holds an earlier match.
}
}
if(val!=""){ ##After the loop, check whether at least one field matched.
print val ##If so, print the collected match(es).
}
else{ ##Otherwise no field on this line matched.
print "NA" ##Print NA so the output keeps one line per input line.
}
}' Input_file ##Name of the input file to read.
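Because -F';' splits the whole line, the command above works on input that holds only the ;-delimited column, like the sample. In a full tab-delimited file where that column sits between other columns, its first key ends up in the same field as the preceding tab-separated columns, so ^AF_female= would not match it. One way around that, sketched under the assumption that the column number is known (8 is where INFO sits in the standard VCF layout), is to split on tabs first and then split only that column on ;. The function name af_female_from_column is hypothetical.

```shell
#!/usr/bin/env bash
# af_female_from_column (hypothetical name)
# Usage: af_female_from_column COLUMN [FILE...]
# Assumes tab-separated input where COLUMN holds the ';'-delimited keys
# (column 8, INFO, in a standard VCF).
af_female_from_column() {
  local col="$1"
  shift
  awk -F'\t' -v col="$col" '
  {
    n = split($col, kv, ";")              # split only the chosen column on semicolons
    val = ""
    for (i = 1; i <= n; i++) {
      if (kv[i] ~ /^AF_female=[0-9]+/) {  # same test as above; non_topmed_ and control_ keys do not match
        val = (val != "" ? val OFS kv[i] : kv[i])
      }
    }
    if (val != "") {
      print val
    } else {
      print "NA"                          # keep one output line per input line
    }
  }' "$@"
}

# Example: af_female_from_column 8 dummy.vcf
```

Header lines (the ## meta lines and the #CHROM line) also produce NA, which keeps the output the same length as the input; filter them out first, for example with grep -v '^#', if they are not wanted.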