parsing using multiple parameters - Awk

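I'm having trouble parsing a GFF file. I'm using the code below as a one-liner. I get output filtered on column 1 ($1), but when I add an extra filter requiring the start ($4) to be at least 50000 and the end ($5) to be at most 150000, awk doesn't filter my file correctly. I'm misunderstanding something, and I'm not quite sure what it is.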

awk '{ $1 = "s10";
$4 >= 50000 && $5 <= 150000;
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9 }' infile > outfile

Input

S03       GeneWise        mRNA    7000       84000     40.00   -       .       ID=NA;Source=NA;Function="NA";
S07       GeneWise        CDS     80450       96070     .       -       0       Parent=NA;
S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";
S10       GeneWise        CDS     43800       76000     .       -       0       Parent=NA;
S10      GeneWise        mRNA    175032       190540     41.11   +       .       ID=NA;Source=NA;Function="NA";
S11       GeneWise        CDS     3700       15000     .       +       0       Parent=NA;
S15       GeneWise        mRNA    18055       25000     40.00   -       .       ID=S15;Source=NA;Function="NA";

The output I get (incorrect)

S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";
S10       GeneWise        CDS     43800       76000     .       -       0       Parent=NA;
S10      GeneWise        mRNA    175032       190540     41.11   +       .       ID=NA;Source=NA;Function="NA";

Expected output

S10       GeneWise        mRNA    96000       105032     50.00   -       .       ID=NA;Source=NA;Function="NA";

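This is the correct form of the conditional: the test goes in the pattern position, in front of the action block, so the action runs only for records where the whole condition is true. In your version the comparisons are plain statements inside { }, so they are evaluated and their result is thrown away. Also note that $1 = "s10" is an assignment, not a comparison, and string comparison is case-sensitive (your data uses S10). With the condition in place, only one record matches: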

$ awk '
$1 == "S10" && $4 >= 50000 && $5 <= 150000 {
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S10     GeneWise        mRNA    96000   105032  50.00   -       .       ID=NA;Source=NA;Function="NA";
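To see the difference concretely, here is a small sketch that runs the question's command and the corrected one side by side. It assumes a POSIX sh and awk, and copies the question's sample rows into a file named sample.gff (that file name and the variable names asked and fixed are only illustrative). Run as written, the question's command rewrites $1 on every line and prints all seven records, while the pattern version prints only the one that matches.

```shell
#!/bin/sh
# Recreate the question's sample input. Single spaces are used here;
# awk's default field splitting treats runs of blanks and tabs alike.
cat > sample.gff <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
EOF

# Question's form: the action has no pattern, so it runs for every line.
# '$1 = "s10"' assigns, and the bare comparison statement is discarded.
asked=$(awk '{ $1 = "s10"; $4 >= 50000 && $5 <= 150000; print $1, $4, $5 }' sample.gff)

# Answer's form: the condition is the pattern, so only matching lines
# reach the action.
fixed=$(awk '$1 == "S10" && $4 >= 50000 && $5 <= 150000 { print $1, $4, $5 }' sample.gff)

printf 'asked:\n%s\n\nfixed:\n%s\n' "$asked" "$fixed"
```

Only the row with start 96000 and end 105032 satisfies all three tests; the other S10 rows fail either the start or the end bound.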

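Unless you want records where $1 == "S10" || $4 >= 50000 && $5 <= 150000 (i.e. using logical OR), but that brings in one extra record (the S07 line) on top of the three S10 lines you were getting: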

$ awk '
$1 == "S10" || $4 >= 50000 && $5 <= 150000 {
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S07     GeneWise        CDS     80450   96070   .       -       0       Parent=NA;
S10     GeneWise        mRNA    96000   105032  50.00   -       .       ID=NA;Source=NA;Function="NA";
S10     GeneWise        CDS     43800   76000   .       -       0       Parent=NA;
S10     GeneWise        mRNA    175032  190540  41.11   +       .       ID=NA;Source=NA;Function="NA";
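One detail about the OR version: in awk, && has higher precedence than ||, so the condition is read as $1 == "S10" || ($4 >= 50000 && $5 <= 150000). The sketch below, again assuming a POSIX sh and awk and a hypothetical sample.gff holding the question's rows, shows that explicit parentheses give the same four records, while grouping the other way gives a different result.

```shell
#!/bin/sh
# The question's sample rows (space-separated is fine for awk's default FS).
cat > sample.gff <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
EOF

# Without parentheses: && is evaluated first.
implicit=$(awk '$1 == "S10" || $4 >= 50000 && $5 <= 150000 { print $1, $4 }' sample.gff)

# Same grouping written out explicitly.
explicit=$(awk '$1 == "S10" || ($4 >= 50000 && $5 <= 150000) { print $1, $4 }' sample.gff)

# Different grouping: the end-coordinate test now applies to every row,
# so the S10 row ending at 190540 is dropped.
other=$(awk '($1 == "S10" || $4 >= 50000) && $5 <= 150000 { print $1, $4 }' sample.gff)

printf 'implicit:\n%s\n\nother:\n%s\n' "$implicit" "$other"
```

Adding the parentheses even when they are not strictly needed makes the intent obvious to the next reader.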

A better form of the first version:

$ awk '
BEGIN { OFS = "\t" }                          # set the output field separator to a tab
$1 == "S10" && $4 >= 50000 && $5 <= 150000 {
    $1 = $1                                   # rebuild $0 so fields are joined with OFS
    print                                     # print the rebuilt record
}' file
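Why the $1 = $1 line matters: setting OFS alone does not change $0, because awk only rejoins the fields with OFS after a field is assigned. The sketch below assumes a POSIX sh and awk and uses a deliberately space-separated, hypothetical sample.gff so the effect is visible. It also shows the same filter with the ID and range passed in via -v; the variable names id, lo and hi are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Space-separated input, so we can see whether OFS (tab) is applied.
cat > sample.gff <<'EOF'
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
EOF

# No field assignment: $0 is printed exactly as read, spaces and all.
untouched=$(awk 'BEGIN { OFS = "\t" } $4 >= 50000 { print }' sample.gff)

# Assigning a field forces awk to rebuild $0 from the fields joined by OFS.
rebuilt=$(awk 'BEGIN { OFS = "\t" } $4 >= 50000 { $1 = $1; print }' sample.gff)

# Hypothetical parameterised version: id, lo and hi come from -v.
# The +0 forces a numeric comparison for the bounds.
param=$(awk -v id=S10 -v lo=50000 -v hi=150000 \
  'BEGIN { OFS = "\t" } $1 == id && $4 >= lo+0 && $5 <= hi+0 { $1 = $1; print }' sample.gff)

printf '%s\n%s\n%s\n' "$untouched" "$rebuilt" "$param"
```

Passing the values with -v keeps the script unchanged when you want a different sequence ID or coordinate range.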