Parsing using multiple parameters - Awk
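I am having trouble parsing a GFF file. I am using the code below as a one-liner. I get output filtered on column 1 ($1), but when I add the additional filter of greater than 50000 but less than 150000, awk does not filter my file correctly. I am misunderstanding something and am not sure what it is.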
awk '{ $1 = "s10";
$4 >= 50000 && $5 <=150000;
print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9}' infile > outfile
Input
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
S11 GeneWise CDS 3700 15000 . + 0 Parent=NA;
S15 GeneWise mRNA 18055 25000 40.00 - . ID=S15;Source=NA;Function="NA";
The incorrect output I get
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
Expected output
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
This is the correct form of the condition. However, only one record matches:
$ awk '
$1 == "S10" && $4 >= 50000 && $5 <=150000 {
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
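One way to see why the one-liner in the question does not filter: inside an action block `{ ... }`, `=` is an assignment and a comparison written as a statement is evaluated and then discarded. Only a condition placed in front of the block acts as a filter. The sketch below assumes the comparisons are on the start and end columns (`$4` and `$5`) and uses a made-up file name, `sample.gff`, holding two rows from the input above.

```shell
# Two rows from the question's input; sample.gff is a hypothetical file name.
cat > sample.gff <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
EOF

# Inside the action, "=" assigns and the bare comparison has no effect,
# so nothing is filtered: both records are printed with $1 overwritten.
unfiltered=$(awk '{ $1 = "s10"; $4 >= 50000 && $5 <= 150000; print $1, $4, $5 }' sample.gff)
printf '%s\n' "$unfiltered"
# s10 7000 84000
# s10 96000 105032

# Used as a pattern in front of the action, the same tests select records.
filtered=$(awk '$1 == "S10" && $4 >= 50000 && $5 <= 150000 { print $1, $4, $5 }' sample.gff)
printf '%s\n' "$filtered"
# S10 96000 105032
```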
Unless you wanted the records where $1 == "S10" || $4 >= 50000 && $5 <=150000 (i.e. using logical OR), but that brings in an extra record:
$ awk '
$1 == "S10" || $4 >= 50000 && $5 <=150000 {
    print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9
}' file
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise mRNA 96000 105032 50.00 - . ID=NA;Source=NA;Function="NA";
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
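In awk, `&&` binds more tightly than `||`, so the OR condition groups as `$1 == "S10" || ($4 >= 50000 && $5 <= 150000)`. The following sketch writes the parentheses out and contrasts them with the other possible grouping. It assumes the range tests are on `$4` and `$5`, and `sample.gff` is again a hypothetical file holding four rows from the input above.

```shell
# Four rows from the question's input; sample.gff is a hypothetical file name.
cat > sample.gff <<'EOF'
S03 GeneWise mRNA 7000 84000 40.00 - . ID=NA;Source=NA;Function="NA";
S07 GeneWise CDS 80450 96070 . - 0 Parent=NA;
S10 GeneWise CDS 43800 76000 . - 0 Parent=NA;
S10 GeneWise mRNA 175032 190540 41.11 + . ID=NA;Source=NA;Function="NA";
EOF

# Default grouping made explicit: every S10 row, plus any row inside the range.
or_first=$(awk '$1 == "S10" || ($4 >= 50000 && $5 <= 150000) { print $1, $4 }' sample.gff)
printf '%s\n' "$or_first"
# S07 80450
# S10 43800
# S10 175032

# Alternative grouping: the end-coordinate limit now applies to every row.
and_last=$(awk '($1 == "S10" || $4 >= 50000) && $5 <= 150000 { print $1, $4 }' sample.gff)
printf '%s\n' "$and_last"
# S07 80450
# S10 43800
```

Writing the parentheses explicitly costs nothing and avoids having to remember the precedence rules.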
A better form of the first one:
$ awk '
BEGIN { OFS="\t" }                           # set OFS to a tab
$1 == "S10" && $4 >= 50000 && $5 <=150000 {
    $1 = $1                                  # rebuild the record
    print                                    # output
}' file
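The `$1 = $1` idiom works because assigning to any field, even to itself, makes awk rebuild `$0` by joining the fields with `OFS`. A minimal sketch on a single shortened record (the variable names are made up for illustration):

```shell
line='S10 GeneWise mRNA 96000 105032'

# Without touching a field, print emits the record exactly as it was read.
as_read=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { print }')

# Assigning to a field makes awk rebuild $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "\t" } { $1 = $1; print }')

# Show tabs as "|" so the difference is visible on a terminal.
printf '%s\n' "$as_read" | tr '\t' '|'
# S10 GeneWise mRNA 96000 105032
printf '%s\n' "$rebuilt" | tr '\t' '|'
# S10|GeneWise|mRNA|96000|105032
```

One caveat with the default field separator: if the attributes column of a GFF file contains spaces, it is split into several fields, and rebuilding the record would replace those spaces with tabs as well.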