grep lines where a field's value is greater than a threshold
I have a file like this:
1 51710 . C A . clustered_events;contamination;germline_risk;read_position;t_lod DP=1;ECNT=6;POP_AF=1.000e-03;P_GERMLINE=-1.372e-02;TLOD=4.20 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,1:1.000:1:0,0:0,1:26:0,136:43:2:0|1:51637_C_T:0.990,0.00,1.00:0.025,0.028,0.947
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
20 42199704 . GGT G,GGTGGGTGGGTGTGTGT . germline_risk DP=100;ECNT=2;POP_AF=0.112,0.024;P_GERMLINE=-2.964e-04,-8.826e-06;TLOD=3.76,9.83 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:1,2,7:0.168,0.301:20:1,1,4:9,1,1:34,35:147,203,146:60,60:51,62:0.192,0.253,0.263:0.038,0.014,0.948
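I want to grep the lines in two steps:
First, keep the lines with DP > 45.
Then, keep the lines where the value after the first ":" in the last column is > 2.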
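So in the first line we can see DP = 1, and the first value after ":" in the last column is 0.
In the second line, DP = 60, and the first value after ":" in the last column is 5.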
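To make the two quantities concrete, here is a small awk sketch that prints, for each row, the position, the DP value from the INFO column and the first number after the first ":" in the last column. It assumes whitespace-separated columns as in the sample; the file name sample.vcf and the function name show_fields are hypothetical.

```shell
#!/usr/bin/env bash
# Print POS, DP and the first value after the first ':' in the last column.
# sample.vcf and show_fields are hypothetical names used only for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sample.vcf <<'EOF'
1 51710 . C A . clustered_events;contamination;germline_risk;read_position;t_lod DP=1;ECNT=6;POP_AF=1.000e-03;P_GERMLINE=-1.372e-02;TLOD=4.20 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,1:1.000:1:0,0:0,1:26:0,136:43:2:0|1:51637_C_T:0.990,0.00,1.00:0.025,0.028,0.947
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
20 42199704 . GGT G,GGTGGGTGGGTGTGTGT . germline_risk DP=100;ECNT=2;POP_AF=0.112,0.024;P_GERMLINE=-2.964e-04,-8.826e-06;TLOD=3.76,9.83 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:1,2,7:0.168,0.301:20:1,1,4:9,1,1:34,35:147,203,146:60,60:51,62:0.192,0.253,0.263:0.038,0.014,0.948
EOF

show_fields() {
  awk '{
    # DP=<digits> inside the INFO column (8th field); "NA" if it is absent.
    dp = match($8, /DP=[0-9]+/) ? substr($8, RSTART + 3, RLENGTH - 3) : "NA"
    # Last column: the part after the first ":" is e.g. "5,35,20"; keep its first number.
    split($NF, parts, ":")
    split(parts[2], nums, ",")
    print $2, dp, nums[1]
  }' "$1"
}

show_fields sample.vcf
```

For the three sample rows this prints 51710 1 0, 27733067 60 5 and 42199704 100 1, matching the values described above.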
From the sample input file above, after the first step we should get:
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
20 42199704 . GGT G,GGTGGGTGGGTGTGTGT . germline_risk DP=100;ECNT=2;POP_AF=0.112,0.024;P_GERMLINE=-2.964e-04,-8.826e-06;TLOD=3.76,9.83 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:1,2,7:0.168,0.301:20:1,1,4:9,1,1:34,35:147,203,146:60,60:51,62:0.192,0.253,0.263:0.038,0.014,0.948
After the second step we should get:
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
Any help?
If you insist on grep, you can get DP > 45 with:
grep 'DP=\(4[6-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)[^0-9]'
#            |         |              |
#          46-49       |            100..∞
#                    50-99
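One way to convince yourself which DP values the pattern accepts is to feed it a list of candidate values. This sketch assumes GNU grep, because \| alternation inside a basic regular expression is a GNU extension (elsewhere, grep -E with a plain | is the portable spelling); the helper name matching_dp_values is hypothetical.

```shell
#!/usr/bin/env bash
# Check which DP values the grep pattern above accepts (assumes GNU grep for \|).
dp_pattern='DP=\(4[6-9]\|[5-9][0-9]\|[1-9][0-9]\{2,\}\)[^0-9]'

# Hypothetical helper: print each value n whose "DP=<n>;" string matches.
# The trailing [^0-9] in the pattern needs a non-digit after the number, hence the ";".
matching_dp_values() {
  local n
  for n in "$@"; do
    if printf 'DP=%s;ECNT=1\n' "$n" | grep -q "$dp_pattern"; then
      printf '%s\n' "$n"
    fi
  done
}

matching_dp_values 1 9 45 46 49 50 99 100 1000
```

Only 46, 49, 50, 99, 100 and 1000 are printed, which is why this kind of range regex works but quickly becomes unwieldy for other thresholds.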
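grep is the wrong tool for checking whether a number is greater or less than a value.
Here is a perl one-liner that prints the lines meeting both conditions: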
perl -ane 'print if $F[7] =~ /DP=(\d+)/ && $1 > 45 && $F[9] =~ /:(\d+)/ && $1 > 2' input.txt
Could you please try the following.
awk '
{
  split($8,array,"[;=]")
  if(array[1]=="DP" && array[2]>45){
    split($10,array1,"[:,]")
    if(array1[2]>2){
      print
    }
  }
}' Input_file
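The awk answer assumes that DP is the first key in the INFO column and that AD is the second key in FORMAT (so its first value is array1[2]). A slightly more defensive sketch, which is our own variant rather than part of the original answer, looks DP up by name and finds AD by its position in the FORMAT column. The function name filter_vcf and the -v variable names min_dp and min_ref_ad are hypothetical; the first AD value is the reference-allele depth.

```shell
#!/usr/bin/env bash
# Defensive variant of the awk filter. filter_vcf, min_dp and min_ref_ad are
# hypothetical names; Input_file holds the three sample rows from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > Input_file <<'EOF'
1 51710 . C A . clustered_events;contamination;germline_risk;read_position;t_lod DP=1;ECNT=6;POP_AF=1.000e-03;P_GERMLINE=-1.372e-02;TLOD=4.20 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:0,1:1.000:1:0,0:0,1:26:0,136:43:2:0|1:51637_C_T:0.990,0.00,1.00:0.025,0.028,0.947
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
20 42199704 . GGT G,GGTGGGTGGGTGTGTGT . germline_risk DP=100;ECNT=2;POP_AF=0.112,0.024;P_GERMLINE=-2.964e-04,-8.826e-06;TLOD=3.76,9.83 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:1,2,7:0.168,0.301:20:1,1,4:9,1,1:34,35:147,203,146:60,60:51,62:0.192,0.253,0.263:0.038,0.014,0.948
EOF

filter_vcf() {
  awk -v min_dp=45 -v min_ref_ad=2 '
  {
    # Look for the DP=<n> entry anywhere in the INFO column ($8).
    dp = ""
    n_info = split($8, info, ";")
    for (i = 1; i <= n_info; i++)
      if (info[i] ~ /^DP=/) { dp = substr(info[i], 4) + 0; break }
    if (dp == "" || dp <= min_dp) next

    # Find which FORMAT key ($9) is AD, then read that sub-field of the sample ($10).
    n_fmt = split($9, keys, ":")
    split($10, values, ":")
    for (i = 1; i <= n_fmt; i++)
      if (keys[i] == "AD") {
        split(values[i], ad, ",")
        if (ad[1] + 0 > min_ref_ad) print
        break
      }
  }' "$1"
}

filter_vcf Input_file
```

On the sample rows it prints only the 27733067 line; the thresholds can be changed through the -v assignments instead of editing the program.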
Explanation: here is the same code with comments added.
awk '                                  ##Start the awk program.
{                                      ##Start the main block, which runs for every line.
  split($8,array,"[;=]")               ##Split the 8th field (INFO) into array, using ";" and "=" as delimiters.
  if(array[1]=="DP" && array[2]>45){   ##If the 1st element is DP (so DP must be the first INFO key) and its value is greater than 45, then:
    split($10,array1,"[:,]")           ##Split the 10th field (sample column) into array1, using ":" and "," as delimiters.
    if(array1[2]>2){                   ##If the 2nd element of array1 (the first value after the first ":") is greater than 2, then:
      print                            ##Print the current line.
    }                                  ##Close the inner if block.
  }                                    ##Close the outer if block.
}' Input_file                          ##Name of the input file.
Using the third (array) argument to match() in GNU awk; adding +0 forces a numeric comparison of the captured text:
$ awk 'match($0,/ DP=([^;]+).* [^:]+:([^,]+)/,a) && (a[1]+0 > 45) && (a[2]+0 > 2)' file
19 27733067 . A G,C . clustered_events;contamination;germline_risk;multiallelic DP=60;ECNT=15;POP_AF=1.000e-03,1.000e-03;P_GERMLINE=-2.169e-04,-2.325e-04;TLOD=11.46,7.14 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1/2:5,35,20:0.500,0.333:6:0,2,1:1,1,1:34,35:112,143,117:42,45:29,47:0.444,0.485,0.500:0.037,0.019,0.944
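Use the right tool for the job. See the "bcftools view" options for more information; something like this should work (bcftools expects a proper VCF header, which the excerpt above does not show):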
bcftools view -i 'INFO/DP > 45 & FORMAT/AD[0:0] > 2' myFile.vcf
More expression examples from the bcftools manual:
INFO/AF[0] > 0.3 .. first AF value bigger than 0.3
FORMAT/AD[0:0] > 30 .. first AD value of the first sample bigger than 30
FORMAT/AD[0:1] .. first sample, second AD value
FORMAT/AD[1:0] .. second sample, first AD value
DP4[*] == 0 .. any DP4 value
FORMAT/DP[0] > 30 .. DP of the first sample bigger than 30
FORMAT/DP[1-3] > 10 .. samples 2-4
FORMAT/DP[1-] < 7 .. all samples but the first
FORMAT/DP[0,2-4] > 20 .. samples 1, 3-5
FORMAT/AD[0:1] .. first sample, second AD field
FORMAT/AD[0:*], AD[0:] or AD[0] .. first sample, any AD field
FORMAT/AD[*:1] or AD[:1] .. any sample, second AD field
(DP4[0]+DP4[1])/(DP4[2]+DP4[3]) > 0.3
CSQ[*] ~ "missense_variant.*deleterious"