Bash script error in Ubuntu: awk: line 1: regular expression exceeds implementation size limit
I am trying to apply this code to an annotation file generated by snpEff (my OS is Ubuntu):
grep -v '^##' /home/zee/fdr_vs_wt.snp.annotated.vcf | awk 'BEGIN{FS=" "; OFS=" "} ~/SL2.50chch/ || ~/^1\/1/ && (~/^1\/0/ || ~/^0\/0/ || ~/^0\/1/) && ~/^[0-9X]*$/ && /splice_acceptor_variant|splice_donor_variant|splice_region_variant|stop_lost|start_lost|stop_gained|missense_variant|coding_sequence_variant|inframe_insertion|disruptive_inframe_insertion|inframe_deletion|disruptive_inframe_deletion|exon_variant|exon_loss_variant|exon_loss_variant|duplication|inversion|frameshift_variant|feature_ablation|duplication|gene_fusion|bidirectional_gene_fusion|rearranged_at_DNA_level|miRNA|initiator_codon_variant|start_retained/ {==""; print $0}' | sed 's/ */ /g' | awk '{split(,a,":"); split(a[2],b,","); if (b[1]>b[2] || ~/SL2.50ch/) print $0}' > /home/zee/fdr_vs_wt.raw.vcfmutantbulk.cands2.txt
I get the following error:
awk: line 1: regular expression /splice_acc ... exceeds implementation size limit
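Can anyone help? I know someone else asked this question a while ago, but I am not very technical and did not understand the solution given there. Thanks in advance.
I also plan to use this code later in my Java GUI, where I will run it through ProcessBuilder with the following code: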
speciesFastaVersionCH = "SL2.50";
String longInputcmd4b = "ch/ || ~/^1\/1/ && (~/^1\/0/ || ~/^0\/0/ || ~/^0\/1/) && ~/^[0-9X]*$/ && /splice_acceptor_variant|splice_donor_variant|splice_region_variant|stop_lost|start_lost|stop_gained|missense_variant|coding_sequence_variant|inframe_insertion|disruptive_inframe_insertion|inframe_deletion|disruptive_inframe_deletion|exon_variant|exon_loss_variant|exon_loss_variant|duplication|inversion|frameshift_variant|feature_ablation|duplication|gene_fusion|bidirectional_gene_fusion|rearranged_at_DNA_level|miRNA|initiator_codon_variant|start_retained/ {==\"\"; print $0}' | sed 's/ */ /g' | awk '{split(,a,\":\"); split(a[2],b,\",\"); if (b[1]>b[2] || ~/";
StringBuilder cmd4 = new StringBuilder().append("\"").append("grep -v '^##' ").append(outputFilecmd3).append(" | awk 'BEGIN{FS=\" \"; OFS=\" \"} ~/").append(speciesFastaVersionCH).append(longInputcmd4b).append(speciesFastaVersionCH).append("ch/) print $0}' > ").append(outputFilecmd5).append("\"");
System.out.println("Here is cmd4:" + cmd4.toString());
String [] gatkArray1 = cmd1.split(" ");
String [] gatkArray2 = cmd2.split(" ");
String [] gatkArray3 = {"bash", "-c", cmd3};
String [][] gatkArrays = {gatkArray1, gatkArray2, gatkArray3};
ProcessBuilder pb = new ProcessBuilder(gatkArray3);
pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
pb.redirectError(ProcessBuilder.Redirect.INHERIT);
Process p = pb.start();
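Your awk implementation does not support regular expressions of that length.
Specifically, you are using mawk, where a regex literal, counting the surrounding //, must be shorter than 400 characters: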
$ true | mawk "/$(printf '%397s')/"
(no output)
$ true | mawk "/$(printf '%398s')/"
mawk: line 1: regular expression / ... exceeds implementation size limit
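To confirm which implementation the plain awk command actually resolves to (on Ubuntu it is typically mawk, selected through the alternatives system), a quick check, assuming GNU coreutils' readlink as shipped with Ubuntu, is:

```shell
# Follow the symlink chain behind "awk" (on Ubuntu: /usr/bin/awk -> /etc/alternatives/awk)
# to the binary that really runs, e.g. /usr/bin/mawk or /usr/bin/gawk.
readlink -f "$(command -v awk)"
```

If this prints a path ending in mawk, the 400-character regex limit shown above applies to every awk call in the pipeline.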
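You can rewrite the awk script to use shorter regex literals (the maximum size POSIX guarantees is 256 bytes), or switch to an implementation such as gawk, where the only limit is the Linux maximum size of a single command-line argument, 128 KiB: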
$ true | gawk "/$(printf '%131069s')/"
(no output)
$ true | gawk "/$(printf '%131070s')/"
bash: /usr/bin/gawk: Argument list too long
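Following the first suggestion, the long effect-name regex from the question can be split into several shorter literals joined with ||: /a|b|c|d/ matches exactly when /a|b/ || /c|d/ does, so the filter's result is unchanged while each literal stays under 150 characters. This is a minimal sketch, not the asker's full filter: it reproduces only the bare effect-name test (the field-specific conditions are left out because their field numbers are not shown in the question), wraps it in a hypothetical shell function named has_effect_filter, and drops the two duplicated names (exon_loss_variant and duplication) from the original list.

```shell
# Hypothetical helper: print every input line that mentions one of the snpEff effect
# names from the question. The single ~460-character regex is split into three
# literals of roughly 145 characters each, below mawk's limit and POSIX's 256 bytes.
# A newline after || is allowed in awk, so the pattern can span several lines.
has_effect_filter() {
  awk '
    /splice_acceptor_variant|splice_donor_variant|splice_region_variant|stop_lost|start_lost|stop_gained|missense_variant|coding_sequence_variant/ ||
    /inframe_insertion|disruptive_inframe_insertion|inframe_deletion|disruptive_inframe_deletion|exon_variant|exon_loss_variant|duplication|inversion/ ||
    /frameshift_variant|feature_ablation|gene_fusion|bidirectional_gene_fusion|rearranged_at_DNA_level|miRNA|initiator_codon_variant|start_retained/
  '
}

# Usage on the file from the question:
# grep -v '^##' /home/zee/fdr_vs_wt.snp.annotated.vcf | has_effect_filter
```

To use this inside the original one-liner, replace the single /splice_acceptor_variant|...|start_retained/ term with the three literals wrapped in parentheses, i.e. && (/.../ || /.../ || /.../), because && binds more tightly than || and the grouping keeps the original logic. To take the second route instead, install the gawk package (sudo apt install gawk) and call gawk explicitly in place of awk.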