使用 snakemake touch 使用 bwa 和 gatk 创建参考基因组的索引
Create index of a reference genome with bwa and gatk using snakemake touch
我用 bwa
对齐读取,用 gatk
调用变体。 gatk
需要为参考基因组创建 dict
,bwa
需要创建索引。当我对它们都使用触摸时,出现此错误:
AmbiguousRuleException:
Rules bwa_index and gatk_refdict are ambiguous for the file ref.
Expected input files:
bwa_index: ref.fasta
gatk_refdict: ref.fasta
这是代码:
rule bwa_index:
input:
database="ref.fasta"
output:
done =touch("ref")
shell:
"""
bwa index -p ref {input.database}
"""
rule bwa_mem:
input:
bwa_index_done = "ref",
fastq1="{sample}_R1.trimmed.fastq.gz",
fastq2="{sample}_R2.trimmed.fastq.gz"
output:
bam = temp("{sample}.bam")
shell:
"""
bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
"""
rule_gatk_refdict:
input:
ref="ref.fasta"
output:
done =touch("ref")
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
"""
rule gatk:
input:
gatk_refdict_done = "ref",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R ref -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
或者,我指定了索引 .dict
,但它也不起作用,因为 gatk 在创建 dict 之前调用变体,所以我得到一个没有 dict 文件的错误:
rule_gatk_refdict:
input:
ref="ref.fasta"
output:
outf ="ref.dict"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.outf}
"""
rule gatk:
input:
ref = "ref.fasta",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
如何解决?
为什么不简单地将 dict
文件定义为 gatk 规则的输入,将 ìndex 定义为 bwa 规则的输入?
rule bwa_index:
input:
database="ref.fasta"
output:
done =touch("ref")
shell:
"""
bwa index -p ref {input.database}
"""
rule bwa_mem:
input:
bwa_index_done = "ref",
fastq1="{sample}_R1.trimmed.fastq.gz",
fastq2="{sample}_R2.trimmed.fastq.gz"
output:
bam = temp("{sample}.bam")
shell:
"""
bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
"""
rule gatk_refdict:
input:
ref="ref.fasta"
output:
done = "ref.dict"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
"""
rule gatk:
input:
ref = "ref.fasta",
dict = "ref.dict",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
你得到的 AmbiguousRuleException
是因为 snakemake 不知道 运行 哪个规则,因为两个规则有相同的输出。不要忘记 snakemake 尝试从规则 all 开始构建 DAG。当涉及到 运行 rule gatk
时,您将 "ref"
定义为输入。由于两个规则可以生成这个文件,snakemake 不知道它必须使用 rule gatk_refdict
还是 rule bwa_index
.
你有一个错字(“_”不应该在那里):
----v
rule_gatk_refdict:
input:
ref="ref.fasta"
...
我用 bwa
对齐读取,用 gatk
调用变体。 gatk
需要为参考基因组创建 dict
,bwa
需要创建索引。当我对它们都使用触摸时,出现此错误:
AmbiguousRuleException:
Rules bwa_index and gatk_refdict are ambiguous for the file ref.
Expected input files:
bwa_index: ref.fasta
gatk_refdict: ref.fasta
这是代码:
rule bwa_index:
input:
database="ref.fasta"
output:
done =touch("ref")
shell:
"""
bwa index -p ref {input.database}
"""
rule bwa_mem:
input:
bwa_index_done = "ref",
fastq1="{sample}_R1.trimmed.fastq.gz",
fastq2="{sample}_R2.trimmed.fastq.gz"
output:
bam = temp("{sample}.bam")
shell:
"""
bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
"""
rule_gatk_refdict:
input:
ref="ref.fasta"
output:
done =touch("ref")
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
"""
rule gatk:
input:
gatk_refdict_done = "ref",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R ref -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
或者,我指定了索引 .dict
,但它也不起作用,因为 gatk 在创建 dict 之前调用变体,所以我得到一个没有 dict 文件的错误:
rule_gatk_refdict:
input:
ref="ref.fasta"
output:
outf ="ref.dict"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.outf}
"""
rule gatk:
input:
ref = "ref.fasta",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
如何解决?
为什么不简单地将 dict
文件定义为 gatk 规则的输入,将 ìndex 定义为 bwa 规则的输入?
rule bwa_index:
input:
database="ref.fasta"
output:
done =touch("ref")
shell:
"""
bwa index -p ref {input.database}
"""
rule bwa_mem:
input:
bwa_index_done = "ref",
fastq1="{sample}_R1.trimmed.fastq.gz",
fastq2="{sample}_R2.trimmed.fastq.gz"
output:
bam = temp("{sample}.bam")
shell:
"""
bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
"""
rule gatk_refdict:
input:
ref="ref.fasta"
output:
done = "ref.dict"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
"""
rule gatk:
input:
ref = "ref.fasta",
dict = "ref.dict",
bam="bam_list"
output:
outf ="{chr}.vcf"
shell:
"""
java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
"""
你得到的 AmbiguousRuleException
是因为 snakemake 不知道 运行 哪个规则,因为两个规则有相同的输出。不要忘记 snakemake 尝试从规则 all 开始构建 DAG。当涉及到 运行 rule gatk
时,您将 "ref"
定义为输入。由于两个规则可以生成这个文件,snakemake 不知道它必须使用 rule gatk_refdict
还是 rule bwa_index
.
你有一个错字(“_”不应该在那里):
----v
rule_gatk_refdict:
input:
ref="ref.fasta"
...