使用 snakemake touch 使用 bwa 和 gatk 创建参考基因组的索引

Create index of a reference genome with bwa and gatk using snakemake touch

我用 bwa 对齐读取,用 gatk 调用变体。 gatk 需要为参考基因组创建 dictbwa 需要创建索引。当我对它们都使用触摸时,出现此错误:

AmbiguousRuleException:
Rules bwa_index and gatk_refdict are ambiguous for the file ref.
Expected input files:
        bwa_index: ref.fasta
        gatk_refdict: ref.fasta

这是代码:

rule bwa_index:
    input:
        database="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        bwa index -p ref {input.database}
        """
rule bwa_mem:
    input:
        bwa_index_done = "ref",
        fastq1="{sample}_R1.trimmed.fastq.gz",
        fastq2="{sample}_R2.trimmed.fastq.gz"
    output:
        bam = temp("{sample}.bam")
    shell:
        """
        bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
        """
rule_gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
        """
rule gatk:
    input:
        gatk_refdict_done = "ref",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R ref -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

或者,我指定了索引 .dict,但它也不起作用,因为 gatk 在创建 dict 之前调用变体,所以我得到一个没有 dict 文件的错误:

rule_gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        outf ="ref.dict"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.outf}
        """
rule gatk:
    input:
        ref = "ref.fasta",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

如何解决?

为什么不简单地将 dict 文件定义为 gatk 规则的输入,将 ìndex 定义为 bwa 规则的输入?

rule bwa_index:
    input:
        database="ref.fasta"
    output:
        done =touch("ref")
    shell:
        """
        bwa index -p ref {input.database}
        """
rule bwa_mem:
    input:
        bwa_index_done = "ref",
        fastq1="{sample}_R1.trimmed.fastq.gz",
        fastq2="{sample}_R2.trimmed.fastq.gz"
    output:
        bam = temp("{sample}.bam")
    shell:
        """
        bwa mem ref {input.fastq1} {input.fastq2} -o {output.bam}
        """
rule gatk_refdict:
    input:
        ref="ref.fasta"
    output:
        done = "ref.dict"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar CreateSequenceDictionary -R {input.ref} -O {output.done}
        """
rule gatk:
    input:
        ref = "ref.fasta",
        dict = "ref.dict",
        bam="bam_list"
    output:
        outf ="{chr}.vcf"
    shell:
        """
        java -jar gatk-package-4.1.9.0-local.jar HaplotypeCaller -L {wildcards.chr} -R {input.ref} -I {input.bam} --min-base-quality-score 20 -O {output.outf}
        """

你得到的 AmbiguousRuleException 是因为 snakemake 不知道 运行 哪个规则,因为两个规则有相同的输出。不要忘记 snakemake 尝试从规则 all 开始构建 DAG。当涉及到 运行 rule gatk 时,您将 "ref" 定义为输入。由于两个规则可以生成这个文件,snakemake 不知道它必须使用 rule gatk_refdict 还是 rule bwa_index.

你有一个错字(“_”不应该在那里):

----v
rule_gatk_refdict:
    input:
        ref="ref.fasta"
    ...