WildcardError in Snakefile

I have been trying to run the following bioinformatics script:

configfile: "config.yaml"

WORK_TRIM = config["WORK_TRIM"]
WORK_KALL = config["WORK_KALL"]

rule all:
  input: 
    expand(WORK_KALL + "quant_result_{condition}", condition=config["conditions"])


rule kallisto_quant:
    input:
      fq1 = WORK_TRIM + "{sample}_1_trim.fastq.gz",
      fq2 = WORK_TRIM + "{sample}_2_trim.fastq.gz",
      idx = WORK_KALL + "Homo_sapiens.GRCh38.cdna.all.fa.index"
    
    output:
      WORK_KALL + "quant_result_{condition}"
    
    shell:
      "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"

However, I keep getting this error:

WildcardError in line 13 of /home/user/directory/Snakefile:
Wildcards in input files cannot be determined from output files:
'sample'

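To explain briefly: kallisto quant produces 3 outputs, abundance.h5, abundance.tsv and run_info.json. Each of these files needs to go into its own newly created condition directory. I don't understand what exactly is going wrong. Any help would be appreciated.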

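If you think about it, you are not giving snakemake enough information: the output of kallisto_quant only contains the {condition} wildcard, so snakemake has no way to work out which {sample} the input files should use.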
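
To see why the error names 'sample', here is a rough plain-Python sketch of the check. This is not snakemake's actual implementation; the helper wildcard_names is made up for illustration, and the patterns are the ones from the question without the directory prefixes.

```python
import re

def wildcard_names(pattern):
    """Return the set of {name} placeholders found in a path pattern."""
    return set(re.findall(r"\{(\w+)\}", pattern))

# Patterns from rule kallisto_quant (directory prefixes left out).
output_pattern = "quant_result_{condition}"
input_patterns = ["{sample}_1_trim.fastq.gz", "{sample}_2_trim.fastq.gz"]

# Wildcard values can only be filled in from the requested output file.
known = wildcard_names(output_pattern)
needed = set().union(*(wildcard_names(p) for p in input_patterns))

# Anything the inputs need that the output does not define is undetermined.
undetermined = needed - known
print(sorted(undetermined))
```

The only name left over is sample, which is the wildcard the error message complains about.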

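Suppose the "conditions" are "control" and "treated", with samples "C" and "T" respectively. You need to tell snakemake about the association control: C, treated: T. You can do this with an input function or a lambda function. For example: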

cond2samp = {'control': 'C', 'treated': 'T'}

rule all:
  input: 
    expand("quant_result_{condition}", condition=cond2samp.keys())


rule kallisto_quant:
    input:
      fq1 = lambda wc: "%s_1_trim.fastq.gz" % cond2samp[wc.condition],
      fq2 = lambda wc: "%s_2_trim.fastq.gz" % cond2samp[wc.condition],
      idx = "Homo_sapiens.GRCh38.cdna.all.fa.index"
    output:
      "quant_result_{condition}"
    shell:
      "kallisto quant -i {input.idx} -o {output} {input.fq1} {input.fq2}"
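
If the lambdas get repetitive, the same lookup can be written as a named input function. The following is a minimal sketch in plain Python so it can be tried outside snakemake; fastq_for is a hypothetical helper name, and SimpleNamespace only stands in for the wildcards object that snakemake passes to input functions.

```python
from types import SimpleNamespace

# Same mapping as in the Snakefile above.
cond2samp = {"control": "C", "treated": "T"}

def fastq_for(read):
    """Build an input function for read 1 or 2 (hypothetical helper)."""
    def input_func(wildcards):
        try:
            sample = cond2samp[wildcards.condition]
        except KeyError:
            # Fail with a readable message instead of a bare KeyError.
            raise ValueError(
                "no sample configured for condition %r" % wildcards.condition
            )
        return "%s_%d_trim.fastq.gz" % (sample, read)
    return input_func

# Stand-in for snakemake's wildcards object, for testing only.
wc = SimpleNamespace(condition="control")
print(fastq_for(1)(wc))
print(fastq_for(2)(wc))
```

In the Snakefile the rule would then use fq1 = fastq_for(1) and fq2 = fastq_for(2) in place of the two lambdas.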