Snakemake: Missing input files for rule all

I am developing my first Snakemake workflow and I am stuck on an error.

I wanted to start by testing my code with a single rule, so I created a FastQC rule. However, when I run snakemake, I get the following error message:

MissingInputException in line 24 of /ngs/prod/nanocea_project/test/Snakefile:
Missing input files for rule all:
stats/fastqc/02062021_1/02062021_1_fastqc.html
stats/fastqc/02062021_1/02062021_1_fastqc.zip
stats/fastqc/02062021_2/02062021_2_fastqc.html
stats/fastqc/25022021_2/25022021_2_fastqc.zip
stats/fastqc/25022021_2/25022021_2_fastqc.html
stats/fastqc/02062021_2/02062021_2_fastqc.zip

Here is my code:

import glob
import os

###Global Variables###

FORMATS=["zip", "html"]
OUTDIR="/ngs/prod/nanocea_project/test/stats/fastqc"
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"

###FASTQ Files###

def list_samples(DIR_FASTQ):
        SAMPLES=[]
        for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
                base=os.path.basename(file)
                sample=(base.replace('.fastq.gz', ''))
                SAMPLES.append(sample)
        return(SAMPLES)

SAMPLES=list_samples(DIR_FASTQ)

###Rules###

rule all:
        input:
                expand("stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)

rule fastqc:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand(OUTDIR+"/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
        threads:
                16
        conda:
                "envs/fastqc.yaml"
        shell:
                """
                mkdir stats/fastqc/{sample}
                fastqc {input} -o {OUTDIR}/{sample} -t {threads}
                """

Here is my file structure:

|
|_ Snakefile
|
|_/reads
|   |
|   |_25022021_2.fastq.gz
|   |
|   |_02062021_1.fastq.gz
|   |
|   |_02062021_2.fastq.gz
|
|_/envs
|   |
|   |_fastqc.yaml
|
|_/stats
|   |
|   |_/fastqc

I have looked through other threads for a solution to my problem, but I still cannot get my workflow to run.

Do you have any ideas?

Thanks!

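Edit after dariober's answer

Thanks for your answer. After many attempts, the only solution that worked was to write the full path directly in both rule all and rule fastqc.

First question: why do my global variables not work, even though I changed them to match my rule all?

Second question: with the first problem solved, running my workflow raises a new error. The command is: snakemake --use-conda --cores 40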

RuleException in line 28 of /ngs/prod/nanocea_project/test/Snakefile: NameError: The name 'sample' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print }}

I tried doubling the braces, but then, when mkdir runs, it creates a folder literally named {sample}. I don't understand why this folder gets created.
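
The folder name can be reproduced outside Snakemake. The sketch below is plain Python and assumes the shell block is filled in with rules close to Python's str.format, which is what the hint about doubling braces in the error message suggests. The wildcards object here is a hypothetical stand-in built with SimpleNamespace, not Snakemake's own class.

```python
from types import SimpleNamespace

# Doubled braces are an escape: they yield literal braces, not a value.
escaped = "mkdir stats/fastqc/{{sample}}".format()
print(escaped)  # mkdir stats/fastqc/{sample}

# A bare {sample} needs a variable called "sample" and fails without one,
# which mirrors the NameError reported by Snakemake.
missing = None
try:
    "mkdir stats/fastqc/{sample}".format()
except KeyError as err:
    missing = err.args[0]

# Hypothetical stand-in for the wildcards of one job (sample name taken
# from the error message in this post).
wildcards = SimpleNamespace(sample="02062021_1")
resolved = "mkdir stats/fastqc/{wildcards.sample}".format(wildcards=wildcards)
print(resolved)  # mkdir stats/fastqc/02062021_1
```

In a Snakemake shell block the matched values are normally reached as {wildcards.sample}. That only helps when the rule's input and output keep {sample} as a real wildcard. Once expand() has filled it in, as in both versions of rule fastqc in this post, the rule has no sample wildcard left to refer to.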

New code:

import glob
import os

###Global Variables###

FORMATS=["zip", "html"]
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"

###FASTQ Files###

def list_samples(DIR_FASTQ):
        SAMPLES=[]
        for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
                base=os.path.basename(file)
                sample=(base.replace('.fastq.gz', ''))
                SAMPLES.append(sample)
        return(SAMPLES)

SAMPLES=list_samples(DIR_FASTQ)

###Rules###

rule all:
        input:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)

rule fastqc:
        input:
                expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
        output:
                expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
        threads:
                16
        conda:
                "envs/fastqc.yaml"
        shell:
                """
                mkdir stats/fastqc/{sample}
                fastqc {input} -o /ngs/prod/nanocea_project/test/stats/fastqc/{sample} -t {threads}
                """

In rule all you have:

stats/fastqc/...

But in rule fastqc, once the OUTDIR variable is expanded, you have:

/ngs/prod/nanocea_project/test/stats/fastqc/...

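Even though they point to the same directory, the two strings do not match, so Snakemake cannot find a rule that produces the files requested by rule all and reports an error.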
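
The comparison can be reproduced in plain Python. This is a minimal sketch, not Snakemake itself: targets is a hypothetical helper that imitates what expand() does for these patterns, and the sample names are the ones from the error message. It also suggests that making OUTDIR relative would not be enough on its own, because the output pattern in the first Snakefile has no {sample}/ subdirectory while rule all asks for one.

```python
from itertools import product

SAMPLES = ["02062021_1", "02062021_2", "25022021_2"]
FORMATS = ["zip", "html"]
OUTDIR_ABS = "/ngs/prod/nanocea_project/test/stats/fastqc"
OUTDIR_REL = "stats/fastqc"


def targets(pattern):
    """Hypothetical stand-in for expand(): fill in every sample/ext pair."""
    return {
        pattern.format(sample=sample, ext=ext)
        for sample, ext in product(SAMPLES, FORMATS)
    }


# What rule all asks for in the first Snakefile.
requested = targets("stats/fastqc/{sample}/{sample}_fastqc.{ext}")

# What rule fastqc declares in the first Snakefile (absolute prefix,
# no per-sample subdirectory).
as_posted = targets(OUTDIR_ABS + "/{sample}_fastqc.{ext}")

# Relative prefix, but still without the per-sample subdirectory.
relative_only = targets(OUTDIR_REL + "/{sample}_fastqc.{ext}")

# Same prefix and same pattern as rule all.
consistent = targets(OUTDIR_REL + "/{sample}/{sample}_fastqc.{ext}")

for name, declared in [
    ("as posted", as_posted),
    ("relative prefix only", relative_only),
    ("same prefix and pattern", consistent),
]:
    print(name, "->", len(requested & declared), "of", len(requested), "matched")
```

If this reading is right, the safest fix is to build both the input of rule all and the output of rule fastqc from one shared prefix and one shared pattern, so the two sets of strings are identical.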