Snakemake - 输入文件中的通配符无法从输出文件中确定

Snakemake - Wildcards in input files cannot be determined from output files

我是 snakemake 的新手,而且对 python 也不是很流利(所以很抱歉,这可能是一个非常基本的愚蠢问题):

我目前正在构建一个管道来分析一组带有 atlas 的 bamfile。这些 bamfile 位于不同的文件夹中,不应移动到一个公共文件夹中。因此我决定提供一个看起来像这样的样本列表(这只是一个例子,实际上样本可能在完全不同的驱动器上):

Sample     Path
Sample1    /some/path/to/my/sample/
Sample2    /some/different/path/

并将其加载到我的 config.yaml 中:

sample_file: /path/to/samplelist/samplslist.txt

现在到我的 Snakefile:

import pandas as pd

#define configfile with paths etc.
configfile: "config.yaml"

#read-in dataframe and define Sample and Path
SAMPLES = pd.read_table(config["sample_file"])
BAMFILE = SAMPLES["Sample"]
PATH = SAMPLES["Path"]

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=PATH, sample=BAMFILE)

#this works like a charm as long as I give the zip-function in the rules 'all' and 'summary':

rule indexBam:
    input: 
        "{path}{sample}.bam"
    output:
        "{path}{sample}.bam.bai"
    shell:
        "samtools index {input}"

#this following command works as long as I give the specific folder for a sample instead of {path}.
rule bamdiagnostics:
    input:
        bam="{path}{sample}.bam",
        bai=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE)
    params:
        prefix="analysis/BAMDiagnostics/{sample}"   
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        "{config[atlas]} task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        index=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE),
        bamd=expand("analysis/BAMDiagnostics/{sample}_approximateDepth.txt", sample=BAMFILE)
    output:
        "{path}{sample}.summary.txt"
    shell:
        "echo -e '{input.index} {input.bamd}"

我收到错误

WildcardError in line 28 of path/to/my/Snakefile: Wildcards in input files cannot be determined from output files: 'path'

谁能帮帮我?
- 我试图用 join 或创建输入函数来解决这个问题,但我认为我不够熟练,无法看到我的错误...
- 我想问题是,我的摘要规则不包含带有 {path} 的 bamdiagnostics-output 连音(因为输出在其他地方)并且无法连接到输入文件等等.. .
- 扩展我对 bamdiagnostics-rule 的输入使代码工作,但当然将每个样本输入到每个样本输出并造成大混乱: In this case, both bamfiles are used for the creation of each outputfile. This is wrong as the samples AND the output are to be treated independently.

根据 atlas 文档,您似乎需要 运行 每个样本的每个规则,这里的复杂之处在于每个样本都在单独的路径中。

我修改了您的脚本以适用于上述情况(参见 DAG)。修改了脚本开头的变量以使其更有意义。出于演示目的删除了 config,并使用了 pathlib 库(而不是 os.path.join)。 pathlib 不是必需的,但它可以帮助我保持理智。修改了 shell 命令以避免 config.

import pandas as pd
from pathlib import Path

df = pd.read_csv('sample.tsv', sep='\t', index_col='Sample')
SAMPLES = df.index
BAM_PATH = df["Path"]
# print (BAM_PATH['sample1'])

rule all:
    input:
        expand("{path}{sample}.summary.txt", zip, path=BAM_PATH, sample=SAMPLES)


rule indexBam:
    input:
        str( Path("{path}") / "{sample}.bam")
    output:
        str( Path("{path}") / "{sample}.bam.bai")
    shell:
        "samtools index {input}"

#this following command works as long as I give the specific folder for a sample instead of {path}.
rule bamdiagnostics:
    input:
        bam = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam"),
        bai = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    params:
        prefix="analysis/BAMDiagnostics/{sample}"
    output:
        "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        "analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
        "analysis/BAMDiagnostics/{sample}_MQ.txt",
        "analysis/BAMDiagnostics/{sample}_readLength.txt",
        "analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
    message:
        "running BamDiagnostics...{wildcards.sample}"
    shell:
        ".atlas task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"

rule summary:
    input:
        bamd = "analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
        index = lambda wildcards: str( Path(BAM_PATH[wildcards.sample]) / f"{wildcards.sample}.bam.bai"),
    output:
        str( Path("{path}") / "{sample}.summary.txt")
    shell:
        "echo -e '{input.index} {input.bamd}"