Snakemake run rule with wildcarded output of previous rules multiple times

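I have several studies, and for each of the n studies I have to create two files (a .notsad file and a .txt file). Once these files exist, I have to run a command once per chromosome, and every chromosome of a given study uses the same two input files (.notsad, .txt). So: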

mycommand.py study1.notsad study1_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
mycommand.py study1.notsad study1_filter.txt chr2.bad.gz --out chr2_filter.bad.gz
...
mycommand.py study2.notsad study2_filter.txt chr1.bad.gz --out chr1_filter.bad.gz
...
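The structure described above (two per-study files reused by every chromosome of that study) can be sketched in plain Python outside Snakemake. The study names are hypothetical. Writing each output under a per-study directory is also an assumption, mirroring the {ds}/ layout of the Snakefile, so that study1 and study2 do not both write chr1_filter.bad.gz to the same place.

```python
from itertools import product

# Hypothetical study names; the real ones come from the directory listing.
STUDIES = ["study1", "study2"]
CHROMOSOMES = range(1, 23)  # chromosomes 1..22, as in the Snakefile


def build_commands(studies, chromosomes):
    """Return one mycommand.py invocation per (study, chromosome) pair."""
    commands = []
    for study, chrom in product(studies, chromosomes):
        # These two files depend only on the study, so every chromosome reuses them.
        notsad = f"{study}.notsad"
        filter_txt = f"{study}_filter.txt"
        commands.append(
            f"mycommand.py {notsad} {filter_txt} chr{chrom}.bad.gz "
            f"--out {study}/chr{chrom}_filter.bad.gz"  # per-study output dir (assumed)
        )
    return commands


cmds = build_commands(STUDIES, CHROMOSOMES)
print(len(cmds))  # 44
print(cmds[0])
```

In Snakemake terms, the study is the {ds} wildcard and the chromosome is {chr}; the two shared files should be keyed on the study alone.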

But I can't get this to run. I get an error:

WildcardError in line 33 of /scripts/Snakefile:
Wildcards in input files cannot be determined from output files:
'ds_lower'

My current rules:

import os
import glob

ROOT = "/rootdir/"
ORIGINAL_DATA_FOLDER="original/"
PROCESS_DATA_FOLDER="process/"

ORIGINAL_DATA_SOURCE=ROOT+ORIGINAL_DATA_FOLDER
PROCESS_DATA_SOURCE=ROOT+PROCESS_DATA_FOLDER

DATASETS = [name for name in os.listdir(ORIGINAL_DATA_SOURCE) if os.path.isdir(os.path.join(ORIGINAL_DATA_SOURCE, name))]
LOWERCASE_DATASETS = [dataset.lower() for dataset in DATASETS]
CHROMOSOME = list(range(1,23))

rule all:
    input:
        expand(PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz", ds=DATASETS, chr=CHROMOSOME)

rule run_command:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.bad.gz", # Matches 22 chroms
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt", # But this should be common to all chr runs for this study.
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad" # This one as well.
    output:
        PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
    shell:
        # Run command that uses each of the previous files and runs per chromosome.
        # Positional inputs are indexed with brackets: {input[0]}, {input[1]}, ...
        "mycommand.py {input[2]} {input[1]} {input[0]} --out {output}"

rule write_txt_file:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/{ds_lower}_info.txt"
    output:
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}_filter.txt"
    shell:
        "touch {output}"

rule write_notsad_file:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/_{ds_lower}.sad"
    output:
        PROCESS_DATA_SOURCE+"{ds}/{ds_lower}.notsad"
    shell:
        "touch {output}"

Update: changing the input of rule run_command to lambda functions does work:

rule run_command:
    input:
        ORIGINAL_DATA_SOURCE+"{ds}/chr{chr}.bad.gz",
        # Derive the lowercase study name from {ds} instead of using a separate {ds_lower} wildcard
        lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}_filter.txt",
        lambda wildcards: PROCESS_DATA_SOURCE + f"{wildcards.ds}/{wildcards.ds.lower()}.notsad"
    output:
        PROCESS_DATA_SOURCE+"{ds}/chr{chr}_filtered.gen.gz"
    shell:
        # Run command that uses each of the previous files and runs per chromosome
        "mycommand.py {input[2]} {input[1]} {input[0]} --out {output}"

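Every wildcard used in a rule's input must also appear in its output, because Snakemake assigns wildcard values only by matching the requested file against the output pattern. In rule run_command, the wildcard {ds_lower} appears only in input, not in output, so Snakemake has no way to give it a value. The input functions above avoid this by computing the lowercase name from {ds}.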
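Why the original rule fails can be illustrated with a simplified model of wildcard resolution: match the requested path against the output pattern, and only wildcards named in that pattern receive values. This is a toy regex-based emulation, not Snakemake's actual implementation; the helper names `pattern_to_regex` and `resolve_inputs` are hypothetical, and `.+` is used as the wildcard regex.

```python
import re

WILDCARD = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern):
    """Turn 'dir/{ds}/chr{chr}.gz' into a regex with one named group per wildcard."""
    parts, pos, seen = [], 0, set()
    for m in WILDCARD.finditer(pattern):
        name = m.group(1)
        parts.append(re.escape(pattern[pos:m.start()]))
        # A repeated wildcard must match the same text again (backreference).
        parts.append(f"(?P={name})" if name in seen else f"(?P<{name}>.+)")
        seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def resolve_inputs(output_pattern, input_patterns, target):
    """Toy model: bind wildcards from the target path, then fill the input patterns."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        raise ValueError(f"{target} does not match {output_pattern}")
    wildcards = match.groupdict()
    needed = {name for p in input_patterns for name in WILDCARD.findall(p)}
    missing = sorted(needed - set(wildcards))
    if missing:
        raise ValueError(f"cannot determine from output: {missing}")
    return [p.format(**wildcards) for p in input_patterns]


OUTPUT = "/rootdir/process/{ds}/chr{chr}_filtered.gen.gz"
INPUTS = [
    "/rootdir/original/{ds}/chr{chr}.bad.gz",
    "/rootdir/process/{ds}/{ds_lower}_filter.txt",  # {ds_lower} is not in OUTPUT
]
target = "/rootdir/process/Study1/chr5_filtered.gen.gz"

try:
    resolve_inputs(OUTPUT, INPUTS, target)
except ValueError as err:
    print(err)  # cannot determine from output: ['ds_lower']

# The fix from the update: derive the lowercase name from {ds} after matching.
wc = pattern_to_regex(OUTPUT).fullmatch(target).groupdict()
print(f"/rootdir/process/{wc['ds']}/{wc['ds'].lower()}_filter.txt")
# /rootdir/process/Study1/study1_filter.txt
```

The write_txt_file and write_notsad_file rules do not hit this problem, because both {ds} and {ds_lower} appear in their output patterns.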