Snakemake: using a rule in a loop

Question

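I am trying to use a Snakemake rule in a loop, so that the rule takes the output of the previous iteration as its input. Is that possible? If so, how can I do it?

Here is my example:

Set up test data: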

mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt

Snakefile:

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            opref+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Trying to run the example produces the error CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone finds a way to make this work, that would be great!

Thanks!

Answer 1

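My understanding is that your rules are converted to Python code before they are run, and all the plain Python code in your Snakefile is run sequentially during that process. Think of it as your Snakemake rules being evaluated as Python functions.

But there is a constraint: any rule can only be evaluated into a function once.

You can have if/else expressions and evaluate a rule differently (once) depending on config values and so on, but you cannot evaluate a rule multiple times.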
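To make that constraint concrete, here is a toy model in plain Python. RuleRegistry and DuplicateRuleError are hypothetical names used only for illustration; this is not Snakemake's actual implementation. It simply assumes that each rule block registers its name once, which is enough to show why the loop in the question fails on its second iteration.

```python
class DuplicateRuleError(Exception):
    """Hypothetical stand-in for Snakemake's CreateRuleException."""


class RuleRegistry:
    """Toy registry: each rule name may be registered only once."""

    def __init__(self):
        self.rules = {}

    def add(self, name, input_path, output_path):
        if name in self.rules:
            raise DuplicateRuleError(
                "The name %s is already used by another rule" % name
            )
        self.rules[name] = (input_path, output_path)


def define_rules_in_loop(registry, n_loops=3):
    """Mimic the question's loop: every iteration reuses the name 'loop_rule'."""
    for i in range(1, n_loops + 1):
        prefix = "test" if i == 1 else "loop%d" % (i - 1)
        opref = "loop%d" % i
        registry.add("loop_rule", prefix + "/{sample}.txt", opref + "/{sample}.txt")


registry = RuleRegistry()
try:
    define_rules_in_loop(registry)
except DuplicateRuleError as err:
    print(err)  # The name loop_rule is already used by another rule
```

Only the first iteration succeeds; the second tries to register loop_rule again and raises.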

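I'm not quite sure how to rewrite your Snakefile to achieve what you want. Is there a real example you could give where a looping construct seems to be required?

--- Edit

For a fixed number of iterations, it is possible to use an input function to run a rule multiple times. (I would caution against doing this, though, and you must be very careful to prevent infinite loops.)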

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)

def looper_input(wildcards):
    # could be written more cleanly with a dictionary
    if (wildcards["prefix"] == "loop0"):
        input = "test/{}.txt".format(wildcards["sample"])
    elif (wildcards["prefix"] == "loop1"):
        input = "loop0/{}.txt".format(wildcards["sample"])
    ...
    return input


rule looper:
    input:
        looper_input
    output:
        "{prefix}/{sample}.txt"
    params:
        # wildcards inside a params string are filled in per job
        add="{prefix}"
    shell:
        "awk -v add='{params.add}' '{{print $0, add}}' {input} > {output}"
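The comment in looper_input notes that it could be written more cleanly with a dictionary. A minimal sketch of that idea follows, assuming the elided branches continue the same pattern (loopK reads from loop(K-1), and loop0 reads from test) up to loop3. PREVIOUS is a hypothetical name, and a plain dict stands in for Snakemake's wildcards object so the function can be tried outside Snakemake.

```python
# Hypothetical lookup table: output prefix -> input prefix.
# Assumes the elided branches continue the same pattern up to loop3.
PREVIOUS = {
    "loop0": "test",
    "loop1": "loop0",
    "loop2": "loop1",
    "loop3": "loop2",
}


def looper_input(wildcards):
    """Return the input path for one step of the chain."""
    prefix = wildcards["prefix"]
    if prefix not in PREVIOUS:
        # Fail loudly on unknown prefixes instead of leaving the input undefined.
        raise ValueError("no loop step defined for prefix %r" % prefix)
    return "{}/{}.txt".format(PREVIOUS[prefix], wildcards["sample"])


# A plain dict stands in for Snakemake's wildcards object here.
print(looper_input({"prefix": "loop3", "sample": "SampleA"}))  # loop2/SampleA.txt
```

Inside a Snakefile you would probably also want a constraint such as wildcard_constraints: prefix="loop[0-9]+" so that {prefix}/{sample}.txt cannot match the existing test/ inputs; that constraint is an assumption here, not part of the original answer.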

Answer 2

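I think this is a good opportunity to use recursive programming. Rather than explicitly including a conditional for every iteration, write a single rule that transitions from iteration (n-1) to iteration n. So, something along these lines: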

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """
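To see that the recursion in recurse_sample bottoms out, the plain-Python sketch below walks backwards from a requested target the way a dependency resolver would, then replays the steps in order while imitating the awk command (appending the loop label to every line). resolve_chain and run_chain are hypothetical helpers, types.SimpleNamespace stands in for Snakemake's wildcards object, and the regex is only a simplified version of Snakemake's own pattern matching.

```python
import re
import tempfile
from pathlib import Path
from types import SimpleNamespace


def recurse_sample(wcs):
    """Same logic as the answer's input function."""
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n - 1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)


# Simplified regex for the output pattern "loop{n}/{sample}.txt"
# with the answer's wildcard_constraints applied.
OUTPUT_RE = re.compile(r"loop(?P<n>[0-9]+)/(?P<sample>[^/]+)\.txt")


def resolve_chain(target):
    """Walk back from target; return [(input, output), ...] in build order."""
    steps = []
    path = target
    # Stop as soon as a path no longer matches the rule's output (e.g. test/...).
    while (match := OUTPUT_RE.fullmatch(path)) is not None:
        source = recurse_sample(SimpleNamespace(**match.groupdict()))
        steps.append((source, path))
        path = source
    steps.reverse()
    return steps


def run_chain(workdir, steps):
    """Replay each step, imitating: awk -v loop='loopN' '{print $0, loop}'."""
    for source, dest in steps:
        label = dest.split("/")[0]  # e.g. "loop2"
        lines = (Path(workdir) / source).read_text().splitlines()
        out_path = Path(workdir) / dest
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text("".join("%s %s\n" % (line, label) for line in lines))


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test").mkdir()
    (Path(tmp) / "test" / "SampleA.txt").write_text("SampleA\n")
    chain = resolve_chain("loop3/SampleA.txt")
    run_chain(tmp, chain)
    final_text = (Path(tmp) / "loop3" / "SampleA.txt").read_text()

print(chain)
print(final_text, end="")
```

The walk stops at test/SampleA.txt because that path does not match loop{n}/{sample}.txt, which corresponds to the n == 1 base case; a request for loop0/... hits the ValueError branch instead of recursing forever.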

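As @RussHyde said, you need to actively make sure you don't trigger an infinite loop. To that end, we make sure all cases are covered in recurse_sample, and use wildcard_constraints to make sure the matching is precise.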
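Why the constraints matter can be illustrated with Python's re module. Snakemake matches a wildcard against .+ by default; the sketch below builds two simplified regexes for loop{n}/{sample}.txt, one with that default and one with the answer's constraints, and compares what each accepts. pattern_to_regex is a hypothetical helper that only approximates Snakemake's own pattern-to-regex conversion.

```python
import re


def pattern_to_regex(constraints):
    """Approximate regex for "loop{n}/{sample}.txt" (hypothetical helper)."""
    n = constraints.get("n", ".+")  # ".+" is the default wildcard regex
    sample = constraints.get("sample", ".+")
    return re.compile(rf"loop(?P<n>{n})/(?P<sample>{sample})\.txt")


default = pattern_to_regex({})
constrained = pattern_to_regex({"n": "[0-9]+", "sample": "[^/]+"})

for path in ["loop3/SampleA.txt", "loopX/SampleA.txt", "loop3/sub/SampleA.txt"]:
    d = default.fullmatch(path)
    c = constrained.fullmatch(path)
    # With the default regex, n can capture non-numeric text, which would make
    # int(wcs.n) in recurse_sample fail; the constrained regex rejects such paths.
    print(path, d.groupdict() if d else None, c.groupdict() if c else None)
```

With the defaults, loopX/SampleA.txt yields n="X" and loop3/sub/SampleA.txt yields n="3/sub"; the constrained version rejects both and only accepts a numeric n and a sample without slashes.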
