Is a Snakemake params function evaluated before the input file exists?

Question

Consider this Snakefile:

def rdf(fn):
    f = open(fn, "rt")
    t = f.readlines()
    f.close()
    return t

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"

When it is run and neither test.txt nor test.dat exists, it gives this error:

InputFunctionException in line 7 of /Users/tedtoal/Documents/BioinformaticsConsulting/Mars/Cacao/Pipeline/SnakeMake/t2:
FileNotFoundError: [Errno 2] No such file or directory: 'test.dat'

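However, if test.dat exists, it runs fine. Why?

I would have expected params not to be evaluated until Snakemake is ready to run rule 'a'. Instead, it must be calling the params function rdf() above during the DAG phase, before rule 'a' is run. Yet the following works, even when test.dat does not exist initially: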

import os

def rdf(fn):
    if not os.path.exists(fn): return ""
    f = open(fn, "rt")
    t = f.readlines()
    f.close()
    return t

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"

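This means that params is evaluated twice, once during the DAG phase and once during the rule execution phase. Why?

This is a problem for me. I need to be able to read data from a rule's input file in order to formulate the arguments for the program being executed. The command itself does not take the input filename; it takes arguments derived from the contents of the input file. I can handle it as shown above, but that seems kludgy, and I wonder whether this is a bug or whether I am missing something?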
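The effect of the guarded rdf() can be reproduced in plain Python, outside Snakemake. This is only a sketch of the situation the question infers (the function being called once while test.dat is still missing and once after it has been created); it does not model Snakemake's internals, and `simulate_two_calls` is a hypothetical helper introduced purely for illustration.

```python
import os
import tempfile


def rdf(fn):
    """Return the lines of fn, or an empty placeholder if fn does not exist yet."""
    if not os.path.exists(fn):
        return ""
    with open(fn, "rt") as f:
        return f.readlines()


def simulate_two_calls(directory):
    """Hypothetical stand-in for the two evaluations: before and after the input exists."""
    path = os.path.join(directory, "test.dat")
    before = rdf(path)  # file is missing, so the placeholder is returned
    with open(path, "wt") as f:
        f.write("hello world\n")  # plays the role of rule b creating test.dat
    after = rdf(path)  # file now exists, so its real contents are returned
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_two_calls(tmp))
```

The first call yields the placeholder and the second yields the file contents, which is consistent with the message showing the real value once rule 'a' actually runs.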

Answer 1

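I had the same problem. In my case, I could work around it by having the function return a default placeholder when it is run on a file that does not exist.

For example, I have a rule that needs to know in advance the number of lines in some of its input files. I therefore used: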

from pathlib import Path

def count_lines(bed):
    # This is necessary because, in a dry run, Snakemake will evaluate the 'params'
    # directive on the (potentially non-existing) input files.
    if not Path(bed).exists():
        return -1

    total = 0
    with open(bed) as f:
        for line in f:
            total += 1
    return total

rule subsample_background:
    input:        
        one = "raw/{A}/file.txt",
        two = "raw/{B}/file.txt"
    output:
        "processed/some_output.txt"
    params:
        n = lambda wildcards, input: count_lines(input.one)

    shell:
        "run.sh -n {params.n} {input.two} > {output}"

In a dry run, the placeholder -1 is used, which lets the dry run "complete" successfully, whereas in a non-dry run the function returns the proper value.
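If several params helpers need the same guard, the existence check could be factored out. The following is a minimal sketch in plain Python; `placeholder_if_missing` is a hypothetical decorator name, not something Snakemake provides, and it assumes the wrapped function takes the file path as its first argument.

```python
import functools
from pathlib import Path


def placeholder_if_missing(default):
    """Hypothetical decorator: return `default` when the first argument is a missing path."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(path, *args, **kwargs):
            # Skip the real work while the file does not exist (e.g. during a dry run).
            if not Path(path).exists():
                return default
            return func(path, *args, **kwargs)
        return wrapper
    return decorate


@placeholder_if_missing(-1)
def count_lines(bed):
    """Count the lines of an existing file."""
    with open(bed) as f:
        return sum(1 for _ in f)
```

The params lambda in the rule would stay the same, since count_lines keeps its name and signature; only the placeholder logic moves into the decorator.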
