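Snakemake rule for analysis where a single result file is produced for different parameters, and the parameters come from the content of another rule's output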

Question

I have the following basic Snakemake setup:

rule step1:
    """
    The output will contain a list of GENEs in a txt file.
    """

    input: "input1.txt"

    output: "output1.txt"

    shell:
        """
        analysis1.R {input} {output}
        """

rule step2:
    """
    Analysis step2.
    """

    input: "input2.txt"

    output: "output2.txt"

    shell:
        """
        analysis2.py {input} {output}
        """

rule step3:
    """
    GENE should be coming from the step1 output file, with a GENE name on each
    line.
    """

    input: rules.step2.output

    output: "output3-GENE.txt"

    shell:
        """
        analysis3.py -i {input} -o {output} -p GENE
        """

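In step 1 I generate a file containing the list of genes (the parameters) for step 3, and in step 2 I generate another file. What I would like to do is run step3 once for every line in output1.txt, where the content of the line is the parameter passed to step3 and is also part of the output file name, but I can't wrap my head around it. Any ideas? Thanks for your help!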

Answer 1

You can use checkpoints.

If you know the list of files that step3 should generate, you can define an aggregate rule:

rule aggregate:
    input:
        # List of files that step3 needs to generate

That way rule step3 runs as many times as needed, once per requested file.
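For step3 to run once per requested file, its output name has to carry a wildcard. The fragment below is a minimal sketch of how that could look, not the original poster's code: the wildcard name gene is an assumption chosen here, and the file pattern follows the "output3-GENE.txt" name from the question.

```snakemake
rule step3:
    """
    One job per gene; the gene name is taken from the requested file name.
    """

    input: rules.step2.output

    # "gene" is a hypothetical wildcard name matched against the target file
    output: "output3-{gene}.txt"

    shell:
        """
        analysis3.py -i {input} -o {output} -p {wildcards.gene}
        """
```

With this shape, requesting output3-X.txt for some gene X makes Snakemake run step3 with -p X.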

The tricky part is defining the list of these files. It should be a function of the result of rule step1:

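def aggregate_input(wildcards):
    # Asking for the checkpoint's output makes Snakemake re-evaluate the
    # workflow once step1 has actually produced output1.txt.
    with checkpoints.step1.get(**wildcards).output[0].open() as f:
        # Simplified: return the non-empty lines, without trailing newlines.
        return [line.strip() for line in f if line.strip()]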

rule aggregate:
    input:
        aggregate_input

In that case, rule step1 should become a checkpoint:

checkpoint step1:
    """
    The output will contain a list of GENEs in a txt file.
    """

    input: "input1.txt"

    output: "output1.txt"

    shell:
        """
        analysis1.R {input} {output}
        """

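In my example I simplified the aggregate_input function so that it returns just the lines of step1's output. For the setup in the question, it would need to turn each gene name into the matching step3 output file name (the "output3-GENE.txt" pattern). If you need something more complex, you can design the function yourself.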
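As an illustration of a slightly less simplified version, here is a sketch of a plain Python helper that maps each gene in the step1 output to a step3 file name. The helper name gene_targets and its pattern argument are made up for this example; only the "output3-GENE.txt" naming comes from the question.

```python
from pathlib import Path


def gene_targets(gene_list_path, pattern="output3-{gene}.txt"):
    """Return one target file name per non-empty line of the gene list."""
    lines = Path(gene_list_path).read_text().splitlines()
    # Drop surrounding whitespace and skip blank lines.
    genes = [line.strip() for line in lines if line.strip()]
    return [pattern.format(gene=gene) for gene in genes]


# Inside the Snakefile, aggregate_input could then be written as
# (assuming step1 has been declared as a checkpoint):
#
# def aggregate_input(wildcards):
#     return gene_targets(checkpoints.step1.get(**wildcards).output[0])
```

Keeping the file parsing in an ordinary function means it can be tried out on a sample gene list outside of Snakemake before wiring it into the workflow.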

python

snakemake