Execute a Snakemake rule repeatedly until certain conditions are met

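I would like to use Snakemake for a workflow in which a particular step has to be repeated until a certain condition is met. The number of repetitions cannot be determined in advance: it could be 1, 6, or any other number.

My intuition is that this is something Snakemake can't do, what with the directed acyclic graph and all...

Still, I was hoping that checkpoints might help, since a checkpoint triggers a re-evaluation of the DAG, but I can't quite wrap my head around how that would work.

Is a loop possible in a Snakefile?

Thanks!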


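Adding some commentary on what actually happens in the excellent answer below, in the hope that it helps others, and me when I inevitably revisit this question.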

all:  call function all_input to determine rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  do checkpoint keep_trying with i == 1.     
keep_trying:  output "round_1" doesn't exist.  do run section.  random() decides to touch output[0], which is "round_1".

snakemake reevaluates graph after checkpoint is complete

all:  call function all_input to determine rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  do checkpoint keep_trying with i == 2.
keep_trying:   output "round_2" doesn't exist.  do run section.  random() decides to touch output[0], which is "round_2".

snakemake reevaluates graph after checkpoint is complete

all:  call function all_input to determine rule's input requirements.
all_input:  file "succes.txt" doesn't exist.  do checkpoint keep_trying with i == 3.
keep_trying:  output "round_3" doesn't exist.  do run section.  random() decides to touch "succes.txt"; output[0], which is "round_3", is touched as well.

snakemake reevaluates graph after checkpoint is complete

all:  call function all_input to determine rule's input requirements.
all_input:  file "succes.txt" exists.  return "succes.txt" to rule all.
all:  input requirement is "succes.txt", which is now satisfied.
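
The trace above can be mimicked in plain Python, without Snakemake, to make the control flow easier to follow. This is only a rough sketch: run_until_success, threshold, rng and max_rounds are hypothetical names that appear nowhere in the answer, and the while loop stands in for Snakemake re-evaluating the DAG after each checkpoint.

```python
import random
import tempfile
from pathlib import Path


def run_until_success(workdir, threshold=0.9, rng=None, max_rounds=1000):
    """Repeat a step until succes.txt exists; return the number of rounds run."""
    rng = rng or random.Random()
    workdir = Path(workdir)
    success = workdir / "succes.txt"
    tries = 0
    # Plays the role of all_input: is the final target there yet?
    while not success.exists():
        if tries >= max_rounds:
            raise RuntimeError(f"no success after {max_rounds} rounds")
        tries += 1
        # Plays the role of keep_trying: sometimes create the success
        # marker, and always create this round's own output file.
        if rng.random() > threshold:
            success.touch()
        (workdir / f"round_{tries}").touch()
    return tries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n = run_until_success(tmp, rng=random.Random(0))
        print(f"succeeded after {n} round(s)")
```

Every pass through the loop corresponds to one "all / all_input / keep_trying" group in the trace. The max_rounds guard is an addition of this sketch; the Snakefile in the answer has no such limit.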

You're right, you need checkpoints for this! Here is a small example that does what you want:

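import os
import random
from pathlib import Path


# Number of attempts requested so far; incremented on each evaluation
# of all_input that finds no success file.
tries = 0
def all_input(wildcards):
    global tries
    if not os.path.exists("succes.txt"):
        tries += 1
        # Requesting a checkpoint output that does not exist yet makes
        # Snakemake run the checkpoint and then re-evaluate this function.
        checkpoints.keep_trying.get(i=tries)
    else:
        return "succes.txt"


rule all:
    input:
        all_input


checkpoint keep_trying:
    output:
        "round_{i}"
    run:
        # Succeed in roughly 10% of the attempts.
        if random.random() > 0.9:
            Path('succes.txt').touch()
        # Always create this round's output so the checkpoint completes.
        Path(output[0]).touch()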

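Here we say that rule all needs as input whatever the function all_input returns. This function checks whether the file succes.txt already exists. If it does not, it triggers a run of the checkpoint keep_trying, which may create succes.txt (a 10% chance on each attempt). Once succes.txt exists, it becomes the input of rule all, and Snakemake exits successfully.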
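
One possible refinement, offered as a hedged sketch rather than as part of the answer: the global tries counter changes every time all_input is evaluated, and it starts again from 1 when the workflow is restarted with old round_N files still on disk. Assuming that is a concern, the round number could instead be derived from the files that already exist. The helper next_round and the pattern ROUND_PATTERN below are hypothetical names, and the idea has not been tested against any particular Snakemake version.

```python
import re
from pathlib import Path

# Matches file names such as "round_7" and captures the number.
ROUND_PATTERN = re.compile(r"round_(\d+)")


def next_round(directory="."):
    """Return 1 + the highest N for which a file named round_N exists."""
    rounds = [
        int(match.group(1))
        for path in Path(directory).iterdir()
        if (match := ROUND_PATTERN.fullmatch(path.name))
    ]
    # With no round files present, the first round is number 1.
    return max(rounds, default=0) + 1
```

Inside all_input, the call would then read checkpoints.keep_trying.get(i=next_round()) in place of incrementing tries. Because the result depends only on the files on disk, evaluating the function several times between checkpoint runs gives the same round number each time.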