Execute a Snakemake rule repeatedly until certain conditions are met
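I want to use Snakemake for a workflow in which a particular step has to be repeated until a certain condition is met. It is not possible to determine in advance how many times the step will be needed. It could be 1, or 6, or any other number.
My hunch is that this is something Snakemake can't do, because of the directed acyclic graph and all...
Still, I was hoping checkpoints might help, since they trigger a re-evaluation of the DAG, but I can't quite wrap my head around how that would work.
Is a loop possible in a Snakefile?
Thanks!
Adding some commentary on what actually happens in the excellent answer below. Hopefully it helps others, and myself when I inevitably revisit this problem.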
all: call function all_input to determine the rule's input requirements.
all_input: file "succes.txt" doesn't exist. Request checkpoint keep_trying with i == 1.
keep_trying: output "round_1" doesn't exist. Execute the run section. random() does not trigger success, so only output[0] ("round_1") is touched.
Snakemake re-evaluates the graph after the checkpoint is complete.
all: call function all_input to determine the rule's input requirements.
all_input: file "succes.txt" doesn't exist. Request checkpoint keep_trying with i == 2.
keep_trying: output "round_2" doesn't exist. Execute the run section. random() does not trigger success, so only output[0] ("round_2") is touched.
Snakemake re-evaluates the graph after the checkpoint is complete.
all: call function all_input to determine the rule's input requirements.
all_input: file "succes.txt" doesn't exist. Request checkpoint keep_trying with i == 3.
keep_trying: output "round_3" doesn't exist. Execute the run section. random() triggers success, so "succes.txt" is touched along with output[0] ("round_3").
Snakemake re-evaluates the graph after the checkpoint is complete.
all: call function all_input to determine the rule's input requirements.
all_input: file "succes.txt" exists. Return "succes.txt" to rule all.
all: input requirement is "succes.txt", which is now satisfied.
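The sequence above can be mimicked in plain Python, outside Snakemake, to make the control flow easier to follow. This is only a toy model under stated assumptions, not how Snakemake is implemented: the exception class `CheckpointPending`, the function `simulate`, and its parameters `p_success`, `seed` and `max_rounds` are hypothetical names invented for this sketch, and the "checkpoint lookup" is assumed to signal "not done yet" by raising.

```python
import random
import tempfile
from pathlib import Path


class CheckpointPending(Exception):
    """Hypothetical stand-in for 'this checkpoint has not run yet'."""

    def __init__(self, i):
        super().__init__(f"round_{i} pending")
        self.i = i


def simulate(workdir, p_success=0.1, seed=0, max_rounds=1000):
    """Toy model of the trace above; returns the number of rounds executed."""
    workdir = Path(workdir)
    rng = random.Random(seed)
    tries = 0

    def all_input():
        nonlocal tries
        if not (workdir / "succes.txt").exists():
            tries += 1
            # Mimic the checkpoint lookup: round_<tries> has to run first.
            raise CheckpointPending(tries)
        return "succes.txt"

    def keep_trying(i):
        if rng.random() < p_success:
            (workdir / "succes.txt").touch()
        # The round marker is always written, success or not.
        (workdir / f"round_{i}").touch()

    while True:
        try:
            all_input()  # "re-evaluate the graph"
            return tries
        except CheckpointPending as pending:
            if pending.i > max_rounds:  # safety net for the sketch only
                raise RuntimeError("no success within max_rounds")
            keep_trying(pending.i)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(f"succeeded after {simulate(d)} round(s)")
```

The "loop" is not a cycle in the graph: each pass produces a new, distinct target (`round_1`, `round_2`, ...), and the input function is simply asked again after each one finishes.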
You're right, you need checkpoints for this! Here is a small example that does what you want:
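import os
from pathlib import Path

# Number of rounds requested so far
tries = 0

def all_input(wildcards):
    global tries
    if not os.path.exists("succes.txt"):
        tries += 1
        # Request the next round; the DAG is re-evaluated once it has finished
        checkpoints.keep_trying.get(i=tries)
    else:
        return "succes.txt"

rule all:
    input:
        all_input

checkpoint keep_trying:
    output:
        "round_{i}"
    run:
        import random
        # Roughly a 10% chance of creating the success marker in any given round
        if random.random() > 0.9:
            Path('succes.txt').touch()
        # The round's own output is always created
        Path(output[0]).touch()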
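Here we say that rule all needs as input whatever the function all_input returns. This function checks whether the file succes.txt already exists. If it doesn't, it triggers a run of the checkpoint keep_trying, which may create succes.txt (a 10% chance each time). If succes.txt does exist, it becomes the input for rule all, and Snakemake exits successfully.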
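As a rough guide to how long this example keeps looping, the 10% chance per round can be turned into numbers. The sketch below assumes the rounds are independent, so the number of rounds follows a geometric distribution; the helper names `prob_done_within` and `expected_rounds` are made up for illustration.

```python
def prob_done_within(n, p=0.1):
    """Probability that at least one of the first n rounds creates succes.txt."""
    return 1 - (1 - p) ** n


def expected_rounds(p=0.1):
    """Mean number of rounds until the first success (geometric distribution)."""
    return 1 / p


if __name__ == "__main__":
    for n in (1, 3, 10, 30):
        print(f"within {n:2d} rounds: {prob_done_within(n):.3f}")
    print(f"expected rounds: {expected_rounds():.1f}")
```

With a real stopping condition in place of random(), there may be no such bound, so it can be worth thinking about a cap on the number of rounds.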