Access nested config variables in a rule
I'm new to Snakemake and trying to figure out how (or whether) nested config values work. I created the following config file...
# dummyconfig.json
{
    "fam1": {
        "numchr": 1,
        "chrlen": 2500000,
        "seeds": {
            "genome": 8013785666,
            "simtrio": 1776,
            "simseq": {
                "mother": 2053695854357871005,
                "father": 4517457392071889495,
                "proband": 2574020394472462046
            }
        },
        "ninherited": 100,
        "ndenovo": 5,
        "numreads": 375000
    }
}
...and this rule (among others) in my Snakefile.
# Snakefile
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {config[wildcards.family][numchr]} --seqlen {config[wildcards.family][chrlen]} --seed {config[wildcards.family][seeds][genome]} {input} | gzip -c > {output}"
I then tried to create fam1-refr.fa.gz by invoking snakemake --configfile dummyconfig.json fam1-refr.fa.gz. When I do, I get the following error message.
Building DAG of jobs...
rule simgenome:
input: human.order6.mm
output: fam1-refr.fa.gz
jobid: 0
wildcards: family=fam1
RuleException in line 1 of /Users/standage/Projects/noble/Snakefile:
NameError: The name 'wildcards.family' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print }}
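So fam1 is correctly recognized as the value of the family wildcard, but variable accesses such as {config[wildcards.family][numchr]} don't seem to work.
Is it possible to traverse a nested config this way, or does Snakemake only support access to top-level variables?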
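A likely reason for the error can be sketched in plain Python. This assumes that Snakemake expands shell strings using the same field syntax as Python's str.format, which the NameError text suggests but which is an assumption here. Under that syntax, whatever sits inside [...] is taken as a literal key and is never evaluated. The config dict below is a trimmed copy of dummyconfig.json; try_format and the SimpleNamespace stand-in for wildcards are hypothetical helpers used only for illustration.

```python
from types import SimpleNamespace

# Trimmed copy of dummyconfig.json.
config = {"fam1": {"numchr": 1, "chrlen": 2500000, "seeds": {"genome": 8013785666}}}

# Hypothetical stand-in for the wildcards object of a job with family=fam1.
wildcards = SimpleNamespace(family="fam1")


def try_format(template, **names):
    """Return the formatted string, or the KeyError raised by str.format."""
    try:
        return template.format(**names)
    except KeyError as err:
        return err


# Constant keys nest to any depth without trouble.
nested_ok = try_format("{config[fam1][seeds][genome]}", config=config)

# The text inside [...] is used as the literal key 'wildcards.family';
# it is not evaluated as an attribute lookup, so the lookup fails.
failed = try_format(
    "{config[wildcards.family][numchr]}", config=config, wildcards=wildcards
)

# Ordinary Python code has no such restriction: a function that receives
# the wildcards can use them as a key.
numseqs = lambda w: config[w.family]["numchr"]
resolved = numseqs(wildcards)

print(nested_ok)
print(repr(failed))
print(resolved)
```

So nesting itself is not the obstacle in this sketch; the obstacle is using one variable as the key into another inside a format field.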
One way to solve it is to use params and resolve the variables outside the shell block.
rule simgenome:
    input:
        "human.order6.mm",
    output:
        "{family}-refr.fa.gz"
    params:
        seed=lambda w: config[w.family]['seeds']['genome'],
        numseqs=lambda w: config[w.family]['numchr'],
        seqlen=lambda w: config[w.family]['chrlen']
    shell:
        "nuclmm simulate --out - --order 6 --numseqs {params.numseqs} --seqlen {params.seqlen} --seed {params.seed} {input} | gzip -c > {output}"