Snakemake ambiguity
I am getting an ambiguity error and I can't figure out why it occurs or how to solve it.
Defining the wildcards:
rule all:
    input:
        xls = expand("reports/{sample}.xlsx", sample = config["samples"]),
        runfolder_xls = expand("{runfolder}.xlsx", runfolder = config["runfolder"])
The actual rules:
rule sample_report:
    input:
        vcf = "vcfs/{sample}.annotated.vcf",
        cov = "stats/{sample}.coverage.gz",
        mod_bed = "tmp/mod_ref_{sample}.bed",
        nirvana_g2t = "/mnt/storage/data/NGS/nirvana_genes2transcripts"
    output:
        "reports/{sample}.xlsx"
    params:
        get_nb_samples()
    log:
        "logs/{sample}.log"
    shell: """
        python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_create_sample_report.py -v {input.vcf} -c {input.cov} -r {input.mod_bed} -n {input.nirvana_g2t} -r {rule};
        exitcode=$? ;
        if [[ {params} > 1 ]]
        then
            python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_check_exitcode.py -e $exitcode -r {rule} -n {wildcards.sample}
        elif [[ {params} == 1 ]]
        then
            python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_check_exitcode.py -e $exitcode -r sample_mode -n {wildcards.sample}
        else
            python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_check_exitcode.py -e 1 -r {rule} -n {wildcards.sample}
        fi
    """

rule runfolder_report:
    input:
        sample_sheet = "SampleSheet.csv"
    output:
        "{runfolder}.xlsx"
    log:
        "logs/{runfolder}.log"
    shell: """
        python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_create_runfolder_report.py -run {wildcards.runfolder} -s {input.sample_sheet} -r {rule} ;
        exitcode=$? ;
        python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_check_exitcode.py -e $exitcode -r {rule} -n {wildcards.runfolder}
    """
Config file:
runfolder: "CP0340"
samples: ['C014044p', 'C130157', 'C014040p', 'C014054b-1', 'C051198-A', 'C014042p', 'C052007W-C', 'C051198-B', 'C014038p', 'C052004-B', 'C051198-C', 'C052004-C', 'C052003-B', 'C052003-A', 'C052004-A', 'C052002-C', 'C052005-C', 'C052002-A', 'C130157N', 'C052006-B', 'C014063pW', 'C014054b-2', 'C052002-B', 'C052006-C', 'C052007W-B', 'C052003-C', 'C014064bW', 'C052005-B', 'C052006-A', 'C052005-A']
The error:
$ snakemake -n -s ../niles/Snakefile --configfile logs/CP0340_config.yaml
Building DAG of jobs...
AmbiguousRuleException:
Rules runfolder_report and sample_report are ambiguous for the file reports/C014044p.xlsx.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
    runfolder_report: runfolder=reports/C014044p
    sample_report: sample=C014044p
Expected input files:
    runfolder_report: SampleSheet.csv
    sample_report: vcfs/C014044p.annotated.vcf stats/C014044p.coverage.gz tmp/mod_ref_C014044p.bed /mnt/storage/data/NGS/nirvana_genes2transcripts
Expected output files:
    runfolder_report: reports/C014044p.xlsx
    sample_report: reports/C014044p.xlsx
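If I understand Snakemake correctly, the wildcards used in the rules are defined in my rule all. So I don't understand why the runfolder_report rule tries to produce reports/C014044p.xlsx as its output, or how that output ends up with a sample name instead of the runfolder name (as defined in the config file).
OK, here is my solution: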
rule runfolder_report:
    input:
        "SampleSheet.csv"
    output:
        expand("{runfolder}.xlsx", runfolder = config["runfolder"])
    params:
        config["runfolder"]
    log:
        expand("logs/{runfolder}.log", runfolder = config["runfolder"])
    shell: """
        set +e ;
        python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_create_runfolder_report.py -run {params} -s {input} -r {rule} ;
        exitcode=$? ;
        python /mnt/storage/home/kimy/projects/automate_CP/niles/NILES_check_exitcode.py -e $exitcode -r {rule} -n {params}
    """
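However, I still don't understand why it errored out, especially since I know it used to work.
As the error message suggests, you can give each rule's output a distinct prefix. So if you replace {runfolder}.xlsx with, for example, "runfolder/{runfolder}.xlsx" in both rule all and runfolder_report, your original code will work. Alternatively (my preferred solution), constrain the wildcards by adding something like this before rule all: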
import re

wildcard_constraints:
    sample = '|'.join([re.escape(x) for x in config["samples"]]),
    runfolder = re.escape(config["runfolder"]),
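The reason is that snakemake matches input and output strings using regular expressions (I must admit that the details of exactly how it does this escape me...).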
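To make the regex point concrete, here is a rough sketch in plain Python. It is not Snakemake's actual code: pattern_to_regex is a hypothetical helper that only approximates the matching, assuming (as the Snakemake documentation states) that an unconstrained wildcard defaults to the regex .+, which also matches slashes. Under that assumption, "{runfolder}.xlsx" matches reports/C014044p.xlsx just as well as "reports/{sample}.xlsx" does, which is the ambiguity in the error message. Constraining runfolder removes the second match.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Hypothetical helper: turn an output pattern such as "{runfolder}.xlsx"
    into a compiled regex. Each {name} becomes a named group that uses the
    given constraint, or ".+" when the wildcard is unconstrained."""
    constraints = constraints or {}
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between wildcards must match exactly.
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


target = "reports/C014044p.xlsx"

# Without constraints, both rules can produce the target file.
sample_match = pattern_to_regex("reports/{sample}.xlsx").match(target)
unconstrained = pattern_to_regex("{runfolder}.xlsx").match(target)
print(sample_match.group("sample"))       # C014044p
print(unconstrained.group("runfolder"))   # reports/C014044p

# With runfolder constrained to the configured value, only sample_report matches.
constrained = pattern_to_regex(
    "{runfolder}.xlsx", {"runfolder": re.escape("CP0340")}
).match(target)
print(constrained)                        # None
```

This would also explain why the expand()-based solution above works: after expand() the output is the literal string CP0340.xlsx with no wildcard left in it, so it can no longer match any file under reports/.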