Snakemake MissingOutputException after job completed
I'm running DAS_Tool through snakemake, and for some reason I get the error below even though the output bins are produced. It's only a minor annoyance since I do have the output, but it kills my snakemake run right away. The snakefile looks like this:
rule DAS_Tool:
    input:
        da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
        da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
        da3="{datadir}/{sample}.fna",
        db=config["dastool_database"]
    threads: config["threads"]
    conda: "binning.yml"
    output:
        daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}")
    shell:
        """
        date
        DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {output.daout} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
        2> >(tee {log}.stderr) > >(tee {log}.stdout)
        touch das_tool.done
        date
        """
The error is as follows:
Waiting at most 120 seconds for missing files.
MissingOutputException in line 277 of /mnt/lscratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/metaspades_binning_snakefile:
Job completed successfully, but some output files are missing. Missing files after 120 seconds:
/scratch/users/sbusi/ONT/cedric_ont_basecalling/Binning/bwa_sr_metaspades/dastool_output/metaspades
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
What other files could it be missing that make it kill the job? I've already tried the --latency-wait option with values up to 900 seconds (passed roughly as in the sketch below, where only --latency-wait and the snakefile name are real), but no luck so far.
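Illustrative invocation only; the snakefile name comes from the error path above and the other flags are placeholders for whatever the real command line uses:

snakemake -s metaspades_binning_snakefile --use-conda --cores 8 --latency-wait 900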
Thanks!
EDIT: Based on Gajapathy's comment, I edited the file to look like this:
rule DAS_Tool:
    input:
        da1="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_metabat.scaffolds2bin.tsv",
        da2="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_maxbin.scaffolds2bin.tsv",
        da3="{datadir}/{sample}.fna",
        db=config["dastool_database"]
    threads: config["threads"]
    conda: "/home/users/sbusi/apps/environments/base.yml"
    params:
        basename="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}"
    output:
        daout=directory("{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_DASTool_bins"),
        dafile="{datadir}/{mapper}_{reads}_{sample}/dastool_output/{sample}_proteins.faa",
        damfile=touch("{datadir}/{mapper}_{reads}_{sample}_das_tool.done")
    shell:
        """
        date
        DAS_Tool -i {input.da1},{input.da2} -c {input.da3} -o {params.basename} --search_engine diamond -l maxbin2,metabat2 --write_bins 1 --write_bin_evals 1 --threads {threads} --db_directory {input.db} --create_plots 1 &&\
        2> >(tee {log}.stderr) > >(tee {log}.stdout)
        touch {output.damfile}
        date
        """
It works!! Thank you @Gajapathy!
According to DAS_Tool's docs, -o defines the basename of the output files, not an output folder; DAS_Tool appends its own suffixes (such as _proteins.faa and _DASTool_bins above) to that basename:
-o, --outputbasename Basename of output files.
So a generic, simplified rule looks like this:
rule DAS_Tool:
    output: 'path/to/outdir/basename_proteins.faa'
    params: basename = 'path/to/outdir/basename'
    shell: "DAS_Tool .... -o {params.basename} ...."
If you don't want to hardcode the basename in params, you can use Python's lambda magic to derive it from the output file inside params, as sketched below.
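A minimal sketch of that approach, assuming the declared output keeps the _proteins.faa suffix used above so the basename can be recovered by stripping it:

rule DAS_Tool:
    output: 'path/to/outdir/basename_proteins.faa'
    # Derive the basename from the declared output instead of hardcoding it;
    # assumes the output path ends in the known _proteins.faa suffix.
    params:
        basename = lambda wildcards, output: str(output[0]).replace('_proteins.faa', '')
    shell: "DAS_Tool .... -o {params.basename} ...."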