Snakemake --use-conda with --cluster and NFS4 storage
I am using snakemake in cluster mode to submit a simple one-rule workflow to an HPCC that runs Torque with several compute nodes. NFSv4 storage is mounted on /data. There is a link /PROJECT_DIR -> /data/PROJECT_DIR/
I submit the job with:
snakemake --verbose --use-conda --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
--rerun-incomplete --printshellcmds --latency-wait 60 \
--configfile /PROJECT_DIR/config.yaml -s '/data/WORKFLOW_DIR/Snakefile' --jobs 100 \
--cluster-config '/PROJECT_DIR/cluster.json' \
--cluster 'qsub -j oe -l mem={cluster.mem} -l walltime={cluster.time} \
-l nodes={cluster.nodes}:ppn={cluster.ppn}'
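The post does not show cluster.json itself. As a minimal sketch, a file that would produce the `cluster` values seen in the job properties (1 node, 4 ppn, 01:00:00, 32gb) could look like the one written below. The `__default__` key and the file name `cluster.example.json` are assumptions; the real file may instead use a per-rule key such as `fastqc1`.

```shell
#!/bin/sh
# Hypothetical output path; the post uses /PROJECT_DIR/cluster.json.
CLUSTER_JSON="${CLUSTER_JSON:-./cluster.example.json}"

# Values copied from the "cluster" entry in the job properties of the post.
# "__default__" (assumed here) applies to every rule without its own entry.
cat > "$CLUSTER_JSON" <<'EOF'
{
    "__default__": {
        "nodes": 1,
        "ppn": 4,
        "time": "01:00:00",
        "mem": "32gb"
    }
}
EOF

echo "wrote $CLUSTER_JSON"
```

Snakemake substitutes these values into the `{cluster.mem}`, `{cluster.time}`, `{cluster.nodes}` and `{cluster.ppn}` placeholders of the `--cluster` string.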
The job fails:
Error in rule fastqc1:
jobid: 1
output: /PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html
conda-env: /data/software/miniconda3-ngs/envs/snakemake/74019bbc
shell:
fastqc -o /PROJECT_DIR/OUTPUT_DIR/ -t 4 -f fastq /PROJECT_DIR/INPUT/SAMPLE.fastq.gz
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: 211078.CLUSTER
Error executing rule fastqc1 on cluster (jobid: 1, external: 211078.CLUSTER, jobscript:
PROJECT_DIR/.snakemake/tmp.t5a2dpxe/snakejob.fastqc1.1.sh). For error details see the cluster
log and the log files of the involved rule(s).
The submitted jobscript looks like this:
Jobscript:
#!/bin/sh
# properties = {"type": "single", "rule": "fastqc1", "local": false, "input":
["/PROJECT_DIR/INPUT_DIR/SAMPLE.fastq.gz"], "output": ["/PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html"],
"wildcards": {"sample": "SAMPLE", "read": "1"},
"params": {}, "log": [], "threads": 4, "resources": {}, "jobid": 1, "cluster": {"nodes": 1,
"ppn": 4, "time": "01:00:00", "mem": "32gb"}}
cd /data/PROJECT_DIR && \
PATH='/data/software/miniconda3-ngs/envs/snakemake-5.32.2/bin':$PATH \
/data/software/miniconda3-ngs/envs/snakemake-5.32.2/bin/python3.8 \
-m snakemake /PROJECT_DIR/OUTPUT_DIR/SAMPLE_fastqc.html --snakefile /data/WORKFLOW_DIR/Snakefile \
--force -j --keep-target-files --keep-remote --max-inventory-time 0 \
--wait-for-files /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe \
/PROJECT_DIR/INPUT/SAMPLE.fastq.gz /data/software/miniconda3-ngs/envs/snakemake/74019bbc --latency-wait 60 \
--attempt 1 --force-use-threads --scheduler ilp \
--wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ \
--configfiles /PROJECT_DIR/config.yaml -p --allowed-rules fastqc1 --nocolor --notemp --no-hooks --nolock \
--mode 2 --use-conda --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
&& touch /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe/1.jobfinished || \
(touch /data/PROJECT_DIR/.snakemake/tmp.t5a2dpxe/1.jobfailed; exit 1)
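For some reason, the problem does not occur when I use an interactive qsub shell on a single compute node to run the workflow locally. It only happens when jobs are submitted from the login node to the whole cluster.
Tested snakemake versions: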
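The cluster log is not included in the post, so the cause is not confirmed. One way to narrow it down is to compare what the interactive shell and a batch job can actually see. The sketch below is a hypothetical helper (the names `report`, `CONDA_SH` and `CONDA_PREFIX_DIR` are not from the post) that prints whether conda's init script, the conda prefix, and the `conda` command are visible from the current shell.

```shell
#!/bin/sh
# Paths taken from the post; override through the environment if they differ.
CONDA_SH="${CONDA_SH:-/data/software/miniconda3-ngs/etc/profile.d/conda.sh}"
CONDA_PREFIX_DIR="${CONDA_PREFIX_DIR:-/data/software/miniconda3-ngs/envs/snakemake}"

# report LABEL PATH: print whether PATH exists as seen from this node.
report() {
    if [ -e "$2" ]; then
        echo "$1: present ($2)"
    else
        echo "$1: missing ($2)"
    fi
}

echo "host: $(uname -n)"
report "conda.sh" "$CONDA_SH"
report "conda prefix" "$CONDA_PREFIX_DIR"

# The generated jobscript starts with #!/bin/sh and never sources conda.sh,
# so conda may be unavailable there even if it works interactively.
if command -v conda >/dev/null 2>&1; then
    echo "conda on PATH: yes"
else
    echo "conda on PATH: no"
fi
```

Running this once inside the interactive qsub shell and once as a submitted job, then comparing the two outputs, should show whether the batch environment lacks conda or cannot see the shared paths.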
- 5.10.0
- 5.32.2
- 6.0.5
Solved by providing a custom jobscript (--jobscript SCRIPT):
#!/bin/bash
# properties = {properties}
set +u;
source /data/software/miniconda3-ngs/etc/profile.d/conda.sh;
conda activate snakemake-5.32.2
set -u;
{exec_job}
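For completeness, a sketch of how the jobscript above might be wired into the original submission. The file location `JOBSCRIPT` and the wrapper function `submit` are hypothetical; the other options are copied from the command in the question, with the `--cluster` string written on one line.

```shell
#!/bin/sh
# Hypothetical location; any path readable from the login node should work.
JOBSCRIPT="${JOBSCRIPT:-./jobscript.sh}"

# Quoted heredoc: {properties} and {exec_job} are written literally and are
# filled in by snakemake when it generates the script for each job.
cat > "$JOBSCRIPT" <<'EOF'
#!/bin/bash
# properties = {properties}
set +u
source /data/software/miniconda3-ngs/etc/profile.d/conda.sh
conda activate snakemake-5.32.2
set -u
{exec_job}
EOF

# Same options as the original submission, plus --jobscript.
submit() {
    snakemake --verbose --use-conda \
        --conda-prefix /data/software/miniconda3-ngs/envs/snakemake \
        --rerun-incomplete --printshellcmds --latency-wait 60 \
        --configfile /PROJECT_DIR/config.yaml -s /data/WORKFLOW_DIR/Snakefile \
        --jobs 100 --cluster-config /PROJECT_DIR/cluster.json \
        --jobscript "$JOBSCRIPT" \
        --cluster 'qsub -j oe -l mem={cluster.mem} -l walltime={cluster.time} -l nodes={cluster.nodes}:ppn={cluster.ppn}'
}
```

Calling `submit` from the login node starts the workflow. The `set +u` / `set -u` pair is kept around the activation because sourcing conda's shell script while unset variables are treated as errors may fail.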