Snakemake error when running two jobs at once that use the same conda environment
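I'm getting an error with a Snakemake (6.0.0) workflow when two jobs that use the same conda environment are started at the same time on the same node. A minimal example is below.
A few observations: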
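- The problem occurs when I run the workflow on a node of my institution's cluster, but not when I run it on my local machine. (I'm running snakemake from an interactive slurm job with >1 CPU; I'm using Miniconda, which my cluster's sysadmin provides as a module.)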
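- When the jobs are forced to run sequentially (snakemake --use-conda -j1), the workflow completes fine. The problem only appears with -j2 or higher (up to the number of cores available in the slurm allocation). The first job seems to run fine; it is always the second job that croaks.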
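- I can activate the conda environment in question without problems once snakemake has created it (e.g., after running the workflow, conda activate /long_path_to_cluster_project_folder/testing/conda_test/.snakemake/conda/c4751dca works, and I can run R from it, etc.).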
- If I run snakemake --use-conda -j2, the only error reported is that the shell command below failed (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!). If I add --verbose, a lengthy traceback is printed in blue and red, which I've included below. The relevant bit seems to be:
    File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
      return self.info["conda_prefix"]
    AttributeError: 'Conda' object has no attribute 'info'
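- Suspecting some kind of race condition, I also tried adding --max-jobs-per-second=0.5 to throttle the jobs so that they would not start at the same time, but this seemed to have no effect (the jobs still started simultaneously, with the same error as before. I'm not running snakemake with --cluster, --profile or anything similar; no additional slurm jobs are created, just processes spawned on the same compute node.)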
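- The same problem occurs if I create two completely different Snakemake rules that end up executing at the same time, as long as both rules use the same conda environment.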
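I'm quite new to both snakemake and HPC, but this seems to fall somewhere between a system-/configuration-specific problem (since it only happens on the cluster) and a minor snakemake bug (since snakemake seems to attribute the failure to my shell script rather than to anything conda-related). I'd be grateful for suggestions on how to troubleshoot further or work around the problem.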
Thanks!
Minimal example:
.
├── input.txt
├── results
└── workflow
    ├── Snakefile
    └── envs
        └── env1.yaml
workflow/Snakefile:
rule all:
    input:
        'results/output1.txt',
        'results/output2.txt',
        'results/output3.txt',
        'results/output4.txt'

rule rule1:
    input: 'input.txt'
    output:
        'results/output{n}.txt'
    conda: 'envs/env1.yaml'
    shell:"""
        sleep 5s
        touch {output}
        """
workflow/envs/env1.yaml:
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - r-ggplot2
$ snakemake --use-conda -j2 -p --verbose
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    4       rule1
    5
<< snip >>
[Fri Mar 5 21:01:33 2021]
Error in rule rule1:
    jobid: 2
    output: results/output2.txt
    conda-env: /long_path_to_cluster_project_folder/testing/conda_test/.snakemake/conda/c4751dca
    shell:
        sleep 5s
        touch results/output2.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Full Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
    run(
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 33, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/shell.py", line 141, in __new__
    cmd = Conda(container_img).shellcmd(conda_env, cmd)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
    activate = os.path.join(self.bin_path(), "activate")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
    return os.path.join(self.prefix_path(), "bin")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
    return self.info["conda_prefix"]
AttributeError: 'Conda' object has no attribute 'info'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
    raise ex
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
    run_func(*args)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
    raise RuleException(
snakemake.exceptions.RuleException: AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
RuleException:
AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
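The innermost frames above (prefix_path reading self.info on a Conda object that has no info attribute yet) fit the pattern of a check-then-initialize race. If jobs running in parallel threads share one lazily initialized object, a second job can pick it up before the first job has finished filling it in. The sketch below illustrates that general pattern in plain Python under that assumption; it is not Snakemake's code, and UnsafeShared, SafeShared, the slow_step hook and the /opt/conda prefix are hypothetical stand-ins. It forces the bad interleaving deterministically with two threading.Event objects, then shows the usual remedy: hold a lock around the check and publish the object only once it is complete.

```python
import threading
import time


class UnsafeShared:
    """Hypothetical shared object that is published before it is fully built."""

    _instance = None

    @classmethod
    def get(cls, slow_step=lambda: None):
        if cls._instance is None:
            inst = cls()
            cls._instance = inst  # BUG: other threads can already see `inst`...
            slow_step()           # ...while this slow setup is still running
            inst.info = {"conda_prefix": "/opt/conda"}
        return cls._instance

    def prefix_path(self):
        return self.info["conda_prefix"]


class SafeShared:
    """Same idea, but check-and-initialize runs under a lock and the
    instance is only published once `info` exists."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, slow_step=lambda: None):
        with cls._lock:
            if cls._instance is None:
                inst = cls()
                slow_step()
                inst.info = {"conda_prefix": "/opt/conda"}
                cls._instance = inst  # publish only when complete
        return cls._instance

    def prefix_path(self):
        return self.info["conda_prefix"]


def demo_unsafe():
    """Force the bad interleaving deterministically; return the error text."""
    UnsafeShared._instance = None
    published, release = threading.Event(), threading.Event()

    def slow_step():
        published.set()  # job 1 has published the half-built object...
        release.wait()   # ...and is still busy initializing it

    job1 = threading.Thread(target=UnsafeShared.get, args=(slow_step,))
    job1.start()
    published.wait()
    try:
        UnsafeShared.get().prefix_path()  # job 2 uses the shared object
        error = None
    except AttributeError as exc:
        error = str(exc)
    finally:
        release.set()
        job1.join()
    return error


def demo_safe(n_jobs=4):
    """Start several concurrent jobs; return the prefix each one saw."""
    SafeShared._instance = None
    results = []

    def job():
        shared = SafeShared.get(lambda: time.sleep(0.05))
        results.append(shared.prefix_path())

    threads = [threading.Thread(target=job) for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(demo_unsafe())  # 'UnsafeShared' object has no attribute 'info'
    print(demo_safe())    # ['/opt/conda', '/opt/conda', '/opt/conda', '/opt/conda']
```

Building the object in a local variable and assigning it once already prevents the half-initialized read; the lock additionally guarantees the slow setup runs exactly once instead of once per racing thread.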
Try updating to at least Snakemake v6.0.2. This appears to have been a bug in v6.0.0 that was patched in v6.0.2 (Release Notes). You're right on the money with it being a race condition issue (see commit).
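Because the fix depends on which Snakemake release is actually imported (on a cluster, a module-provided Miniconda plus a separate snakemake environment make it easy to end up on an older copy), a quick check like the following can confirm that the upgrade took effect. This is a minimal sketch: it reads the installed version with importlib.metadata (Python 3.8+) and compares only the numeric components against 6.0.2; the helper names version_tuple and has_fix are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum release recommended above for the concurrent-conda fix.
MINIMUM = "6.0.2"


def version_tuple(text):
    """Turn '6.0.2' or '6.10.0.post1' into (6, 0, 2) / (6, 10, 0).

    Only leading numeric components are kept and pre-release tags are
    ignored, so this is a rough check, not a full PEP 440 comparison.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # e.g. '0rc1': keep the numeric prefix, then stop
    return tuple(parts)


def has_fix(installed, minimum=MINIMUM):
    """True if `installed` is at least `minimum` (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("snakemake")
    except PackageNotFoundError:
        print("snakemake is not installed in this Python environment")
    else:
        status = "OK" if has_fix(installed) else "upgrade needed"
        print(f"snakemake {installed}: {status}")
```

Run it with the same python that launches the workflow (i.e. from the activated snakemake environment); snakemake --version reports the same information from the command line.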