Snakemake error when running two jobs at once that use the same conda environment

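I'm getting an error when executing a Snakemake (6.0.0) workflow that results in two jobs being launched at the same time on the same node, both using the same conda environment. A minimal example is below.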

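A few observations:

I'm new to both snakemake and HPC, but this seems to fall somewhere between a system-/configuration-specific problem (since it only happens on the cluster) and a minor snakemake bug (since snakemake seems to attribute the problem to my shell script rather than to anything conda-related). I'd appreciate any suggestions on how to troubleshoot this further or resolve it.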

Thanks!

Minimal example:

├── input.txt
├── results
└── workflow
    ├── Snakefile
    └── envs
        └── env1.yaml

$ snakemake --use-conda -j2 -p --verbose
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    4   rule1
    5

<< snip >>


[Fri Mar  5 21:01:33 2021]
Error in rule rule1:
    jobid: 2
    output: results/output2.txt
    conda-env: /long_path_to_cluster_project_folder/testing/conda_test/.snakemake/conda/c4751dca
    shell:
        
    sleep 5s
    touch results/output2.txt
    
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Full Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
    run(
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 33, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/shell.py", line 141, in __new__
    cmd = Conda(container_img).shellcmd(conda_env, cmd)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
    activate = os.path.join(self.bin_path(), "activate")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
    return os.path.join(self.prefix_path(), "bin")
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
    return self.info["conda_prefix"]
AttributeError: 'Conda' object has no attribute 'info'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
    raise ex
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
    run_func(*args)
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
    raise RuleException(
snakemake.exceptions.RuleException: AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path

RuleException:
AttributeError in line 13 of /long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile:
'Conda' object has no attribute 'info'
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2326, in run_wrapper
  File "/long_path_to_cluster_project_folder/testing/conda_test/workflow/Snakefile", line 13, in __rule_rule1
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 512, in shellcmd
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 508, in bin_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 505, in prefix_path
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/concurrent/futures/thread.py", line 52, in run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/long_path_to_cluster_project_folder/conda_envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper

Try updating to at least Snakemake v6.0.2. This appears to be a bug in v6.0.0 that was patched in v6.0.2 (Release Notes). You're right on the money about it being a race condition (see commit).
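
For intuition, here is a minimal sketch of how this kind of race can produce an `AttributeError` like the one in your traceback. It is not Snakemake's actual code: `Env`, `get_env`, `get_env_locked`, `demo_race`, and the prefix path are all hypothetical names made up for illustration, and the `pause` hook exists only to force the unlucky interleaving deterministically. The assumed pattern is a shared object that becomes visible to other threads before its `info` attribute has been set.

```python
import threading

HYPOTHETICAL_PREFIX = "/hypothetical/conda/prefix"  # made-up value for illustration


class Env:
    """Hypothetical stand-in for an object whose 'info' attribute is set late."""


_cache = {}
_safe_cache = {}
_lock = threading.Lock()


def get_env(key, pause=None):
    """Unsafe: the object is published in the cache before 'info' exists."""
    if key not in _cache:
        env = Env()
        _cache[key] = env  # other threads can already see this object
        if pause is not None:
            pause()  # test hook that widens the race window
        env.info = {"conda_prefix": HYPOTHETICAL_PREFIX}
    return _cache[key]


def get_env_locked(key):
    """Safer: initialize fully under a lock, then publish."""
    with _lock:
        if key not in _safe_cache:
            env = Env()
            env.info = {"conda_prefix": HYPOTHETICAL_PREFIX}
            _safe_cache[key] = env
        return _safe_cache[key]


def demo_race():
    """Force the bad interleaving and return the resulting error message."""
    _cache.clear()
    published = threading.Event()
    resume = threading.Event()

    def pause():
        published.set()  # signal: object is cached but not yet initialized
        resume.wait()    # stay in the race window until released

    first = threading.Thread(target=get_env, args=("env1", pause))
    first.start()
    published.wait()

    message = None
    try:
        # The second "job" picks up the half-initialized object.
        get_env("env1").info
    except AttributeError as exc:
        message = str(exc)
    finally:
        resume.set()
        first.join()
    return message


if __name__ == "__main__":
    print(demo_race())
```

In real runs the window is tiny, which would explain why the error only shows up when two jobs sharing an environment start at almost the same moment. Upgrading is still the actual fix; the locked variant above only shows one common way such races are closed.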