Best way for installing non-conda dependencies in Snakemake conda environments

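I want to be able to install R packages from GitHub into an R conda environment created by Snakemake, and to install Python libraries with pip into a Python conda environment created the same way. I will then use these environments across a whole set of rules.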

My first idea was to create a rule that runs a script to install the required packages.

For example, my initial run is: snakemake -j1 --use-conda -R create_r_environment

My Snakefile:

rule create_r_environment:
    conda:
        "envs/r.yaml"
    script:
        "scripts/r-dependencies.R"

rule create_python_environment:
    conda:
        "envs/python.yaml"
    script:
        "scripts/python-dependencies.py"    

My envs/r.yaml file:

channels:
 - conda-forge
dependencies:
 - r=4.0

My r-dependencies.R file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

My envs/python.yaml file:

channels:
 - conda-forge
dependencies:
 - python=3.8.2

My python-dependencies.py file:

!pip install gseapy

Log output:

Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   create_r_environment
    1

[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
    jobid: 0

Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
    jobid: 0
    conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8

RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail;  Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
  File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log

My folder structure:

.
├── envs
│   ├── python.yaml
│   └── r.yaml
├── scripts
│   ├── python-dependencies.py
│   └── r-dependencies.R
└── Snakefile

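The environment is created successfully, but running the script fails and I don't know why. I changed the contents of scripts/r-dependencies.R to install.packages("data.table") to check whether the GitHub package was the problem, but it wasn't: the rule fails either way. The same thing happens when I run the rule create_python_environment (output not shown here).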

Any help?


Edit after accepting the answer

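As @dariober pointed out, I forgot to install the remotes package before calling it in the script. I added it to the .yaml file and it worked well. I also installed the pip library from a shell command instead of a Python file.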
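For readers who want to see that fix spelled out, here is a minimal sketch of the updated envs/r.yaml. It assumes remotes is pulled from conda-forge, where R packages conventionally carry an r- prefix (so the package name is r-remotes); the rest of the file is unchanged from the question.

```yaml
# envs/r.yaml (sketch): same environment as before, plus remotes
channels:
  - conda-forge
dependencies:
  - r=4.0
  - r-remotes   # makes remotes::install_github() available to r-dependencies.R
```

With this in place, the script called by the rule can load remotes from the environment itself rather than relying on anything installed on the host system.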

I would like to highlight a few points in case anyone runs into the same or a similar problem:

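First, I was able to install more of the packages I need, but some of them require specific libraries (e.g. libcurl) that are installed on my system yet are not recognized inside the Snakemake conda environment. That forces me either to install them inside the Snakemake conda environment (which is good for reproducibility, although I don't know how to do it yet) or to specify the library path. Perhaps a better option is to use containers, as @merv commented.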
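One possible way to take the first route is to declare the library in the environment file, so conda installs it next to R. The sketch below is an assumption rather than something tested in this thread: it supposes the conda-forge package is named libcurl, and the exact package a given R library needs should be checked against the channel.

```yaml
# envs/r.yaml (sketch): system-level library declared alongside R
channels:
  - conda-forge
dependencies:
  - r=4.0
  - r-remotes
  - libcurl   # assumed package name; puts the curl library inside the env
```

Because the environment file changes, Snakemake should build a fresh environment on the next run with --use-conda.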

Second, I found that there is already a way to install pip libraries through the .yaml file. From the documentation, it looks like this:

name: stats2
channels:
  - javascript
dependencies:
  - python=3.6   # or 2.7
  - bokeh=0.9.2
  - numpy=1.9.*
  - nodejs=0.10.*
  - flask
  - pip:
    - Flask-Testing
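Applied to the Python environment from this question, the same pattern might look like the sketch below. The file name and Python pin come from the question; listing pip explicitly as a conda dependency is an addition here, made so that the pip: section has an installer available inside the environment.

```yaml
# envs/python.yaml (sketch): pip-installed library declared in the env file
channels:
  - conda-forge
dependencies:
  - python=3.8.2
  - pip          # explicit, so the pip: section below can be processed
  - pip:
    - gseapy
```

This removes the need for a separate create_python_environment rule, since the library is installed when the environment is created.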

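I think there are quite a few things wrong here:

  • remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never"): you should include the remotes package in your r.yaml.

  • !pip install gseapy is not valid Python code. If anything, it is code meant to be executed by a shell, but I'm not sure the leading ! is correct. Besides, gseapy is available from bioconda, so I don't see why you would install it with pip.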
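To illustrate the second point, a sketch of envs/python.yaml that takes gseapy from bioconda instead of pip could look like this. The channel order (conda-forge before bioconda) follows the usual bioconda recommendation; whether the pinned Python version resolves against the available gseapy builds is an assumption that would need checking.

```yaml
# envs/python.yaml (sketch): gseapy from bioconda, no pip step needed
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.8.2
  - gseapy
```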


Before the OP edited the question

My envs/r.yaml file:

remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

It is strange that your conda environment was created correctly, because that r.yaml is not a valid environment file.

This is how I tried to reproduce your problem:

r.yaml

 cat r.yaml  
 remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")

Snakefile:

cat Snakefile 
rule create_r_environment:
    conda:
        "r.yaml"
    script:
        "r-dependencies.R"

Execution:

snakemake -j1 --use-conda -R create_r_environment

Building DAG of jobs...
Creating conda environment r.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/dario/Downloads/r.yaml:

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
        exit_code = getattr(module, func_name)(args, parser)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 80, in execute
        directory=os.getcwd())
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
        if spec.can_handle():
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
        self._environment = env.from_file(self.filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 151, in from_file
        return from_yaml(yamlstr, filename=filename)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 137, in from_yaml
        data = validate_keys(data, kwargs)
      File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 35, in validate_keys
        new_data = data.copy() if data else {}
    AttributeError: 'str' object has no attribute 'copy'

`$ /home/dario/miniconda3/bin/conda-env create --file /home/dario/Downloads/.snakemake/conda/095b0ca2.yaml --prefix /home/dario/Downloads/.snakemake/conda/095b0ca2`

  environment variables:
                 CIO_TEST=<not set>
        CMAKE_PREFIX_PATH=/home/dario/miniconda3/envs/tritume:/home/dario/miniconda3/envs/tritum
                          e/x86_64-conda-linux-gnu/sysroot/usr
  CONDA_AUTO_UPDATE_CONDA=false
      CONDA_BUILD_SYSROOT=/home/dario/miniconda3/envs/tritume/x86_64-conda-linux-gnu/sysroot
        CONDA_DEFAULT_ENV=tritume
                CONDA_EXE=/home/dario/miniconda3/bin/conda
             CONDA_PREFIX=/home/dario/miniconda3/envs/tritume
    CONDA_PROMPT_MODIFIER=(tritume)
         CONDA_PYTHON_EXE=/home/dario/miniconda3/bin/python
               CONDA_ROOT=/home/dario/miniconda3
              CONDA_SHLVL=1
            DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
           MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
                     PATH=/home/dario/miniconda3/envs/tritume/bin:/home/dario/miniconda3/condabi
                          n:/opt/gradle/gradle-5.2/bin:/home/dario/.local/share/umake/bin:/home/
                          dario/.local/bin:/home/dario/bin:/opt/gradle/gradle-5.2/bin:/usr/local
                          /sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/loc
                          al/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-1
                          0-oracle/db/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
               WINDOWPATH=2

     active environment : tritume
    active env location : /home/dario/miniconda3/envs/tritume
            shell level : 1
       user config file : /home/dario/.condarc
 populated config files : /home/dario/.condarc
          conda version : 4.8.3
    conda-build version : not installed
         python version : 3.7.6.final.0
       virtual packages : __glibc=2.27
       base environment : /home/dario/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/dario/miniconda3/pkgs
                          /home/dario/.conda/pkgs
       envs directories : /home/dario/miniconda3/envs
                          /home/dario/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Linux/4.15.0-91-generic ubuntu/18.04.4 glibc/2.27
                UID:GID : 1001:1001
             netrc file : None
           offline mode : False


An unexpected error has occurred. Conda has prepared the above report.

If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?

[y/N]: 
Timeout reached. No report sent.


  File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 320, in create

In any case, your error is:

... r-dependencies.R' returned non-zero exit status 1

What do you have in r-dependencies.R?