在 Snakemake conda 环境中安装非 conda 依赖项的最佳方法
Best way for installing non-conda dependencies in Snakemake conda environments
我希望能够在 Snakemake 创建的 R conda 环境中安装来自 GitHub 的 R 包,以及通过 pip
在 [=78] 中安装 python 库=] 环境。之后我会在一整套规则中使用这些环境。
我最初的想法是创建一个规则运行ning 一个脚本来安装指定的包。
例如,我的初始 运行 是:snakemake -j1 --use-conda -R create_r_environment
。
我的 Snakefile:
rule create_r_environment:
conda:
"envs/r.yaml"
script:
"scripts/r-dependencies.R"
rule create_python_environment:
conda:
"envs/python.yaml"
script:
"scripts/python-dependencies.py"
我的envs/r.yaml文件:
channels:
- conda-forge
dependencies:
- r=4.0
我的r-dependencies.R文件:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
我的envs/pyton.yaml文件:
channels:
- conda-forge
dependencies:
- python=3.8.2
我的python-dependencies.py文件:
!pip install gseapy
日志输出:
Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388,repos = "http://cran.us.r-project.org")f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 create_r_environment
1
[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
jobid: 0
Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
jobid: 0
conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail; Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log
我的文件夹结构:
.
├── envs
│ ├── python.yaml
│ └── r.yaml
├── scripts
│ ├── python-dependencies.py
│ └── r-dependencies.R
└── Snakefile
环境创建成功,但是运行脚本创建失败,不知道为什么。我已将 envs/r.yaml
文件内容更改为 install.packages("data.table")
以查看 github 包是否存在问题,但事实并非如此。无论如何它都失败了。当我 运行 规则 create_python_environment
时也会发生同样的情况(此处未显示输出)。
有什么帮助吗?
在接受答案后编辑
正如@dariober 指出的那样,我在脚本中调用它之前忘记安装 remotes
包。我在 .yaml 文件中完成了它,并且效果很好。另外,我使用 shell 而不是 python 文件安装了 pip 库。
我想强调一些要点,以防万一有人遇到相同或类似的问题:
首先,我可以成功安装我需要的更多包,但其中一些需要特定的库(例如 libcurl),这些库安装在我的系统中,但在 Snakemake conda 环境中无法识别,迫使我要么在Snakemake conda环境中安装(这对重现性有好处,虽然我还不知道怎么做)或者指定路径库。也许更好的选择是使用容器,就像@merv 注释掉的那样。
其次,我发现Snakemake已经提供了一种使用.yaml文件安装pip库的方法。从 documentation 来看,它看起来像这样:
name: stats2
channels:
- javascript
dependencies:
- python=3.6 # or 2.7
- bokeh=0.9.2
- numpy=1.9.*
- nodejs=0.10.*
- flask
- pip:
- Flask-Testing
我觉得错误的地方还挺多的:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
:在您的 r.yaml 中,您应该包含 remotes
包。
!pip install gseapy
是无效的 python 代码。如果有的话,它是由 shell 执行的代码,但我不确定前导 !
是否正确。此外,gseapy
可从 bioconda 获得我不明白为什么你应该用 pip 安装它。
OP 编辑问题之前
My envs/r.yaml file:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
奇怪的是您正确创建了 conda 环境,因为 r.yaml
不是有效的环境文件。
这就是我试图重现您的问题的方法:
r.yaml
cat r.yaml
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
蛇文件:
cat Snakefile
rule create_r_environment:
conda:
"r.yaml"
script:
"r-dependencies.R"
执行:
snakemake -j1 --use-conda -R create_r_environment
Building DAG of jobs...
Creating conda environment r.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/dario/Downloads/r.yaml:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
exit_code = getattr(module, func_name)(args, parser)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 80, in execute
directory=os.getcwd())
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
if spec.can_handle():
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
self._environment = env.from_file(self.filename)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 151, in from_file
return from_yaml(yamlstr, filename=filename)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 137, in from_yaml
data = validate_keys(data, kwargs)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 35, in validate_keys
new_data = data.copy() if data else {}
AttributeError: 'str' object has no attribute 'copy'
`$ /home/dario/miniconda3/bin/conda-env create --file /home/dario/Downloads/.snakemake/conda/095b0ca2.yaml --prefix /home/dario/Downloads/.snakemake/conda/095b0ca2`
environment variables:
CIO_TEST=<not set>
CMAKE_PREFIX_PATH=/home/dario/miniconda3/envs/tritume:/home/dario/miniconda3/envs/tritum
e/x86_64-conda-linux-gnu/sysroot/usr
CONDA_AUTO_UPDATE_CONDA=false
CONDA_BUILD_SYSROOT=/home/dario/miniconda3/envs/tritume/x86_64-conda-linux-gnu/sysroot
CONDA_DEFAULT_ENV=tritume
CONDA_EXE=/home/dario/miniconda3/bin/conda
CONDA_PREFIX=/home/dario/miniconda3/envs/tritume
CONDA_PROMPT_MODIFIER=(tritume)
CONDA_PYTHON_EXE=/home/dario/miniconda3/bin/python
CONDA_ROOT=/home/dario/miniconda3
CONDA_SHLVL=1
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
PATH=/home/dario/miniconda3/envs/tritume/bin:/home/dario/miniconda3/condabi
n:/opt/gradle/gradle-5.2/bin:/home/dario/.local/share/umake/bin:/home/
dario/.local/bin:/home/dario/bin:/opt/gradle/gradle-5.2/bin:/usr/local
/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/loc
al/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-1
0-oracle/db/bin
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
WINDOWPATH=2
active environment : tritume
active env location : /home/dario/miniconda3/envs/tritume
shell level : 1
user config file : /home/dario/.condarc
populated config files : /home/dario/.condarc
conda version : 4.8.3
conda-build version : not installed
python version : 3.7.6.final.0
virtual packages : __glibc=2.27
base environment : /home/dario/miniconda3 (writable)
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/dario/miniconda3/pkgs
/home/dario/.conda/pkgs
envs directories : /home/dario/miniconda3/envs
/home/dario/.conda/envs
platform : linux-64
user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Linux/4.15.0-91-generic ubuntu/18.04.4 glibc/2.27
UID:GID : 1001:1001
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?
[y/N]:
Timeout reached. No report sent.
File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 320, in create
无论如何,你的错误是:
... r-dependencies.R' returned non-zero exit status 1
你在 r-dependencies.R
里有什么?
我希望能够在 Snakemake 创建的 R conda 环境中安装来自 GitHub 的 R 包,以及通过 pip
在 [=78] 中安装 python 库=] 环境。之后我会在一整套规则中使用这些环境。
我最初的想法是创建一个规则运行ning 一个脚本来安装指定的包。
例如,我的初始 运行 是:snakemake -j1 --use-conda -R create_r_environment
。
我的 Snakefile:
rule create_r_environment:
conda:
"envs/r.yaml"
script:
"scripts/r-dependencies.R"
rule create_python_environment:
conda:
"envs/python.yaml"
script:
"scripts/python-dependencies.py"
我的envs/r.yaml文件:
channels:
- conda-forge
dependencies:
- r=4.0
我的r-dependencies.R文件:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
我的envs/pyton.yaml文件:
channels:
- conda-forge
dependencies:
- python=3.8.2
我的python-dependencies.py文件:
!pip install gseapy
日志输出:
Building DAG of jobs...
Creating conda environment envs/r.yaml...
Downloading and installing remote packages.
Environment for envs/r.yaml created (location: .snakemake/conda/388,repos = "http://cran.us.r-project.org")f7df8)
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 create_r_environment
1
[Fri Oct 30 22:38:56 2020]
rule create_r_environment:
jobid: 0
Activating conda environment: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
[Fri Oct 30 22:38:57 2020]
Error in rule create_r_environment:
jobid: 0
conda-env: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8
RuleException:
CalledProcessError in line 5 of /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile:
Command 'source /home/cmcouto-silva/miniconda3/bin/activate '/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/conda/388f7df8'; set -euo pipefail; Rscript --vanilla /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/scripts/tmpa6jdxovx.r-dependencies.R' returned non-zero exit status 1.
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper
File "/home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/Snakefile", line 5, in __rule_create_r_environment
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/cmcouto-silva/miniconda3/envs/snakemake/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/cmcouto-silva/cmcouto.silva@usp.br/lab_files/phd_data/SO/.snakemake/log/2020-10-30T223743.852983.snakemake.log
我的文件夹结构:
.
├── envs
│ ├── python.yaml
│ └── r.yaml
├── scripts
│ ├── python-dependencies.py
│ └── r-dependencies.R
└── Snakefile
环境创建成功,但是运行脚本创建失败,不知道为什么。我已将 envs/r.yaml
文件内容更改为 install.packages("data.table")
以查看 github 包是否存在问题,但事实并非如此。无论如何它都失败了。当我 运行 规则 create_python_environment
时也会发生同样的情况(此处未显示输出)。
有什么帮助吗?
在接受答案后编辑
正如@dariober 指出的那样,我在脚本中调用它之前忘记安装 remotes
包。我在 .yaml 文件中完成了它,并且效果很好。另外,我使用 shell 而不是 python 文件安装了 pip 库。
我想强调一些要点,以防万一有人遇到相同或类似的问题:
首先,我可以成功安装我需要的更多包,但其中一些需要特定的库(例如 libcurl),这些库安装在我的系统中,但在 Snakemake conda 环境中无法识别,迫使我要么在Snakemake conda环境中安装(这对重现性有好处,虽然我还不知道怎么做)或者指定路径库。也许更好的选择是使用容器,就像@merv 注释掉的那样。
其次,我发现Snakemake已经提供了一种使用.yaml文件安装pip库的方法。从 documentation 来看,它看起来像这样:
name: stats2
channels:
- javascript
dependencies:
- python=3.6 # or 2.7
- bokeh=0.9.2
- numpy=1.9.*
- nodejs=0.10.*
- flask
- pip:
- Flask-Testing
我觉得错误的地方还挺多的:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
:在您的 r.yaml 中,您应该包含remotes
包。!pip install gseapy
是无效的 python 代码。如果有的话,它是由 shell 执行的代码,但我不确定前导!
是否正确。此外,gseapy
可从 bioconda 获得我不明白为什么你应该用 pip 安装它。
OP 编辑问题之前
My envs/r.yaml file:
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
奇怪的是您正确创建了 conda 环境,因为 r.yaml
不是有效的环境文件。
这就是我试图重现您的问题的方法:
r.yaml
cat r.yaml
remotes::install_github("ramiromagno/gwasrapidd", upgrade = "never")
蛇文件:
cat Snakefile
rule create_r_environment:
conda:
"r.yaml"
script:
"r-dependencies.R"
执行:
snakemake -j1 --use-conda -R create_r_environment
Building DAG of jobs...
Creating conda environment r.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/dario/Downloads/r.yaml:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda/exceptions.py", line 1079, in __call__
return func(*args, **kwargs)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main.py", line 80, in do_call
exit_code = getattr(module, func_name)(args, parser)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/cli/main_create.py", line 80, in execute
directory=os.getcwd())
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/__init__.py", line 40, in detect
if spec.can_handle():
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
self._environment = env.from_file(self.filename)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 151, in from_file
return from_yaml(yamlstr, filename=filename)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 137, in from_yaml
data = validate_keys(data, kwargs)
File "/home/dario/miniconda3/lib/python3.7/site-packages/conda_env/env.py", line 35, in validate_keys
new_data = data.copy() if data else {}
AttributeError: 'str' object has no attribute 'copy'
`$ /home/dario/miniconda3/bin/conda-env create --file /home/dario/Downloads/.snakemake/conda/095b0ca2.yaml --prefix /home/dario/Downloads/.snakemake/conda/095b0ca2`
environment variables:
CIO_TEST=<not set>
CMAKE_PREFIX_PATH=/home/dario/miniconda3/envs/tritume:/home/dario/miniconda3/envs/tritum
e/x86_64-conda-linux-gnu/sysroot/usr
CONDA_AUTO_UPDATE_CONDA=false
CONDA_BUILD_SYSROOT=/home/dario/miniconda3/envs/tritume/x86_64-conda-linux-gnu/sysroot
CONDA_DEFAULT_ENV=tritume
CONDA_EXE=/home/dario/miniconda3/bin/conda
CONDA_PREFIX=/home/dario/miniconda3/envs/tritume
CONDA_PROMPT_MODIFIER=(tritume)
CONDA_PYTHON_EXE=/home/dario/miniconda3/bin/python
CONDA_ROOT=/home/dario/miniconda3
CONDA_SHLVL=1
DEFAULTS_PATH=/usr/share/gconf/ubuntu.default.path
MANDATORY_PATH=/usr/share/gconf/ubuntu.mandatory.path
PATH=/home/dario/miniconda3/envs/tritume/bin:/home/dario/miniconda3/condabi
n:/opt/gradle/gradle-5.2/bin:/home/dario/.local/share/umake/bin:/home/
dario/.local/bin:/home/dario/bin:/opt/gradle/gradle-5.2/bin:/usr/local
/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/loc
al/games:/snap/bin:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-1
0-oracle/db/bin
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
WINDOWPATH=2
active environment : tritume
active env location : /home/dario/miniconda3/envs/tritume
shell level : 1
user config file : /home/dario/.condarc
populated config files : /home/dario/.condarc
conda version : 4.8.3
conda-build version : not installed
python version : 3.7.6.final.0
virtual packages : __glibc=2.27
base environment : /home/dario/miniconda3 (writable)
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/bioconda/linux-64
https://conda.anaconda.org/bioconda/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/dario/miniconda3/pkgs
/home/dario/.conda/pkgs
envs directories : /home/dario/miniconda3/envs
/home/dario/.conda/envs
platform : linux-64
user-agent : conda/4.8.3 requests/2.22.0 CPython/3.7.6 Linux/4.15.0-91-generic ubuntu/18.04.4 glibc/2.27
UID:GID : 1001:1001
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?
[y/N]:
Timeout reached. No report sent.
File "/home/dario/miniconda3/envs/tritume/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 320, in create
无论如何,你的错误是:
... r-dependencies.R' returned non-zero exit status 1
你在 r-dependencies.R
里有什么?