如何为 htslib (bgzip + tabix) 使用 Snakemake 容器
How to use Snakemake container for htslib (bgzip + tabix)
我有一个使用全局奇点图像和基于规则的 conda 包装器的管道。
但是,有些工具没有包装器(即 htslib
的bgzip
和tabix
)。
现在我需要学习如何 run jobs in containers。
在官方文档中 link 它说:
"Allowed image urls entail everything supported by singularity (e.g., shub://
and docker://
)."
现在我尝试了来自奇点中心的以下图像,但出现错误:
最小可重现示例:
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
GENOME_INDEX=config["REF_GENOME"]+".fai"
VEP_ANNOT=config["GENOME_ANNOTATION"]+".gz"
VEP_ANNOT_INDEX=config["GENOME_ANNOTATION"]+".gz.tbi"
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{REF_DIR}/{GENOME_ANNOTATION}{ext}', REF_DIR=dirs_dict["REF_DIR"], GENOME_ANNOTATION=config["GENOME_ANNOTATION"], ext=['', '.gz', '.gz.tbi']),
expand('{REF_DIR}/{REF_GENOME}{ext}', REF_DIR=dirs_dict["REF_DIR"], REF_GENOME=config["REF_GENOME"], ext=['','.fai']),
rule download_references:
params:
ref_genome=config["REF_GENOME"],
genome_annotation=config["GENOME_ANNOTATION"],
ref_dir=dirs_dict["REF_DIR"]
output:
os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
os.path.join(dirs_dict["REF_DIR"],config["GENOME_ANNOTATION"]),
os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT),
os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT_INDEX)
resources:
mem=80000,
time=45
log:
os.path.join(dirs_dict["LOG_DIR"],"references","download.log")
singularity:
"shub://biocontainers/tabix"
shell: """
cd {params.ref_dir}
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
bgzip -d {params.ref_genome}.gz
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
bgzip -d {params.genome_annotation}.gz
grep -v "#" {params.genome_annotation} | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > {params.genome_annotation}.gz
tabix -p gff {params.genome_annotation}.gz
"""
rule index_reference:
input:
os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
output:
os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
resources:
mem=2000,
time=30,
log:
os.path.join(dirs_dict["LOG_DIR"],"references", "faidx_index.log")
wrapper:
"0.64.0/bio/samtools/faidx"
错误
Building DAG of jobs...
Pulling singularity image shub://biocontainers/tabix.
WorkflowError:
Failed to pull singularity image from shub://biocontainers/tabix:
ESC[31mFATAL: ESC[0m While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 88, in pull
~
这似乎是容器的问题?
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull shub://biocontainers/tabix
FATAL: While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
事实上,我在使用其他 biocontainers
容器时遇到过这个问题。
例如,我还需要使用一个容器来进行 bowtie2
索引,这是我从 biocontainers/bowtie2
与同一工具 comics/bowtie2
的另一个开发人员容器中得到的错误:
^C(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://biocontainers/bowtie2
FATAL: While making image from oci registry: failed to get checksum for docker://biocontainers/bowtie2: Error reading manifest latest in docker.io/biocontainers/bowtie2: manifest unknown: manifest unknown
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://comics/bowtie2
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob a02a4930cb5d done
有人知道为什么吗?
使用另一个容器解决问题;然而,我从 biocontainers
中得到错误这一事实令人不安,因为这些错误都很常见并且在文献中被用作示例,所以我将把 top-answer 奖励给能够解决该特定问题的人。
可以说,stackleader/bgzip-utility
的使用实际上解决了容器中 运行 这条规则的问题。
container:
"docker://stackleader/bgzip-utility"
再一次,对于那些来到这个 post 的人来说,最好在 运行 snakemake
之前先测试任何容器, 例如 singularity pull docker://stackleader/bgzip-utility
.
Biocontainers 不允许 latest
作为其容器的标签,因此您需要指定要使用的标签。
来自他们的doc:
The BioContainers community had decided to remove the latest tag. Then, the following command docker pull biocontainers/crux will fail. Read more about this decision in Getting started with Docker
不指定标签时,默认为latest
标签,这里当然不允许。有关 bowtie2 的标签,请参阅 here。像这样的用法将起作用:
singularity pull docker://biocontainers/bowtie2:v2.4.1_cv1
我有一个使用全局奇点图像和基于规则的 conda 包装器的管道。
但是,有些工具没有包装器(即 htslib
的bgzip
和tabix
)。
现在我需要学习如何 run jobs in containers。
在官方文档中 link 它说:
"Allowed image urls entail everything supported by singularity (e.g.,
shub://
anddocker://
)."
现在我尝试了来自奇点中心的以下图像,但出现错误:
最小可重现示例:
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
GENOME_INDEX=config["REF_GENOME"]+".fai"
VEP_ANNOT=config["GENOME_ANNOTATION"]+".gz"
VEP_ANNOT_INDEX=config["GENOME_ANNOTATION"]+".gz.tbi"
# Singularity with conda wrappers
singularity: "docker://continuumio/miniconda3:4.5.11"
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{REF_DIR}/{GENOME_ANNOTATION}{ext}', REF_DIR=dirs_dict["REF_DIR"], GENOME_ANNOTATION=config["GENOME_ANNOTATION"], ext=['', '.gz', '.gz.tbi']),
expand('{REF_DIR}/{REF_GENOME}{ext}', REF_DIR=dirs_dict["REF_DIR"], REF_GENOME=config["REF_GENOME"], ext=['','.fai']),
rule download_references:
params:
ref_genome=config["REF_GENOME"],
genome_annotation=config["GENOME_ANNOTATION"],
ref_dir=dirs_dict["REF_DIR"]
output:
os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
os.path.join(dirs_dict["REF_DIR"],config["GENOME_ANNOTATION"]),
os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT),
os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT_INDEX)
resources:
mem=80000,
time=45
log:
os.path.join(dirs_dict["LOG_DIR"],"references","download.log")
singularity:
"shub://biocontainers/tabix"
shell: """
cd {params.ref_dir}
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
bgzip -d {params.ref_genome}.gz
wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
bgzip -d {params.genome_annotation}.gz
grep -v "#" {params.genome_annotation} | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > {params.genome_annotation}.gz
tabix -p gff {params.genome_annotation}.gz
"""
rule index_reference:
input:
os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
output:
os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
resources:
mem=2000,
time=30,
log:
os.path.join(dirs_dict["LOG_DIR"],"references", "faidx_index.log")
wrapper:
"0.64.0/bio/samtools/faidx"
错误
Building DAG of jobs...
Pulling singularity image shub://biocontainers/tabix.
WorkflowError:
Failed to pull singularity image from shub://biocontainers/tabix:
ESC[31mFATAL: ESC[0m While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 88, in pull
~
这似乎是容器的问题?
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull shub://biocontainers/tabix
FATAL: While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
事实上,我在使用其他 biocontainers
容器时遇到过这个问题。
例如,我还需要使用一个容器来进行 bowtie2
索引,这是我从 biocontainers/bowtie2
与同一工具 comics/bowtie2
的另一个开发人员容器中得到的错误:
^C(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://biocontainers/bowtie2
FATAL: While making image from oci registry: failed to get checksum for docker://biocontainers/bowtie2: Error reading manifest latest in docker.io/biocontainers/bowtie2: manifest unknown: manifest unknown
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://comics/bowtie2
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob a02a4930cb5d done
有人知道为什么吗?
使用另一个容器解决问题;然而,我从 biocontainers
中得到错误这一事实令人不安,因为这些错误都很常见并且在文献中被用作示例,所以我将把 top-answer 奖励给能够解决该特定问题的人。
可以说,stackleader/bgzip-utility
的使用实际上解决了容器中 运行 这条规则的问题。
container:
"docker://stackleader/bgzip-utility"
再一次,对于那些来到这个 post 的人来说,最好在 运行 snakemake
之前先测试任何容器, 例如 singularity pull docker://stackleader/bgzip-utility
.
Biocontainers 不允许 latest
作为其容器的标签,因此您需要指定要使用的标签。
来自他们的doc:
The BioContainers community had decided to remove the latest tag. Then, the following command docker pull biocontainers/crux will fail. Read more about this decision in Getting started with Docker
不指定标签时,默认为latest
标签,这里当然不允许。有关 bowtie2 的标签,请参阅 here。像这样的用法将起作用:
singularity pull docker://biocontainers/bowtie2:v2.4.1_cv1