How to use a Snakemake container for htslib (bgzip + tabix)

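I have a pipeline that uses a global Singularity image together with per-rule conda wrappers.

However, some tools have no wrapper (htslib's bgzip and tabix).

So now I need to learn how to run jobs in containers.

The official documentation says: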

"Allowed image urls entail everything supported by singularity (e.g., shub:// and docker://)."

I tried the following image from Singularity Hub, but got an error:

Minimal reproducible example:

config.yaml

# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"

Snakefile

# Directories------------------------------------------------------------------
configfile: "config.yaml"

# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))

GENOME_INDEX=config["REF_GENOME"]+".fai"
VEP_ANNOT=config["GENOME_ANNOTATION"]+".gz"
VEP_ANNOT_INDEX=config["GENOME_ANNOTATION"]+".gz.tbi"

# Singularity with conda wrappers

singularity: "docker://continuumio/miniconda3:4.5.11"

# Rules -----------------------------------------------------------------------

rule all:
    input:
        expand('{REF_DIR}/{GENOME_ANNOTATION}{ext}', REF_DIR=dirs_dict["REF_DIR"], GENOME_ANNOTATION=config["GENOME_ANNOTATION"], ext=['', '.gz', '.gz.tbi']),
        expand('{REF_DIR}/{REF_GENOME}{ext}', REF_DIR=dirs_dict["REF_DIR"], REF_GENOME=config["REF_GENOME"], ext=['','.fai']),

rule download_references:
    params:
        ref_genome=config["REF_GENOME"],
        genome_annotation=config["GENOME_ANNOTATION"],
        ref_dir=dirs_dict["REF_DIR"]
    output:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
        os.path.join(dirs_dict["REF_DIR"],config["GENOME_ANNOTATION"]),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT_INDEX)
    resources:
        mem=80000,
        time=45
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references","download.log")
    singularity:
        "shub://biocontainers/tabix"
    shell: """
        cd {params.ref_dir}
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
        bgzip -d {params.ref_genome}.gz
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
        bgzip -d {params.genome_annotation}.gz
        grep -v "#" {params.genome_annotation} | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > {params.genome_annotation}.gz
        tabix -p gff {params.genome_annotation}.gz
        """


rule index_reference:
    input:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
    output:
        os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
    resources:
        mem=2000,
        time=30,
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references", "faidx_index.log")
    wrapper:
        "0.64.0/bio/samtools/faidx"

Error

Building DAG of jobs...
Pulling singularity image shub://biocontainers/tabix.
WorkflowError:
Failed to pull singularity image from shub://biocontainers/tabix:
FATAL:   While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub

  File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 88, in pull

Is this a problem with the container itself? Pulling it directly fails in the same way:

(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull shub://biocontainers/tabix
FATAL:   While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub

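In fact, I have run into this problem with other biocontainers images as well.

For example, I also need a container for bowtie2 indexing. Below is the error I get from biocontainers/bowtie2, followed by a successful pull of another developer's image for the same tool, comics/bowtie2: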

(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://biocontainers/bowtie2
FATAL:   While making image from oci registry: failed to get checksum for docker://biocontainers/bowtie2: Error reading manifest latest in docker.io/biocontainers/bowtie2: manifest unknown: manifest unknown
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://comics/bowtie2
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob a02a4930cb5d done

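Does anyone know why?

Using a different container works around the problem. However, it is troubling that I get errors from biocontainers, because those images are widely used and appear as examples in the literature, so I will award the accepted answer to whoever can explain that specific problem.

To be clear, using stackleader/bgzip-utility did solve the problem of running this rule in a container: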

container:
    "docker://stackleader/bgzip-utility"

Again, for anyone who lands on this post: it is probably best to test any container before running snakemake, e.g. singularity pull docker://stackleader/bgzip-utility.
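That pre-flight check could be scripted. The following is only a sketch: `pull_command` and `can_pull` are hypothetical helper names, and it assumes that the `singularity` executable is on the PATH and that `singularity pull <uri>` exits with a non-zero status on failure, as the transcripts in this post suggest.

```python
import subprocess
import tempfile


def pull_command(image_uri):
    """Build the same command that was typed by hand in this post."""
    return ["singularity", "pull", image_uri]


def can_pull(image_uri, runner=subprocess.run):
    """Try to pull an image in a scratch directory; return (ok, stderr).

    `runner` is injectable (an assumption made for this sketch) so the
    logic can be exercised on a machine without Singularity installed.
    """
    with tempfile.TemporaryDirectory() as scratch:
        try:
            result = runner(
                pull_command(image_uri),
                cwd=scratch,  # keep the pulled image file out of the workflow dir
                capture_output=True,
                text=True,
            )
        except FileNotFoundError:
            return False, "singularity executable not found"
    return result.returncode == 0, result.stderr.strip()
```

Calling `can_pull("docker://stackleader/bgzip-utility")` before launching the workflow would then return a success flag plus whatever Singularity wrote to stderr, so a bad image URI shows up before Snakemake builds the DAG.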

Biocontainers does not allow latest as a tag for its containers, so you need to specify the tag you want to use.

From their documentation:

The BioContainers community had decided to remove the latest tag. Then, the following command docker pull biocontainers/crux will fail. Read more about this decision in Getting started with Docker

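When no tag is specified, the tag defaults to latest, which does not exist for these images, hence the "manifest unknown" error. For the available bowtie2 tags, see the tag list for that image. Usage like this will work: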

singularity pull docker://biocontainers/bowtie2:v2.4.1_cv1
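Since an untagged image reference falls back to latest, it may be worth checking image URIs for an explicit tag before handing them to Snakemake. This is a minimal sketch: `has_explicit_tag` is a hypothetical helper, not part of Snakemake or Singularity, and it only inspects the string without contacting any registry.

```python
def has_explicit_tag(image_uri):
    """Return True if an image URI names a tag or digest explicitly."""
    # Drop the scheme: "docker://biocontainers/bowtie2:v2.4.1_cv1"
    # becomes "biocontainers/bowtie2:v2.4.1_cv1".
    reference = image_uri.split("://", 1)[-1]
    # Look only at the last path component, so that a registry port
    # such as "localhost:5000/tool" is not mistaken for a tag.
    name = reference.rsplit("/", 1)[-1]
    return ":" in name or "@" in name


for uri in (
    "docker://biocontainers/bowtie2",
    "docker://biocontainers/bowtie2:v2.4.1_cv1",
):
    status = "pinned" if has_explicit_tag(uri) else "no tag, would default to latest"
    print(f"{uri}: {status}")
```

A check like this only catches the missing-tag case; whether the named tag actually exists in the registry still has to be confirmed with a real pull.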