Thread.py error snakemake

I am trying to run a simple, single-rule snakemake file, shown below:

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    shell:
        ('mkdir -p '+resources_dir+'; cd '+resources_dir+'; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;')

But I get an error:

    Error in job downloadReference while creating output files 
    resources/human_g1k_v37.fasta, resources/human_g1k_v37.fasta.fai.
    RuleException:
    CalledProcessError in line 10 of 
    /lustre4/home/masih/projects/NGS_pipeline/snake_test:
    Command 'mkdir -p resources; cd resources; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;' returned non-zero exit status 2.
      File "/lustre4/home/masih/projects/NGS_pipeline/snake_test", line 10, in __rule_downloadReference
      File "/home/masih/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    Removing output files of failed job downloadReference since they might be corrupted:
    resources/human_g1k_v37.fasta
    Will exit after finishing currently running jobs.
    Exiting because a job execution failed. Look above for error message

I am not using the threads option in snakemake, so I don't see what this has to do with thread.py. Has anyone run into this error?

When a shell command fails, its exit status is not 0. That is what "returned non-zero exit status 2" means.
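As a minimal illustration of that message, the Python sketch below runs a child process that deliberately exits with status 2 and catches the resulting subprocess.CalledProcessError, the same exception type named in the traceback above. It assumes nothing about snakemake's internals, and the helper name exit_status_of is made up for this example.

```python
import subprocess
import sys


def exit_status_of(command):
    """Run a command and return its exit status (hypothetical helper)."""
    try:
        # check_call raises CalledProcessError on any non-zero exit status.
        subprocess.check_call(command)
    except subprocess.CalledProcessError as err:
        return err.returncode
    return 0


# Stand-ins for shell commands: Python children that exit with 2 and with 0.
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
succeeding = [sys.executable, "-c", "pass"]

print(exit_status_of(failing))
print(exit_status_of(succeeding))
```

Any status other than 0 triggers the exception, whether the child treats it as a hard error or only as a warning.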

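One of your shell commands fails, and the failure is propagated to snakemake. I suppose snakemake uses threads, and the failure surfaces at the level of some code in the thread.py file (see note 1 below).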
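To see why thread.py can show up even when no threads option is set, here is a small sketch. It assumes (as the traceback suggests, but the post does not confirm) that jobs are run through a concurrent.futures thread pool; failing_job and frames_of_failure are hypothetical names. The exception is raised by the job itself, yet its traceback passes through the worker code in thread.py.

```python
import subprocess
import traceback
from concurrent.futures import ThreadPoolExecutor


def failing_job():
    """Stand-in for a job whose shell command exited with status 2."""
    raise subprocess.CalledProcessError(2, "gunzip human_g1k_v37.fasta.gz")


def frames_of_failure():
    """Run the job in a worker thread and list the files in its traceback."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(failing_job)
        # exception() waits for the job and returns what it raised.
        exc = future.exception()
    return [frame.filename for frame in traceback.extract_tb(exc.__traceback__)]


filenames = frames_of_failure()
for name in filenames:
    print(name)
```

The thread pool only relays the error; the frame that matters is the last one, which belongs to the job.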

To better understand what is happening, we can catch the first error with the || operator, followed by a function that prints an error message:

# Define functions to be used in shell portions
shell.prefix("""
# http://linuxcommand.org/wss0150.php
PROGNAME=$(basename $0)

function error_exit
{{
#   ----------------------------------------------------------------
#   Function for exit due to fatal program error
#       Accepts 1 argument:
#           string containing descriptive error message
#   ----------------------------------------------------------------
    echo "${{PROGNAME}}: ${{1:-"Unknown Error"}}" 1>&2
    exit 1
}}
""")

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    params:
        resources_dir = resources_dir
    shell:
        """
        mkdir -p {params.resources_dir}
        cd {params.resources_dir}
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz || error_exit "fasta download failed"
        gunzip human_g1k_v37.fasta.gz || error_exit "fasta gunzip failed"
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai || error_exit "fai download failed"
        """

When I run this, I get the following messages after those from the first download:

gzip: human_g1k_v37.fasta.gz: decompression OK, trailing garbage ignored
bash: fasta gunzip failed

It turns out that gzip uses a non-zero exit code when a warning occurs:

Exit status is normally 0; if an error occurs, exit status is 1. If a warning occurs, exit status is 2.

(from the DIAGNOSTICS section of man gzip)
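One way to act on that convention is to treat status 2 as a warning rather than a failure. The sketch below is only an illustration in plain Python, not something snakemake provides: run_tolerating_warnings and GZIP_STATUS are hypothetical names, and a Python child process exiting with 2 stands in for gunzip on a file with trailing garbage.

```python
import subprocess
import sys

# Meaning of gzip exit statuses, as described in the man page excerpt.
GZIP_STATUS = {0: "ok", 1: "error", 2: "warning"}


def run_tolerating_warnings(command, warning_codes=(2,)):
    """Run a command; raise only if the status is neither 0 nor a warning code."""
    completed = subprocess.run(command)
    status = completed.returncode
    if status != 0 and status not in warning_codes:
        raise subprocess.CalledProcessError(status, command)
    return status


# Stand-in for gunzip emitting a warning: a child process that exits with 2.
warns = [sys.executable, "-c", "import sys; sys.exit(2)"]
status = run_tolerating_warnings(warns)
print(status, GZIP_STATUS.get(status, "unknown"))
```

Whether a warning should be tolerated is a judgment call: here the decompressed file is intact, but blanket acceptance of status 2 could hide other gzip warnings.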

If I remove the error trap || error_exit "fasta gunzip failed", the workflow is able to complete. So I don't understand why you got this error in the first place.

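I am surprised that the gzip authors decided to use a non-zero status for a mere warning. They added a -q option to turn off this specific warning, which is due to the presence of trailing zeroes, but strangely the exit status is still non-zero when this option is used.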


1 According to Johannes Köster, the author of snakemake:

Sorry for the misleading thread.py thing, this is just the place where snakemake detects the problem. The real issue is that your command exits with exit code 2, which indicates an error not related to Snakemake