Snakemake: STAR fails in snakemake but not standalone
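Edit: before trying anything else, make sure you installed Snakemake with:
conda install -c bioconda -c conda-forge snakemake
as advertised here: snakemake.readthedocs.io. Don't install it as advertised here: anaconda.org/bioconda/snakemake, because you will end up with a very old version (the -c conda-forge is important!).
Original post =>
I have been wrestling with Snakemake all day. My problem is that my STAR rule gives me an error: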
/rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake/etc/conda/activate.d/activate-binutils_linux-64.sh: line 67: HOST: unbound variable
Error in job star_map while creating output file /rst1/2017-0205_illuminaseq/scratch/swo-406/preprocessing/180413_NB501997_0054_AHTFJ3BGX3/0054_P2018SEQE15S4_S14.Aligned.out.bam.
RuleException:
CalledProcessError in line 50 of /home/nlv24077/experiments/experiments/swo-406/scripts/Snakefile.snakefile:
Command '
source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake
STAR --runThreadN 8 --genomeDir /rst1/2017-0205_illuminaseq/scratch/swo-390/STAR_references/human --readFilesIn /rst1/2017-0205_illuminaseq/scratch/swo-406/fastq/180413_NB501997_0054_AHTFJ3BGX3/0054_P2018SEQE15S4_S14_R1_001.fastq.gz /rst1/2017-0205_illuminaseq/scratch/swo-406/fastq/180413_NB501997_0054_AHTFJ3BGX3/0054_P2018SEQE15S4_S14_R2_001.fastq.gz --outSAMtype BAM Unsorted --readFilesCommand zcat --outFileNamePrefix /rst1/2017-0205_illuminaseq/scratch/swo-406/preprocessing/180413_NB501997_0054_AHTFJ3BGX3/0054_P2018SEQE15S4_S14.
' returned non-zero exit status 1.
File "/home/nlv24077/experiments/experiments/swo-406/scripts/Snakefile.snakefile", line 50, in __rule_star_map
File "/rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
However, when I copy the script/command into a terminal, it works.
Here is my Snakefile:
import os
from glob import glob
#from snakemake.utils import validate

configfile: 'config.yaml'
#validate(config, "config.schema.yaml")

# Set the working directory
workdir: config['workdir']

experiment_name = 'swo-406'
scratch_data_base_dir="/rst1/2017-0205_illuminaseq/scratch"
scratch_data_dir = os.path.join(scratch_data_base_dir, experiment_name)
seqrun = '180413_NB501997_0054_AHTFJ3BGX3'
fastq_dir = os.path.join(scratch_data_dir, 'fastq', seqrun)
preprocessing_dir = os.path.join(scratch_data_dir, 'preprocessing', seqrun)
quantification_dir = os.path.join(scratch_data_dir, 'quantification', seqrun)
if not os.path.isdir(preprocessing_dir):
    os.makedirs(preprocessing_dir)

#ref_base_dir = config[ref_base_dir]
ref_genome = os.path.join(config['ref_base_dir'], config['ref_genome'])
star_ref_dir = config['star_ref_dir']

## Rsem settings
rsem_ref_dir = os.path.join(scratch_data_base_dir, 'swo-387', 'RSEM_references')
rsem_ref_base = os.path.join(rsem_ref_dir, 'Homo_sapiens.GRCh38')
log = os.path.join(preprocessing_dir, 'log.txt')

SAMPLES = set([os.path.basename(fastq_file.replace('_R1_001.fastq.gz', '').replace('_R2_001.fastq.gz', ''))
               for fastq_file in glob(os.path.join(fastq_dir, '*_R*_001.fastq.gz'))
               if not 'Undetermined' in fastq_file])
#star_output_prefix = os.path.join(preprocessing_dir, '{sample}.')

# Rule all is a pseudo-rule that tells snakemake what final files to generate.
rule all:
    input:
        expand(os.path.join(quantification_dir, '{sample}'), sample=SAMPLES)

rule star_map:
    input:
        os.path.join(fastq_dir, '{sample}_R1_001.fastq.gz'),
        os.path.join(fastq_dir, '{sample}_R2_001.fastq.gz'),
    output:
        os.path.join(preprocessing_dir, '{sample}.Aligned.out.bam')
    shell:
        """
        source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake
        STAR \
            --runThreadN 8 \
            --genomeDir {star_ref_dir} \
            --readFilesIn {input} \
            --outSAMtype BAM Unsorted \
            --readFilesCommand zcat \
            --outFileNamePrefix {preprocessing_dir}/{wildcards.sample}.
        """

rule samtools_sort:
    input:
        os.path.join(preprocessing_dir, '{sample}.Aligned.out.bam')
    output:
        os.path.join(preprocessing_dir, '{sample}.Aligned.out.sorted.bam')
    shell:
        """
        source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake
        samtools sort -T {wildcards.sample} -O bam {input} > {output}
        """

rule rsem_quantify:
    input:
        os.path.join(preprocessing_dir, '{sample}.Aligned.out.sorted.bam')
    output:
        os.path.join(quantification_dir, '{sample}')
    shell:
        """
        source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake
        rsem-calculate-expression \
            --paired-end \
            --bam \
            --num-threads 8 \
            --strandedness reverse \
            {rsem_ref_base} \
            {output}
        """
Can anyone spot the mistake?
By the way, I had to comment out
validate(config, "config.schema.yaml")
because my snakemake.utils does not seem to have "validate":
(/rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake) 16:40 nlv24077@kavia /rst1/2017-0205_illuminaseq/scratch/swo-406 > python3
Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from snakemake.utils import validate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'validate'
>>>
Kind regards,
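Could you remove all the source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake commands from the shell sections of the rules in your Snakefile, and activate the environment in one of these two ways instead:
1. Run the command source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake before you actually run snakemake on the Snakefile (you could even add a snakemake version that has validate to this environment). So you would run source activate /rst1/2017-0205_illuminaseq/scratch/swo-406/snakemake and then run snakemake.
2. Create a conda environment file that matches that environment, and add a conda: path/to/created/env/file directive to the rules that need the environment. Then run snakemake with the --use-conda flag.
Since you use the same environment for all rules, option 1 is the better choice: option 2 is much slower and makes the setup unnecessarily rule-specific.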
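With option 1 the rules rely on the environment already being active when snakemake starts. As a rough sketch (not part of the original answer), a small guard in plain Python near the top of the Snakefile could fail early when the tools are not on PATH. The helper names missing_tools and check_environment are hypothetical; the tool names are taken from the shell sections of the Snakefile above.

```python
import shutil

# Tools called in the shell sections of the Snakefile above.
REQUIRED_TOOLS = ["STAR", "samtools", "rsem-calculate-expression"]


def missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_environment(tools=REQUIRED_TOOLS):
    """Raise an error if any required tool is missing (hypothetical helper)."""
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError(
            "Not found on PATH: " + ", ".join(missing)
            + ". Activate the conda environment before running snakemake."
        )
```

Calling check_environment() once before the first rule would turn a forgotten source activate into a clear message instead of a failing job.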
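I could reproduce your error with this example Snakefile:
rule test_activate:
    output: "test.txt"
    shell: "source activate NGS && conda list > {output}"
I got the same unbound variable error, although with a different variable because my environment is different. Here is an explanation of what is probably happening:
Virtualenv activate script won't run in bash script with set -euo
In short: Snakemake runs shell commands in bash strict mode (set -euo pipefail), so when the activation script reads a variable that is not set (HOST in your case), bash treats it as an error and the job fails. In an interactive terminal the same unset variable simply expands to an empty string, which is why the command works there.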
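The difference can be reproduced outside Snakemake. The following is a minimal sketch, assuming bash is available on PATH. HYPOTHETICAL_UNSET_VAR stands in for HOST, and the echo line stands in for the conda activation script; both are illustrative, not taken from the real script. The last call shows a common workaround (switching off the unset-variable check only around the activation step), which is an addition here and not part of the original answer.

```python
import os
import subprocess

# Hypothetical variable name; it is removed from the child environment below.
UNSET_VAR = "HYPOTHETICAL_UNSET_VAR"


def run_bash(script):
    """Run a script with bash -c and capture its output."""
    env = {k: v for k, v in os.environ.items() if k != UNSET_VAR}
    env["LC_ALL"] = "C"  # keep bash error messages in English
    return subprocess.run(
        ["bash", "-c", script], env=env, capture_output=True, text=True
    )


# Stand-in for an activation script that reads a variable without a default.
activate_like = f'echo "host=${UNSET_VAR}"'

# Like an interactive terminal: the unset variable expands to an empty string.
plain = run_bash(activate_like)

# Like a Snakemake shell section: strict mode turns the same line into an error.
strict = run_bash("set -euo pipefail; " + activate_like)

# Workaround sketch: disable the check only while the activation step runs.
relaxed = run_bash(
    "set -euo pipefail; set +u; " + activate_like + "; set -u; echo done"
)

print("plain exit code:  ", plain.returncode)
print("strict exit code: ", strict.returncode, strict.stderr.strip())
print("relaxed exit code:", relaxed.returncode)
```

Activating the environment before calling snakemake, as suggested above, avoids the problem altogether because the activation script then never runs in strict mode.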