Snakemake 使用所有样本作为 porechop 的一个输入
Snakemake use all samples as one input with porechop
我正在尝试使用 Snakemake 工作流程对多个数据使用 porechop。
在我的Snakefile中,除了all规则外,还有三个规则,一个fastqc规则和一个porechop规则。 fastqc 规则非常有效,我的三个 fastq 都有三个。但是对于 porechop,它不是 运行 命令三次,而是同时为所有三个文件运行一次带有 -i 标志的命令:
Error in rule porechop:
jobid: 2
output: /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz
conda-env: /ngs/prod/nanocea_project/test/.snakemake/conda/a72fb141b37718b7c37d9f32d597faeb
shell:
porechop -i /ngs/prod/nanocea_project/test/reads/25022021_2.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_1.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_2.fastq.gz -o /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz -t 40 --discard_middle
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
但是,当我将它与单个示例一起使用时,程序可以正常工作。
这是我的代码:
import glob
import os
###Global Variables###
FORMATS=["zip", "html"]
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"
###FASTQ Files###
def list_samples(DIR_FASTQ):
SAMPLES=[]
for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
base=os.path.basename(file)
sample=(base.replace('.fastq.gz', ''))
SAMPLES.append(sample)
return(SAMPLES)
SAMPLES=list_samples(DIR_FASTQ)
###Rules###
rule all:
input:
expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS),
expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
rule fastqc:
input:
expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
output:
expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
threads:
16
conda:
"envs/fastqc.yaml"
shell:
"fastqc {input} -o /ngs/prod/nanocea_project/test/stats/fastqc/ -t {threads}"
rule porechop:
input:
expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
output:
expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
threads:
40
conda:
"envs/porechop.yaml"
shell:
"porechop -i {input} -o {output} -t {threads} --discard_middle"
你知道哪里出了问题吗?
谢谢!
这个问题经常出现...如果您在 input:
或 output:
中使用 expand()
,那么您就是在为规则提供所有文件的列表。这和写一样:
input:
['sample1.fastq', 'sample2.fastq', ..., 'sampleN.fastq'],
output:
['sample1.pore.fastq', 'sample2.pore.fastq', ..., 'sampleN.pore.fastq'],
到 运行 每个 input/output 的规则只需删除扩展:
rule porechop:
input:
DIR_FASTQ+"/{sample}.fastq.gz"
output:
"/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz",
我正在尝试使用 Snakemake 工作流程对多个数据使用 porechop。
在我的Snakefile中,除了all规则外,还有三个规则,一个fastqc规则和一个porechop规则。 fastqc 规则非常有效,我的三个 fastq 都有三个。但是对于 porechop,它不是 运行 命令三次,而是同时为所有三个文件运行一次带有 -i 标志的命令:
Error in rule porechop:
jobid: 2
output: /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz, /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz
conda-env: /ngs/prod/nanocea_project/test/.snakemake/conda/a72fb141b37718b7c37d9f32d597faeb
shell:
porechop -i /ngs/prod/nanocea_project/test/reads/25022021_2.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_1.fastq.gz /ngs/prod/nanocea_project/test/reads/02062021_2.fastq.gz -o /ngs/prod/nanocea_project/test/prod/porechop/25022021_2_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_1_pore.fastq.gz /ngs/prod/nanocea_project/test/prod/porechop/02062021_2_pore.fastq.gz -t 40 --discard_middle
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
但是,当我将它与单个示例一起使用时,程序可以正常工作。
这是我的代码:
import glob
import os
###Global Variables###
FORMATS=["zip", "html"]
DIR_FASTQ="/ngs/prod/nanocea_project/test/reads"
###FASTQ Files###
def list_samples(DIR_FASTQ):
SAMPLES=[]
for file in glob.glob(DIR_FASTQ+"/*.fastq.gz"):
base=os.path.basename(file)
sample=(base.replace('.fastq.gz', ''))
SAMPLES.append(sample)
return(SAMPLES)
SAMPLES=list_samples(DIR_FASTQ)
###Rules###
rule all:
input:
expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS),
expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
rule fastqc:
input:
expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
output:
expand("/ngs/prod/nanocea_project/test/stats/fastqc/{sample}_fastqc.{ext}", sample=SAMPLES, ext=FORMATS)
threads:
16
conda:
"envs/fastqc.yaml"
shell:
"fastqc {input} -o /ngs/prod/nanocea_project/test/stats/fastqc/ -t {threads}"
rule porechop:
input:
expand(DIR_FASTQ+"/{sample}.fastq.gz", sample=SAMPLES)
output:
expand("/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz", sample=SAMPLES)
threads:
40
conda:
"envs/porechop.yaml"
shell:
"porechop -i {input} -o {output} -t {threads} --discard_middle"
你知道哪里出了问题吗?
谢谢!
这个问题经常出现...如果您在 input:
或 output:
中使用 expand()
,那么您就是在为规则提供所有文件的列表。这和写一样:
input:
['sample1.fastq', 'sample2.fastq', ..., 'sampleN.fastq'],
output:
['sample1.pore.fastq', 'sample2.pore.fastq', ..., 'sampleN.pore.fastq'],
到 运行 每个 input/output 的规则只需删除扩展:
rule porechop:
input:
DIR_FASTQ+"/{sample}.fastq.gz"
output:
"/ngs/prod/nanocea_project/test/prod/porechop/{sample}_pore.fastq.gz",