Snakemake: CalledProcessError when running BWA on multiple files
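I have a folder with multiple subfolders, each containing .fastq files that I want to align to a genome. I am trying to build a Snakemake workflow for this. First, I use wildcards to reach each subdirectory and the files in it. Then I use the expand function to collect all the file paths and write a rule that maps the files to the genome. The code is as follows: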
from snakemake.io import glob_wildcards, expand
import sys
import os

directories, files = glob_wildcards("data/samples/{dir}/{file}.fastq")
print(directories, files)

rule all:
    input:
        expand("data/samples/{dir}/{file}.fastq", zip, dir=directories,
               file=files)

rule bwa_map:
    input:
        G = "data/genome.fa",
        r1 = expand("data/samples/{dir}/{file}.fastq", zip,
                    dir=directories, file=files)
    output:
        r2 = expand("data/results/{dir}/{file}.bam", zip, dir=directories,
                    file=files)
    shell:
        "./bwa mem {input.G} {input.r1} | ./samtools sort -o - > {output.r2}"
However, when I run this code with "snakemake bwa_map", I get the following error:
Error in job bwa_map while creating output files data/results/SRR5923/A.bam, data/results/SRR5924/B.bam, data/results/SRR5925/C.bam.
RuleException:
CalledProcessError in line 19 of /Users/rewatitappu/PycharmProjects/RNA-seq_Snakemake/Snakefile:
Command './bwa mem data/genome.fa data/samples/SRR5923/A.fastq data/samples/SRR5924/B.fastq data/samples/SRR5925/C.fastq | ./samtools sort -o - > data/results/SRR5923/A.bam data/results/SRR5924/B.bam data/results/SRR5925/C.bam' returned non-zero exit status 1.
File "/Users/rewatitappu/PycharmProjects/RNA-seq_Snakemake/Snakefile", line 19, in __rule_bwa_map
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job bwa_map since they might be corrupted:
data/results/SRR5923/A.bam
Will exit after finishing currently running jobs.
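Am I running the snakemake command incorrectly, or is there a problem with the code?
The error message shows that the failure occurred while executing the following shell command: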
./bwa mem data/genome.fa data/samples/SRR5923/A.fastq data/samples/SRR5924/B.fastq data/samples/SRR5925/C.fastq | ./samtools sort -o - > data/results/SRR5923/A.bam data/results/SRR5924/B.bam data/results/SRR5925/C.bam
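To see how a POSIX shell reads the end of that command, here is a small Python sketch. It is only an illustration: the helper split_last_stage is hypothetical and handles just the simple `a | b > file extra...` shape seen here. It is not a shell parser. It relies on the standard shell rule that only the word directly after `>` is the redirect target, while any following words are passed to the program as ordinary arguments.

```python
import shlex

# The command copied from the error message above.
command = (
    "./bwa mem data/genome.fa data/samples/SRR5923/A.fastq "
    "data/samples/SRR5924/B.fastq data/samples/SRR5925/C.fastq "
    "| ./samtools sort -o - > data/results/SRR5923/A.bam "
    "data/results/SRR5924/B.bam data/results/SRR5925/C.bam"
)


def split_last_stage(cmd):
    """Return (argv, redirect_target) for the last stage of a simple pipeline.

    Hypothetical helper for illustration only: it understands a single
    `>` redirect and nothing else.
    """
    last_stage = cmd.split("|")[-1]
    tokens = shlex.split(last_stage)
    argv, target = [], None
    i = 0
    while i < len(tokens):
        if tokens[i] == ">":
            # Only the word right after `>` becomes the redirect target.
            target = tokens[i + 1]
            i += 2
        else:
            # Everything else is an argument to the program.
            argv.append(tokens[i])
            i += 1
    return argv, target


argv, target = split_last_stage(command)
print("stdout is redirected to:", target)
print("samtools receives:", argv)
```

Only the first BAM path is used as the redirect target. The other two paths are handed to samtools as extra arguments, so a single command cannot produce three separate BAM files.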
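The problem is probably that the rule lists several BAM files as output, so all of them end up in a single command. You probably should not use expand in the bwa_map rule. The expansion is already done in the all rule.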
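One way this advice could look in the Snakefile is sketched below, reusing the paths and the shell command from the question unchanged. The assumptions here are that each .fastq file should be mapped on its own and that the all rule should request the BAM files rather than the .fastq files. Neither is stated explicitly in the thread, so treat this as a sketch and not a confirmed fix.

```python
# Sketch only: assumes one BAM file per .fastq file.
directories, files = glob_wildcards("data/samples/{dir}/{file}.fastq")

rule all:
    input:
        # Assumption: request the final BAM files here, one per sample.
        expand("data/results/{dir}/{file}.bam", zip, dir=directories,
               file=files)

rule bwa_map:
    input:
        G = "data/genome.fa",
        # No expand: {dir} and {file} are wildcards resolved per job.
        r1 = "data/samples/{dir}/{file}.fastq"
    output:
        r2 = "data/results/{dir}/{file}.bam"
    shell:
        "./bwa mem {input.G} {input.r1} | ./samtools sort -o - > {output.r2}"
```

With this layout Snakemake would run bwa_map once per sample, so each command sees exactly one input and one output. A rule that contains wildcards cannot be requested directly as a target, so the workflow would be started with "snakemake" (or "snakemake all") instead of "snakemake bwa_map".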