Snakemake 运行 是分步还是并行?
Is Snakemake run step by step or parallel?
我已经使用 Snakemake 编写了一个星期的 RNA-seq 流水线。我仍然不知道 snakemake 的工作 order.The 版本是 5.4.4
我的RNA-seq管道由五部分组成,所以我写了五个规则(规则trim,规则alignemnt,规则sort_to_bam,规则fpkm,规则计数)。当我写一个规则,我将通过 运行 测试它。最后我完成了 it.And 它 运行 当我通过 step.Here 测试每个规则步骤时,我的 Snakefile:
SBT=["wt1","wt2","epcr1","epcr2"]
ruleorder: trim > map > sort2bam > fpkm > count
rule all:
input:
expand("02_clean/{nico}_1.paired.fq.gz", nico=SBT),
expand("02_clean/{nico}_2.paired.fq.gz", nico=SBT),
expand("03_align/{nico}.sam", nico=SBT),
expand("04_exp/{nico}_count.txt", nico=SBT),
expand("05_ft/{nico}_gene.gtf", nico=SBT),
expand("05_ft/{nico}_transcript.gtf", nico=SBT)
rule trim:
input:
"01_raw/{nico}_1.fastq",
"01_raw/{nico}_2.fastq"
output:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_1.unpaired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz",
"02_clean/{nico}_2.unpaired.fq.gz",
threads: 20
shell:
"java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"
rule map:
input:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz"
output:
"03_align/{nico}.sam"
log:
"logs/map/{nico}.log"
threads: 40
shell:
"hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"
rule sort2bam:
input:
"03_align/{nico}.sam"
output:
"03_align/{nico}.bam"
threads:20
shell:
"samtools sort -@ 20 -m 10G -o {output} {input}"
rule count:
input:
"03_align/{nico}.bam"
output:
"04_exp/{nico}_count.txt"
shell:
"featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"
rule fpkm:
input:
"03_align/{nico}.bam"
output:
"05_ft/{nico}_gene.gtf",
"05_ft/{nico}_transcript.gtf"
threads: 40
shell:
"stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A {output[0]} -o {output[1]} {input}"
raw_data显示如下:
(py3) root@SBT:~/s/r/snakemake/my_rnaseq_data/01_raw# tree
.
|-- epcr1_1.fastq
|-- epcr1_2.fastq
|-- epcr2_1.fastq
|-- epcr2_2.fastq
|-- wt1_1.fastq
|-- wt1_2.fastq
|-- wt2_1.fastq
`-- wt2_2.fastq
然后我想从 raw_data 测试管道,删除所有存在的中间文件,我之前的测试步骤 step.Here 是我的 dry_run 结果:
Building DAG of jobs...
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt1_1.paired.fq.gz -2 02_clean/wt1_2.paired.fq.gz -S 03_align/wt1.sam >logs/map/wt1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 8
wildcards: nico=epcr2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr2_1.paired.fq.gz -2 02_clean/epcr2_2.paired.fq.gz -S 03_align/epcr2.sam >logs/map/epcr2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt2_1.paired.fq.gz -2 02_clean/wt2_2.paired.fq.gz -S 03_align/wt2.sam >logs/map/wt2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr1_1.paired.fq.gz -2 02_clean/epcr1_2.paired.fq.gz -S 03_align/epcr1.sam >logs/map/epcr1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 19
wildcards: nico=epcr1
samtools sort -@ 20 -m 10G -o 03_align/epcr1.bam 03_align/epcr1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 20
wildcards: nico=epcr2
samtools sort -@ 20 -m 10G -o 03_align/epcr2.bam 03_align/epcr2.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 18
wildcards: nico=wt2
samtools sort -@ 20 -m 10G -o 03_align/wt2.bam 03_align/wt2.sam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 10
wildcards: nico=wt2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt2_count.txt 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 9
wildcards: nico=wt1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt1_count.txt 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 12
wildcards: nico=epcr2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr2_count.txt 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 13
wildcards: nico=wt1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt1_gene.gtf -o 05_ft/wt1_transcript.gtf 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 16
wildcards: nico=epcr2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr2_gene.gtf -o 05_ft/epcr2_transcript.gtf 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 14
wildcards: nico=wt2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt2_gene.gtf -o 05_ft/wt2_transcript.gtf 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 11
wildcards: nico=epcr1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr1_count.txt 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 15
wildcards: nico=epcr1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr1_gene.gtf -o 05_ft/epcr1_transcript.gtf 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 03_align/wt1.sam, 03_align/wt2.sam, 03_align/epcr1.sam, 03_align/epcr2.sam, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
但是当我真正执行它时,它在 运行ning 规则 sort2bam:
时报告错误
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:11:57 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:11:59 2019]
Finished job 3.
1 of 21 steps (5%) done
[Tue Apr 30 03:11:59 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:00 2019]
Finished job 1.
2 of 21 steps (10%) done
[Tue Apr 30 03:12:00 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:03 2019]
Finished job 2.
3 of 21 steps (14%) done
[Tue Apr 30 03:12:03 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:04 2019]
Finished job 4.
4 of 21 steps (19%) done
[Tue Apr 30 03:12:04 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:06 2019]
Finished job 6.
5 of 21 steps (24%) done
[Tue Apr 30 03:12:06 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:07 2019]
Finished job 7.
6 of 21 steps (29%) done
[Tue Apr 30 03:12:07 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:09 2019]
Finished job 5.
7 of 21 steps (33%) done
[Tue Apr 30 03:12:09 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
[W::sam_read1] Parse error at line 64601
samtools sort: truncated file. Aborting
[Tue Apr 30 03:12:09 2019]
Error in rule sort2bam:
jobid: 17
output: 03_align/wt1.bam
RuleException:
CalledProcessError in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Command 'set -euo pipefail; samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam' returned non-zero exit status 1.
File "/root/s/r/snakemake/my_rnaseq_data/Snakefile", line 45, in __rule_sort2bam
File "/root/miniconda2/envs/py3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /root/s/r/snakemake/my_rnaseq_data/.snakemake/log/2019-04-30T031153.994462.snakemake.log
之后,我停止所有运行ning任务,查看文件夹,发现它新生成了几个文件,如下所示:
|-- 02_clean
| |-- epcr1_1.paired.fq.gz
| |-- epcr1_1.unpaired.fq.gz
| |-- epcr1_2.paired.fq.gz
| |-- epcr1_2.unpaired.fq.gz
| |-- epcr2_1.paired.fq.gz
| |-- epcr2_1.unpaired.fq.gz
| |-- epcr2_2.paired.fq.gz
| |-- epcr2_2.unpaired.fq.gz
| |-- wt1_1.paired.fq.gz
| |-- wt1_1.unpaired.fq.gz
| |-- wt1_2.paired.fq.gz
| |-- wt1_2.unpaired.fq.gz
| |-- wt2_1.paired.fq.gz
| |-- wt2_1.unpaired.fq.gz
| |-- wt2_2.paired.fq.gz
| `-- wt2_2.unpaired.fq.gz
|-- 03_align
| |-- epcr1.sam
| |-- wt1.sam
| `-- wt2.sam
但这些文件不完整。
所以这让我对 execution.Is 的顺序感到困惑,它 运行ning 平行于每个样本的五个规则?或者只是 运行ning 所有样本按规则进行,我的管道的 运行ning 过程似乎支持前一个。这也解释了错误:"samtools sort: truncated file. Aborting" 在 sam2bam 阶段。不知道我的猜测对不对。
但是我已经在我的 Snakefile 中添加了规则顺序:
ruleorder: trim > map > sort2bam > fpkm > count
但是好像不行!有没有其他选项或设置可以控制规则的执行顺序?
昨晚我 运行 snakefile 从 "rule map" 开始,基于 trimmed fastq.gz trimmed 使用相同的 Snakefile .而且运行宁好!整个运行ning过程如下:
lding DAG of jobs...
viUsing shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
17
[Mon Apr 29 14:37:39 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 19
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:40 2019]
Finished job 19.
1 of 17 steps (6%) done
[Mon Apr 29 14:37:40 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 17
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:48 2019]
Finished job 17.
2 of 17 steps (12%) done
[Mon Apr 29 14:37:51 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 18
wildcards: nico=wt2
[Mon Apr 29 14:37:55 2019]
Finished job 18.
3 of 17 steps (18%) done
[Mon Apr 29 14:37:57 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 20
wildcards: nico=epcr2
[Mon Apr 29 14:38:02 2019]
Finished job 20.
4 of 17 steps (24%) done
[Mon Apr 29 14:38:04 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 13
wildcards: nico=wt1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 14:39:45 2019]
Finished job 13.
5 of 17 steps (29%) done
[Mon Apr 29 14:39:46 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 9
wildcards: nico=wt1
[Mon Apr 29 14:40:42 2019]
Finished job 9.
6 of 17 steps (35%) done
[Mon Apr 29 14:40:42 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 5
wildcards: nico=wt1
[Mon Apr 29 14:41:40 2019]
Finished job 5.
7 of 17 steps (41%) done
[Mon Apr 29 14:41:40 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 16
wildcards: nico=epcr2
[Mon Apr 29 14:56:41 2019]
Finished job 16.
8 of 17 steps (47%) done
[Mon Apr 29 14:56:41 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 8
wildcards: nico=epcr2
[Mon Apr 29 14:57:45 2019]
Finished job 8.
9 of 17 steps (53%) done
[Mon Apr 29 14:57:45 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 12
wildcards: nico=epcr2
h[Mon Apr 29 15:01:32 2019]
Finished job 12.
10 of 17 steps (59%) done
[Mon Apr 29 15:01:32 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 14
wildcards: nico=wt2
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:08:05 2019]
Finished job 14.
11 of 17 steps (65%) done
[Mon Apr 29 15:08:05 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 10
wildcards: nico=wt2
[Mon Apr 29 15:12:28 2019]
Finished job 10.
12 of 17 steps (71%) done
[Mon Apr 29 15:12:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 6
wildcards: nico=wt2
[Mon Apr 29 15:13:18 2019]
Finished job 6.
13 of 17 steps (76%) done
[Mon Apr 29 15:13:18 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 15
wildcards: nico=epcr1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:21:35 2019]
Finished job 15.
14 of 17 steps (82%) done
[Mon Apr 29 15:21:35 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 11
wildcards: nico=epcr1
[Mon Apr 29 15:27:22 2019]
Finished job 11.
15 of 17 steps (88%) done
[Mon Apr 29 15:27:22 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 7
wildcards: nico=epcr1
[Mon Apr 29 15:28:32 2019]
Finished job 7.
16 of 17 steps (94%) done
[Mon Apr 29 15:28:32 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
[Mon Apr 29 15:28:32 2019]
Finished job 0.
17 of 17 steps (100%) done
rule的执行顺序是这样的:
起初,规则映射是完全executed.Then,对每个sample.bam执行左边的规则。
为什么它的顺序与从 raw_data 开始的整个管道不同?
总结:
两个问题:
1. 是我的 Snakefile 还是执行顺序出错了?
2.如何编辑我的Snakefile来设置每条规则的顺序以逐条执行任务?
如果有人帮助,我将不胜感激!
问题是 trim
和 map
规则中的 shell 命令以 &
结尾,它将工作进程发送到后台,主进程移动打开并退出。这向 Snakemake 发出作业已完成执行的信号。 Snakemake 然后继续检查声明为作业结果的输出文件,但找不到它们(因此,您看到的错误)。
如果您删除 &
并将进程保留在前台,看起来一切都会正常。
我已经使用 Snakemake 编写了一个星期的 RNA-seq 流水线。我仍然不知道 snakemake 的工作 order.The 版本是 5.4.4
我的RNA-seq管道由五部分组成,所以我写了五个规则(规则trim,规则alignemnt,规则sort_to_bam,规则fpkm,规则计数)。当我写一个规则,我将通过 运行 测试它。最后我完成了 it.And 它 运行 当我通过 step.Here 测试每个规则步骤时,我的 Snakefile:
SBT=["wt1","wt2","epcr1","epcr2"]
ruleorder: trim > map > sort2bam > fpkm > count
rule all:
input:
expand("02_clean/{nico}_1.paired.fq.gz", nico=SBT),
expand("02_clean/{nico}_2.paired.fq.gz", nico=SBT),
expand("03_align/{nico}.sam", nico=SBT),
expand("04_exp/{nico}_count.txt", nico=SBT),
expand("05_ft/{nico}_gene.gtf", nico=SBT),
expand("05_ft/{nico}_transcript.gtf", nico=SBT)
rule trim:
input:
"01_raw/{nico}_1.fastq",
"01_raw/{nico}_2.fastq"
output:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_1.unpaired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz",
"02_clean/{nico}_2.unpaired.fq.gz",
threads: 20
shell:
"java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"
rule map:
input:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz"
output:
"03_align/{nico}.sam"
log:
"logs/map/{nico}.log"
threads: 40
shell:
"hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"
rule sort2bam:
input:
"03_align/{nico}.sam"
output:
"03_align/{nico}.bam"
threads:20
shell:
"samtools sort -@ 20 -m 10G -o {output} {input}"
rule count:
input:
"03_align/{nico}.bam"
output:
"04_exp/{nico}_count.txt"
shell:
"featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"
rule fpkm:
input:
"03_align/{nico}.bam"
output:
"05_ft/{nico}_gene.gtf",
"05_ft/{nico}_transcript.gtf"
threads: 40
shell:
"stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A {output[0]} -o {output[1]} {input}"
raw_data显示如下:
(py3) root@SBT:~/s/r/snakemake/my_rnaseq_data/01_raw# tree
.
|-- epcr1_1.fastq
|-- epcr1_2.fastq
|-- epcr2_1.fastq
|-- epcr2_2.fastq
|-- wt1_1.fastq
|-- wt1_2.fastq
|-- wt2_1.fastq
`-- wt2_2.fastq
然后我想从 raw_data 测试管道,删除所有存在的中间文件,我之前的测试步骤 step.Here 是我的 dry_run 结果:
Building DAG of jobs...
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt1_1.paired.fq.gz -2 02_clean/wt1_2.paired.fq.gz -S 03_align/wt1.sam >logs/map/wt1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 8
wildcards: nico=epcr2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr2_1.paired.fq.gz -2 02_clean/epcr2_2.paired.fq.gz -S 03_align/epcr2.sam >logs/map/epcr2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt2_1.paired.fq.gz -2 02_clean/wt2_2.paired.fq.gz -S 03_align/wt2.sam >logs/map/wt2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr1_1.paired.fq.gz -2 02_clean/epcr1_2.paired.fq.gz -S 03_align/epcr1.sam >logs/map/epcr1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 19
wildcards: nico=epcr1
samtools sort -@ 20 -m 10G -o 03_align/epcr1.bam 03_align/epcr1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 20
wildcards: nico=epcr2
samtools sort -@ 20 -m 10G -o 03_align/epcr2.bam 03_align/epcr2.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 18
wildcards: nico=wt2
samtools sort -@ 20 -m 10G -o 03_align/wt2.bam 03_align/wt2.sam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 10
wildcards: nico=wt2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt2_count.txt 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 9
wildcards: nico=wt1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt1_count.txt 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 12
wildcards: nico=epcr2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr2_count.txt 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 13
wildcards: nico=wt1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt1_gene.gtf -o 05_ft/wt1_transcript.gtf 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 16
wildcards: nico=epcr2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr2_gene.gtf -o 05_ft/epcr2_transcript.gtf 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 14
wildcards: nico=wt2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt2_gene.gtf -o 05_ft/wt2_transcript.gtf 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 11
wildcards: nico=epcr1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr1_count.txt 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 15
wildcards: nico=epcr1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr1_gene.gtf -o 05_ft/epcr1_transcript.gtf 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 03_align/wt1.sam, 03_align/wt2.sam, 03_align/epcr1.sam, 03_align/epcr2.sam, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
但是当我真正执行它时,它在 运行ning 规则 sort2bam:
时报告错误Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:11:57 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:11:59 2019]
Finished job 3.
1 of 21 steps (5%) done
[Tue Apr 30 03:11:59 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:00 2019]
Finished job 1.
2 of 21 steps (10%) done
[Tue Apr 30 03:12:00 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:03 2019]
Finished job 2.
3 of 21 steps (14%) done
[Tue Apr 30 03:12:03 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:04 2019]
Finished job 4.
4 of 21 steps (19%) done
[Tue Apr 30 03:12:04 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:06 2019]
Finished job 6.
5 of 21 steps (24%) done
[Tue Apr 30 03:12:06 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:07 2019]
Finished job 7.
6 of 21 steps (29%) done
[Tue Apr 30 03:12:07 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:09 2019]
Finished job 5.
7 of 21 steps (33%) done
[Tue Apr 30 03:12:09 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
[W::sam_read1] Parse error at line 64601
samtools sort: truncated file. Aborting
[Tue Apr 30 03:12:09 2019]
Error in rule sort2bam:
jobid: 17
output: 03_align/wt1.bam
RuleException:
CalledProcessError in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Command 'set -euo pipefail; samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam' returned non-zero exit status 1.
File "/root/s/r/snakemake/my_rnaseq_data/Snakefile", line 45, in __rule_sort2bam
File "/root/miniconda2/envs/py3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /root/s/r/snakemake/my_rnaseq_data/.snakemake/log/2019-04-30T031153.994462.snakemake.log
之后,我停止所有运行ning任务,查看文件夹,发现它新生成了几个文件,如下所示:
|-- 02_clean
| |-- epcr1_1.paired.fq.gz
| |-- epcr1_1.unpaired.fq.gz
| |-- epcr1_2.paired.fq.gz
| |-- epcr1_2.unpaired.fq.gz
| |-- epcr2_1.paired.fq.gz
| |-- epcr2_1.unpaired.fq.gz
| |-- epcr2_2.paired.fq.gz
| |-- epcr2_2.unpaired.fq.gz
| |-- wt1_1.paired.fq.gz
| |-- wt1_1.unpaired.fq.gz
| |-- wt1_2.paired.fq.gz
| |-- wt1_2.unpaired.fq.gz
| |-- wt2_1.paired.fq.gz
| |-- wt2_1.unpaired.fq.gz
| |-- wt2_2.paired.fq.gz
| `-- wt2_2.unpaired.fq.gz
|-- 03_align
| |-- epcr1.sam
| |-- wt1.sam
| `-- wt2.sam
但这些文件不完整。 所以这让我对 execution.Is 的顺序感到困惑,它 运行ning 平行于每个样本的五个规则?或者只是 运行ning 所有样本按规则进行,我的管道的 运行ning 过程似乎支持前一个。这也解释了错误:"samtools sort: truncated file. Aborting" 在 sam2bam 阶段。不知道我的猜测对不对。
但是我已经在我的 Snakefile 中添加了规则顺序:
ruleorder: trim > map > sort2bam > fpkm > count
但是好像不行!有没有其他选项或设置可以控制规则的执行顺序?
昨晚我 运行 snakefile 从 "rule map" 开始,基于 trimmed fastq.gz trimmed 使用相同的 Snakefile .而且运行宁好!整个运行ning过程如下:
lding DAG of jobs...
viUsing shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
17
[Mon Apr 29 14:37:39 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 19
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:40 2019]
Finished job 19.
1 of 17 steps (6%) done
[Mon Apr 29 14:37:40 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 17
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:48 2019]
Finished job 17.
2 of 17 steps (12%) done
[Mon Apr 29 14:37:51 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 18
wildcards: nico=wt2
[Mon Apr 29 14:37:55 2019]
Finished job 18.
3 of 17 steps (18%) done
[Mon Apr 29 14:37:57 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 20
wildcards: nico=epcr2
[Mon Apr 29 14:38:02 2019]
Finished job 20.
4 of 17 steps (24%) done
[Mon Apr 29 14:38:04 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 13
wildcards: nico=wt1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 14:39:45 2019]
Finished job 13.
5 of 17 steps (29%) done
[Mon Apr 29 14:39:46 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 9
wildcards: nico=wt1
[Mon Apr 29 14:40:42 2019]
Finished job 9.
6 of 17 steps (35%) done
[Mon Apr 29 14:40:42 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 5
wildcards: nico=wt1
[Mon Apr 29 14:41:40 2019]
Finished job 5.
7 of 17 steps (41%) done
[Mon Apr 29 14:41:40 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 16
wildcards: nico=epcr2
[Mon Apr 29 14:56:41 2019]
Finished job 16.
8 of 17 steps (47%) done
[Mon Apr 29 14:56:41 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 8
wildcards: nico=epcr2
[Mon Apr 29 14:57:45 2019]
Finished job 8.
9 of 17 steps (53%) done
[Mon Apr 29 14:57:45 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 12
wildcards: nico=epcr2
h[Mon Apr 29 15:01:32 2019]
Finished job 12.
10 of 17 steps (59%) done
[Mon Apr 29 15:01:32 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 14
wildcards: nico=wt2
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:08:05 2019]
Finished job 14.
11 of 17 steps (65%) done
[Mon Apr 29 15:08:05 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 10
wildcards: nico=wt2
[Mon Apr 29 15:12:28 2019]
Finished job 10.
12 of 17 steps (71%) done
[Mon Apr 29 15:12:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 6
wildcards: nico=wt2
[Mon Apr 29 15:13:18 2019]
Finished job 6.
13 of 17 steps (76%) done
[Mon Apr 29 15:13:18 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 15
wildcards: nico=epcr1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:21:35 2019]
Finished job 15.
14 of 17 steps (82%) done
[Mon Apr 29 15:21:35 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 11
wildcards: nico=epcr1
[Mon Apr 29 15:27:22 2019]
Finished job 11.
15 of 17 steps (88%) done
[Mon Apr 29 15:27:22 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 7
wildcards: nico=epcr1
[Mon Apr 29 15:28:32 2019]
Finished job 7.
16 of 17 steps (94%) done
[Mon Apr 29 15:28:32 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
[Mon Apr 29 15:28:32 2019]
Finished job 0.
17 of 17 steps (100%) done
rule的执行顺序是这样的: 起初,规则映射是完全executed.Then,对每个sample.bam执行左边的规则。
为什么它的顺序与从 raw_data 开始的整个管道不同?
总结: 两个问题: 1. 是我的 Snakefile 还是执行顺序出错了? 2.如何编辑我的Snakefile来设置每条规则的顺序以逐条执行任务?
如果有人帮助,我将不胜感激!
问题是 trim
和 map
规则中的 shell 命令以 &
结尾,它将工作进程发送到后台,主进程移动打开并退出。这向 Snakemake 发出作业已完成执行的信号。 Snakemake 然后继续检查声明为作业结果的输出文件,但找不到它们(因此,您看到的错误)。
如果您删除 &
并将进程保留在前台,看起来一切都会正常。