Issue with bwa mem process not running on all output files from previous process
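I am building a Nextflow pipeline for mapping and variant calling of genotyping-by-sequencing (GBS) data (single-end Illumina). Most of it is based on the nf-core/eager pipeline, because it has many of the tools I want to incorporate into my own pipeline. I have tested the pipeline on a test sample and it works well. However, when I try to run it on more samples, it picks up the read files and trims them with fastp without problems, but when I then run bwa mem on the trimmed files, it only processes one of the trimmed FASTQ files, seemingly chosen at random. As a result, the downstream processes also only run on that one file. I have tried several different approaches, none of which seem to work. My guess is that this might be related to the FASTA reference/bwa index not being a value channel? Any suggestions?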
//read reference fasta channel
Channel.fromPath("${params.fasta}")
    .ifEmpty { exit 1, "No genome specified! Please specify one with --fasta or --bwa_index"}
    .into {ch_fasta_for_bwa_indexing; ch_fasta_for_faidx_indexing; ch_fasta_for_variant_call; ch_fasta_for_bwamem_mapping; ch_fasta_for_qualimap}

///build_bwa_index
process build_bwa_index {
    tag {fasta}
    publishDir path: "${params.outdir}/bwa_index", mode: 'copy', saveAs: { filename ->
        if (params.saveReference) filename
        else if(!params.saveReference && filename == "where_are_my_files.txt") filename
        else null
    }

    when: !params.bwa_index && params.fasta

    input:
    file fasta from ch_fasta_for_bwa_indexing
    file wherearemyfiles

    output:
    file "*.{amb,ann,bwt,pac,sa,fasta,fa}" into bwa_index_bwamem
    file "where_are_my_files.txt"

    """
    bwa index $fasta
    """
}
///bwa_align process
process bwa_align {
    tag "$name"
    publishDir "${params.outdir}/mapping/bwamem", mode: 'copy'

    input:
    set val(name), file(reads) from trimmed_fastq
    file fasta from ch_fasta_for_bwamem_mapping
    file "*" from bwa_index_bwamem

    output:
    file "*_sorted.bam" into bwa_sorted_bam_idxstats, bwa_sorted_bam_filter
    file "*.bai"

    script:
    if(params.singleEnd){
        """
        bwa mem $fasta ${reads[0]} -t ${task.cpus} | samtools sort -@ ${task.cpus} -o ${name}_sorted.bam
        samtools index -@ ${task.cpus} ${name}_sorted.bam
        """
    } else {
        """
        bwa mem $fasta ${reads[0]} ${reads[1]} -t ${task.cpus} | samtools sort -@ ${task.cpus} -o ${name}_sorted.bam
        samtools index -@ ${task.cpus} ${name}_sorted.bam
        """
    }
}
I want the bwa_align process to run on both of the files produced by the fastp process in this example:
Pipeline name : trishulagenetics/genocan
Pipeline version: 0.1dev
Run name : exotic_hoover
Reads : data_2/*.R{1,2}.fastq.gz
Fasta reference: GCA_000230575.4_ASM23057v4_genomic.fna
bwa index : false
Data type : Single-end
Max Memory : null
Max CPUs : null
Max Time : null
Output dir : ./results
Working dir : /home/debian/Trishula/SRR2060630_split/test/work
Container Engine: docker
Container : trishulagenetics/genocan:latest
Current home : /home/debian
Current user : debian
Current path : /home/debian/Trishula/SRR2060630_split/test
Script dir : /home/debian/.nextflow/assets/trishulagenetics/genocan
Config Profile : docker
=========================================
executor > local (14)
[b1/080d6a] process > get_software_versions [100%] 1 of 1 ✔
[4e/87b4c2] process > build_bwa_index (GCA_000230575.4_ASM23057v4_genomic.fna) [100%] 1 of 1 ✔
[27/64b776] process > build_fasta_index (GCA_000230575.4_ASM23057v4_genomic.fna) [100%] 1 of 1 ✔
[f6/b07508] process > fastqc (P2_E07_M_0055) [100%] 2 of 2 ✔
[87/ecd07c] process > fastp (P2_E07_M_0055) [100%] 2 of 2 ✔
[50/e7bf8c] process > bwa_align (P2_A01_M_0001) [100%] 1 of 1 ✔
[c1/3647bc] process > samtools_idxstats (P2_A01_M_0001_sorted) [100%] 1 of 1 ✔
[0c/68b22c] process > samtools_filter (P2_A01_M_0001_sorted) [100%] 1 of 1 ✔
[de/c26b2d] process > qualimap (P2_A01_M_0001_sorted.filtered) [100%] 1 of 1 ✔
[bc/f7cf86] process > variant_call (P2_A01_M_0001) [100%] 1 of 1 ✔
[6f/2a9ab8] process > multiqc [100%] 1 of 1 ✔
[bb/b8b957] process > output_documentation (null) [100%] 1 of 1 ✔
[trishulagenetics/genocan] Pipeline Complete
Completed at: 17-Aug-2019 09:51:48
Duration : 19m 34s
CPU hours : 0.3
Succeeded : 14
Yes. Basically, it's best to avoid splitting the FASTA file into multiple channels and to just use a single value, which is implicitly a value channel:
ref_fasta = file(params.fasta)

process build_bwa_index {
    storeDir ...

    input:
    file ref_fasta

    output:
    file "*.{amb,ann,bwt,pac,sa}" into bwa_index

    """
    bwa index "${ref_fasta}"
    """
}

process bwa_mem {
    publishDir ...

    input:
    set name, file(reads) from trimmed_fastq
    file ref_fasta
    file "*" from bwa_index

    ...
}
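To see why the original version stops after one sample, here is a toy Python model of the firing rule. It is not Nextflow code and not how Nextflow is implemented; it only mimics the behaviour. A process with several queue-channel inputs launches one task per complete set of inputs, so it stops as soon as the shortest queue channel is exhausted, while a value channel can be read any number of times. In the question, `ch_fasta_for_bwamem_mapping` (and likewise `bwa_index_bwamem`) emits a single item, so `bwa_align` runs once even though fastp produced two trimmed files. The function name `tasks_launched` and the `fasta_is_value` flag are hypothetical, chosen for illustration.

```python
from itertools import cycle
from typing import List, Tuple


def tasks_launched(reads: List[str], fasta: str, fasta_is_value: bool) -> List[Tuple[str, str]]:
    """Toy model (hypothetical, not Nextflow's API) of a process with two inputs.

    Each task needs one item from the reads channel and one from the FASTA channel.
    Queue channels are consumed item by item, so the number of tasks equals the
    length of the shortest queue input. A value channel is re-read for every task.
    """
    if fasta_is_value:
        # Value channel: the same FASTA is handed out again for every task.
        fasta_items = cycle([fasta])
    else:
        # Queue channel holding a single FASTA: consumed by the first task only.
        fasta_items = iter([fasta])
    # zip() stops at the shortest input, like a process waiting on an empty queue.
    return list(zip(reads, fasta_items))


if __name__ == "__main__":
    trimmed = ["P2_A01_M_0001", "P2_E07_M_0055"]
    print(len(tasks_launched(trimmed, "genome.fna", fasta_is_value=False)))  # 1 task
    print(len(tasks_launched(trimmed, "genome.fna", fasta_is_value=True)))   # 2 tasks
```

The same reasoning applies to any input that emits exactly once, which is why both the FASTA and the bwa index need to reach `bwa_mem` as values rather than as single-item queue channels.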