Merging output files (chromosomal chunks) in Nextflow
I have a Nextflow process, say imputation, that emits multiple chunks per chromosome into a channel.
The output looks like:
chr1.imputed.chunk1.gen.gz chr1.imputed.chunk2.gen.gz chr1.imputed.chunk3.gen.gz
chr1.imputed.chunk1.stats chr1.imputed.chunk2.stats chr1.imputed.chunk3.stats
chr1.imputed.chunk1.bgen chr1.imputed.chunk2.bgen chr1.imputed.chunk3.bgen
.....
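Each chromosome (22 chromosomes in total) has many chunks. How can I efficiently merge them
so that each file type ends up with a single file per chromosome, for example: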
chr1.imputed.merged.gen.gz
chr1.imputed.merged.stats
chr1.imputed.merged.bgen
Once I have the merged outputs, I would like to delete all the chunks. Any help?
The actual code that generates the chunks is:
process imputation {
    publishDir params.out, mode:'copy'
    input:
    tuple val(chrom),val(chunk_array),val(chunk_start),val(chunk_end),path(in_haps),path(refs),path(maps) from imp_ch
    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed
    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4.1.2_r300.3 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.step10.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321
    # The Bash command substitution needs an escaped dollar sign inside a Groovy string
    if [[ \$(gunzip -c "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
        echo "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" is empty
    else
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.step10.imputed.chunk${chunk_array}.snp.stats"
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -og "${chrom}.step10.imputed.chunk${chunk_array}.bgen" -os "${chrom}.step10.imputed.chunk${chunk_array}.sample"
    fi
    """
}
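The empty-chunk test in the script above can be tried on its own outside Nextflow. The following is only a minimal Bash sketch of that check: the function name is_empty_gz and the demo file names are hypothetical, and it uses a numeric -eq comparison instead of the string comparison from the original, on the assumption that some wc implementations pad their output with spaces.

```shell
#!/usr/bin/env bash

# Succeeds (exit 0) when the gzip file decompresses to zero bytes.
# is_empty_gz is a hypothetical helper name, not part of any tool.
is_empty_gz() {
    local n
    # Read at most one decompressed byte and count it.
    n=$(gunzip -c "$1" 2>/dev/null | head -c1 | wc -c)
    # -eq compares numerically, so padding in the wc output does not matter.
    [[ "$n" -eq 0 ]]
}

# Demo on two throwaway files in a temporary directory.
tmp=$(mktemp -d)
printf '' | gzip -c > "$tmp/empty.gen.gz"
printf 'rs1 A G\n' | gzip -c > "$tmp/full.gen.gz"
for f in "$tmp"/*.gen.gz; do
    if is_empty_gz "$f"; then
        echo "$(basename "$f") is empty"
    else
        echo "$(basename "$f") has data"
    fi
done
```

Inside a Nextflow script block, every Bash dollar sign in this snippet would have to be escaped with a backslash.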
Can you post the actual code that generates the chunks you showed?
Without seeing your code, I would suggest trying this: http://nextflow-io.github.io/patterns/index.html#_process_per_file_range
You have this:
output:
tuple val("${chrom}"),path("${chrom}.*") into imputed
With that output channel declaration, you will probably need to do something like this in the downstream process:
input:
tuple val(name), path(chr_files) from imputed
script:
gen_files = chr_files.findAll { it.toString().endsWith('.gen.gz') }.sort()
stat_files = chr_files.findAll { it.toString().endsWith('.stats') }.sort()
"""
# try with echo first to see if you get what you want
echo ${gen_files.join(' ')} > ${name}_gen_fileList.txt
echo ${stat_files.join(' ')} > ${name}_stat_fileList.txt
"""
Once you have confirmed that the echo commands above print what you expect, you can do the rest of the work in the process block.
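As one possible version of that "rest of the work", here is a hedged Bash sketch that concatenates the chunks of one chromosome in numeric chunk order and deletes them only if the merge succeeded. The function name merge_chunks is hypothetical, the file naming follows the example in the question (chr1.imputed.chunkN.gen.gz), and sort -V is assumed to be available (GNU coreutils). Plain concatenation is only known to be safe for gzip files. The .stats files may carry header lines, and .bgen is a binary format with its own header, so those would need a format-aware tool instead of cat.

```shell
#!/usr/bin/env bash

# Concatenate all chunk files for one chromosome and suffix, in numeric
# chunk order, then delete the chunks only if the merge succeeded.
# Usage: merge_chunks <chrom> <suffix>, e.g. merge_chunks chr1 gen.gz
merge_chunks() {
    local chrom=$1 suffix=$2
    local out="${chrom}.imputed.merged.${suffix}"
    local chunks=()
    local f
    # sort -V orders chunk2 before chunk10; a plain sort would not.
    while IFS= read -r f; do
        chunks+=("$f")
    done < <(printf '%s\n' "${chrom}".imputed.chunk*."${suffix}" | sort -V)
    # Stop if the glob matched nothing and was left unexpanded.
    [[ -e "${chunks[0]}" ]] || { echo "no chunks for ${chrom} ${suffix}" >&2; return 1; }
    # gzip members can be concatenated byte for byte into a valid gzip file.
    cat "${chunks[@]}" > "$out" && rm -f "${chunks[@]}"
}

# Demo with three tiny chunks in a temporary directory.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    for i in 1 2 10; do
        printf 'line%s\n' "$i" | gzip -c > "chr1.imputed.chunk${i}.gen.gz"
    done
    merge_chunks chr1 gen.gz
    gunzip -c chr1.imputed.merged.gen.gz
)
```

Note that the .sort() calls in the snippet above order file names lexicographically, so chunk10 would come before chunk2. If the merge order matters, a numeric sort like the one here is needed. When this runs inside a downstream Nextflow process, the staged inputs are normally symlinks, so rm would remove the links in the work directory and not the files already copied by publishDir.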
Apparently the following lines of code solved the problem.
imputed.into{impute_bgen;impute_gen;impute_sample;impute_stat}
bgens=impute_bgen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[0])}.groupTuple()
gens=impute_gen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[1])}.groupTuple()
samples=impute_sample.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[2])}.groupTuple()
stats=impute_stat.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[3])}.groupTuple()