How to pass outputs from two different lists to a consecutive process in Nextflow

Question

Suppose I have two processes.

Channel
    .fromFilePairs("${params.dir}/{SPB_50k_exome_seq,FE_50k_exome_seq}.{bed,bim,fam}",size:3) {
        file -> file.baseName
    }
    .filter { key, files -> key in params.pops }
    .set { plink_data }

process pling_1 {
    publishDir "${params.outputDir}/filtered"

    input:
    set pop, file(pl_files) from plink_data

    output:
    file "${pop}_filtered.{bed,fam,bim}" into pling1_results

    script:
    output_file = "${pop}_filtered"
    base        = pl_files[0].baseName

     """
        plink2 \
        --bfile $pop \
        --hwe 0.00001 \
        --make-bed \
        --out ${output_file} \
     """
}
process pling_2 {
    publishDir "${params.outputDir}/filtered_vcf"

    input:
    set file(bed), file(bim), file(fam) from pling1_results.collect()
    file(fam1) from fam_for_plink2

    output:
    file("${base}.vcf.gz") into pling2_results

    script:

    base          = bed.baseName
    output_file   = "${base}"

     """
     plink2 \
     --bfile $base \
     --keep-fam ${params.fam}/50k_exome_seq_filtered_for_VEP_ID.txt \
     --recode vcf-iid bgz --out ${output_file}
     """
}

The pling_1 process emits two lists of files:

[/work/SPB_50k_exome_seq.bed, /work/SPB_50k_exome_seq.bim,/work/SPB_50k_exome_seq.fam]
[/work/FE_50k_exome_seq.bed, /work/FE_50k_exome_seq.bim,/work/FE_50k_exome_seq.fam]

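As a result, pling_2 cannot process SPB_50k_exome_seq and FE_50k_exome_seq in one go. base = bed.baseName picks up only SPB_50k_exome_seq and omits FE_50k_exome_seq from the second list. In this situation, how can I pass both SPB_50k_exome_seq and FE_50k_exome_seq to the pling_2 process?

Any help or suggestions would be much appreciated.

Thanks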

Answer 1

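After a lot of experimenting, I found a solution. I hope it helps anyone looking for one.

The cause of the problem is that pling_1 produces many files as output, and all of them are emitted into the same channel. You therefore need to break these files apart and re-group them into tuples. To do that, I used the following channel, which relies on operators such as combine and flatten.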

pling1_results
    .collect()
    .flatten()
    .map { file -> tuple(file.baseName, file)}
    .groupTuple(by: 0)
    .map { input -> tuple(input[0], input[1][0], input[1][1], input[1][2])}
    .set { pl1 }
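
To illustrate what this regrouping does, here is a minimal sketch in plain Groovy (no Nextflow needed). The file names are hypothetical, and the baseName and extension closures are stand-ins for what Nextflow provides on file objects. It also picks each file by extension rather than by position: I am not certain the files always arrive in bed, bim, fam order, so treat the positional input[1][0], input[1][1], input[1][2] above as an assumption worth checking.

```groovy
// Hypothetical file names standing in for the lists emitted by pling_1.
// The second list is deliberately out of order.
def emitted = [
    ['SPB_50k_exome_seq_filtered.bed', 'SPB_50k_exome_seq_filtered.bim', 'SPB_50k_exome_seq_filtered.fam'],
    ['FE_50k_exome_seq_filtered.fam', 'FE_50k_exome_seq_filtered.bed', 'FE_50k_exome_seq_filtered.bim']
]

// Stand-ins for the baseName / extension properties of a file object
def baseName  = { String name -> name.substring(0, name.lastIndexOf('.')) }
def extension = { String name -> name.substring(name.lastIndexOf('.') + 1) }

// Rough equivalent of flatten() followed by map + groupTuple(by: 0)
def grouped = emitted.flatten().groupBy { baseName(it) }

// Build [key, bed, bim, fam], selecting each file by its extension
def tuples = grouped.collect { key, files ->
    def byExt = files.collectEntries { [(extension(it)): it] }
    [key, byExt.bed, byExt.bim, byExt.fam]
}

tuples.each { println it }
```

In the real channel, the same idea would mean looking up each file by its extension inside the final map closure instead of indexing by position.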

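Then use the pl1 channel as the input to the next process. In addition, you should pass all items into the process as a single input through one channel, like this: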

set val(pop1), file(bed), file(bim), file(fam), file(fam1) from pl1.combine(fam_for_plink2)
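
As a rough picture of what combine contributes here, the plain-Groovy sketch below pairs every tuple with every item from the second source and appends that item to the tuple. The file names, including keep_ids.fam, are hypothetical, and this only imitates the operator's behaviour as I understand it rather than calling Nextflow itself.

```groovy
// Hypothetical tuples as they would come out of the pl1 channel
def pl1 = [
    ['SPB_50k_exome_seq_filtered', 'SPB.bed', 'SPB.bim', 'SPB.fam'],
    ['FE_50k_exome_seq_filtered',  'FE.bed',  'FE.bim',  'FE.fam']
]

// Hypothetical single item standing in for the fam_for_plink2 channel
def famForPlink2 = ['keep_ids.fam']

// Imitate combine: cross every tuple with every extra item and
// append the extra item as one more element of the tuple
def combined = pl1.collectMany { row ->
    famForPlink2.collect { extra -> row + [extra] }
}

combined.each { println it }
```

Each resulting five-element tuple lines up with the five inputs declared in the set statement, which is why the process then runs once per dataset.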

With these two changes, the workflow now runs for every sample/input tuple instead of just one. Thanks

groovy

nextflow