Output file not written in specific path using awk into a nextflow process

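I would like to choose the path of the outputs produced by awk inside a nextflow process, but I cannot get it to work. By writing $it after view I get the output in the work directory, but I need to choose the path myself. I tried changing $it to "${PathOutdir}/test_out.csv" but it did not work. Here I have put a simple awk command in a nextflow process. Should I use the workflow feature instead? Thanks in advance!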

PathFile = "/home/pvellosillo/nextflow_test/test.csv"
InputCsv = file(PathFile)
PathOutdir = "/home/pvellosillo/nextflow_test"

process genesFilter {
tag "PathInputFile:${PathFile}"
input:
   path InputCsv
output:
  file("test_out.csv") into out_filter

shell:

"""
#!/bin/bash
awk 'BEGIN{FS=OFS="\t"}{print $2}' $InputCsv > "test_out.csv"
"""
}

out_filter.view {"${PathOutdir}/test_out.csv"}

Based on your question and your comment above:

Note that by using publishDir {$PathOutdir} i get an output in a chosen directory but the files are symbolic links to the work directory instead of simply files

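I think you want the 'copy' mode, so that the declared output files are copied into the publishDir instead of being symlinked. Also make sure to avoid accessing output files through the publishDir: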

Files are copied into the specified directory in an asynchronous manner, so they may not be immediately available in the published directory at the end of the process execution. For this reason, downstream processes should not try to access output files through the publish directory, but through channels.

params.pathFile = "/home/pvellosillo/nextflow_test/test.csv"
params.publishDir = "/home/pvellosillo/nextflow_test"

InputCsv = file( params.pathFile )


process genesFilter {

    tag { InputCsv.name }

    publishDir(
        path: "${params.publishDir}/genesFilter",
        mode: 'copy',
    )

    input:
    path InputCsv

    output:
    path "test_out.csv" into out_filter

    shell:
    '''
    awk 'BEGIN { FS=OFS="\t" } { print $2 }' "!{InputCsv}" > "test_out.csv"
    '''
}

out_filter.view()
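
To see why the publish mode matters, the difference between a symlinked and a copied output can be reproduced in plain bash. This is only a sketch: the directory layout and file names below are made up for illustration, and `ln -s` and `cp` merely stand in for what publishDir does in its default (symlink) mode and in 'copy' mode.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical stand-ins for a task work directory and a publish directory.
base=$(mktemp -d)
mkdir -p "$base/work/ab/123456" "$base/published"
printf 'gene\nBRCA1\n' > "$base/work/ab/123456/test_out.csv"

# Symlink-style publishing: the published entry only points into work/.
ln -s "$base/work/ab/123456/test_out.csv" "$base/published/linked.csv"

# Copy-style publishing: the published entry is an independent regular file.
cp "$base/work/ab/123456/test_out.csv" "$base/published/copied.csv"

# Simulate cleaning up the work directory.
rm -rf "$base/work"

[ -L "$base/published/linked.csv" ] && echo "linked.csv is a symlink"
[ -e "$base/published/linked.csv" ] || echo "linked.csv is now dangling"
[ -f "$base/published/copied.csv" ] && echo "copied.csv is still a regular file"
```

The symlink stops resolving once the work directory is gone, while the copy is unaffected, which is the behaviour you get from mode: 'copy'.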

Also note that the shell script definition requires the use of single-quote ' delimited strings.
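
Plain bash gives a rough analogy for why the quoting matters. In the sketch below, the sample file and its contents are invented for illustration, and the double-quoted variant only mimics what happens when something other than awk gets to interpret $2 first.

```shell
#!/bin/bash
set -eo pipefail

# Hypothetical sample input: three tab-separated columns.
tmpdir=$(mktemp -d)
printf 'id\tgene\tscore\n1\tBRCA1\t0.9\n2\tTP53\t0.7\n' > "$tmpdir/test.csv"

# Single quotes: bash leaves $2 alone, so awk sees it as "second field".
single=$(awk 'BEGIN { FS=OFS="\t" } { print $2 }' "$tmpdir/test.csv")

# Double quotes: bash expands $2 itself before awk runs. With no positional
# parameters set it expands to nothing, so awk runs a bare print (whole line).
set --
double=$(awk "BEGIN { FS=OFS=\"\t\" } { print $2 }" "$tmpdir/test.csv")

printf 'single-quoted:\n%s\n\ndouble-quoted:\n%s\n' "$single" "$double"
```

Only the single-quoted form prints the second column. In a shell: block delimited with ''' the same idea applies one level up: Nextflow variables are written as !{...}, and $2 is passed through to the script untouched.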