How to run a bash script inside a snakemake pipeline

I want to run a bash script inside a snakemake pipeline, but I don't know how to refer to snakemake's input and output from within the bash script.

Snakefile:

rule xxx:
    input:
        "input.vcf"
    output:
        "output.tab"
    shell:
        """
        some_bash.sh {input} {output}
        """

bash script:

#!/bin/bash

paste <(bcftools snakemake@input[0] |\
    awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tFILTER"} \
      !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$7}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' snakemake@input[0] |\
    awk 'BEGIN {print "nHet"} {print gsub(/0\|1|1\|0|0\/1|1\/0/, "")}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' snakemake@input[0] |\
    awk 'BEGIN {print "nHomAlt"} {print gsub(/1\|1|1\/1/, "")}') \
    \
  <(bcftools query -f '[\t%SAMPLE=%GT]\n' snakemake@input[0] |\
    awk 'BEGIN {print "nHomRef"} {print gsub(/0\|0|0\/0/, "")}') \
    \
  <(bcftools snakemake@input[0] | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HetSamples"} \
    !/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|1|1\|0|0\/1|1\/0/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
    \
  <(bcftools snakemake@input[0] | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesAlt"} \
    !/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/1\|1|1\/1/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
    \
  | sed 's/,\t/\t/g' | sed 's/,$//g' > snakemake@output[0]

The error I get:

[E::main] unrecognized command 'snakemake@input[0]'
[E::main] unrecognized command 'snakemake@input[0]'
[E::main] unrecognized command 'snakemake@input[0]'
[E::hts_open_format] [E::hts_open_format] Failed to open file "snakemake@input[0]" : No such file or directoryFailed to open file "snakemake@input[0]" : No such file or directory

You need to use bash syntax to read the input arguments; snakemake@input[0] is specific to R scripts run through the script directive.

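Specifically, replace snakemake@input[0] with $1, which holds the first argument passed to the bash script, and snakemake@output[0] with $2, the second argument. To be safe, wrap them in double quotes in case a file name contains spaces, e.g. "$1".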
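As a minimal sketch of that pattern, the snippet below reads the two paths as positional arguments and quotes them on every use. It assumes the script is called as some_bash.sh {input} {output}, as in the rule above. The helper name extract_sites is hypothetical, and it reproduces only the first column group of the original paste command with plain awk (VCF columns 1 to 5 and 7, which is an assumption about the intended fields) so that it runs without bcftools.

```shell
#!/bin/bash
# Sketch only: extract_sites is a hypothetical helper, not part of the
# original script. It shows how "$1" and "$2" stand in for
# snakemake@input[0] and snakemake@output[0].
extract_sites() {
    local in_vcf="$1"    # first argument: the rule's {input}
    local out_tab="$2"   # second argument: the rule's {output}

    # Double quotes keep paths that contain spaces as a single word.
    # Assumed field choice: CHROM, POS, ID, REF, ALT (1-5) and FILTER (7).
    awk -F'\t' 'BEGIN {print "CHR\tPOS\tID\tREF\tALT\tFILTER"}
        !/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$7}' "$in_vcf" > "$out_tab"
}

# When run as a script with exactly two arguments, forward them unchanged.
if [ "$#" -eq 2 ]; then
    extract_sites "$1" "$2"
fi
```

With the rule shown in the question, Snakemake would expand the shell command to some_bash.sh input.vcf output.tab, so "$1" is input.vcf and "$2" is output.tab inside the script.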