snakemake substitution issue

I'm new to Snakemake and have a question about the code below. It should take 9 fastq files one after another and run FastQC on each of them.

smp should take the following values:

UG1_S12 UG2_S13 UG3_S14 UR1_S1 UR2_S2 UR3_S3 UY1_S6 UY2_S7 UY3_S8

I get these values when I run:

SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")
NB_SAMPLES = len(SAMPLES)

for smp in SAMPLES:
  message("Sample " + smp + " will be processed")
message("N= " + str(NB_SAMPLES))
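To make it clearer where SAMPLES comes from, here is a simplified plain-Python model of what glob_wildcards does with the pattern "reads/merged_s{smp}_L001.fastq.gz". It is an illustration under stated assumptions, not Snakemake's implementation: the file list and the helper extract_samples are hypothetical, and {smp} is modelled as the named group (?P<smp>.+), on the assumption that this roughly matches Snakemake's default wildcard regex.

```python
import re

# Hypothetical file list; the real workflow scans the reads/ directory on disk.
FILES = [
    "reads/merged_sUG1_S12_L001.fastq.gz",
    "reads/merged_sUR1_S1_L001.fastq.gz",
    "reads/merged_sUY3_S8_L001.fastq.gz",
    "reads/notes.txt",  # does not match the pattern, so it is skipped
]

# Simplified stand-in for "reads/merged_s{smp}_L001.fastq.gz":
# the literal parts are escaped and {smp} becomes a named capture group.
PATTERN = re.compile(r"reads/merged_s(?P<smp>.+)_L001\.fastq\.gz$")


def extract_samples(paths):
    """Return the value captured for {smp} from every path that matches."""
    samples = []
    for path in paths:
        match = PATTERN.match(path)
        if match:
            samples.append(match.group("smp"))
    return samples


SAMPLES = extract_samples(FILES)
NB_SAMPLES = len(SAMPLES)
print(SAMPLES, NB_SAMPLES)
```

The greedy .+ still stops at the right place because the regex has to backtrack so that the literal suffix "_L001.fastq.gz" matches at the end of the path.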

The problem is how {smp} gets substituted in the rule below. It is first replaced with UY2_S7, but then with UY3_S8 in the mv commands.

How can I make sure the same value is substituted in both subcommands of the same rule?

My current code (inspired by):

SAMPLES, = glob_wildcards("reads/merged_s{smp}_L001.fastq.gz")

rule all:
    input:
        expand("reads/merged_s{smp}_L001.fastq.gz", smp=SAMPLES),
        "results/multiqc.html"

rule fastqc:
    """
    Run FastQC on each FASTQ file.
    """
    input:
        "reads/merged_s{smp}_L001.fastq.gz"
    output:
        "results/{smp}_fastqc.html",
        "intermediate/{smp}_fastqc.zip"
    version: "1.0"
    shadow: "minimal"
    threads: 8
    shell:
        """
        # Run fastQC and save the output to the current directory
        fastqc {input} -t {threads} -q -d . -o .

        # Move the files which are used in the workflow
        mv merged_s{smp}_L001_fastqc.html {output[0]}
        mv merged_s{smp}_L001_fastqc.zip {output[1]}
        """

Error:

Error in rule fastqc:
    jobid: 0
    output: results/UY2_S7_fastqc.html, intermediate/UY2_S7_fastqc.zip

RuleException:
CalledProcessError in line 60 of Snakefile:
Command ' set -euo pipefail;  
        # Run fastQC and save the output to the current directory
        fastqc reads/merged_sUY2_S7_L001.fastq.gz -t 8 -q -d . -o .

        # Move the files which are used in the workflow
        mv merged_sUY3_S8_L001_fastqc.html results/UY2_S7_fastqc.html
        mv merged_sUY3_S8_L001_fastqc.zip intermediate/UY2_S7_fastqc.zip ' returned non-zero exit status 130.
  File "Snakefile", line 60, in __rule_fastqc
  File "/opt/biotools/miniconda2/envs/snakemake-tutorial/lib/python3.6/concurrent/futures/thread.py", line 56, in run

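To use a wildcard in the shell command, you have to write {wildcards.smp}.
What is probably happening is that {smp} in the shell command is not the rule's wildcard at all. Instead it resolves to the module-level variable smp, which still holds the value from the last iteration of the for loop above (UY3_S8 here). That explains why {input} and {output} show UY2_S7 while the mv lines show UY3_S8. So change: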

shell:
    """
    # Run fastQC and save the output to the current directory
    fastqc {input} -t {threads} -q -d . -o .

    # Move the files which are used in the workflow
    mv merged_s{smp}_L001_fastqc.html {output[0]}
    mv merged_s{smp}_L001_fastqc.zip {output[1]}
    """

to:

shell:
    """
    # Run fastQC and save the output to the current directory
    fastqc {input} -t {threads} -q -d . -o .

    # Move the files which are used in the workflow
    mv merged_s{wildcards.smp}_L001_fastqc.html {output[0]}
    mv merged_s{wildcards.smp}_L001_fastqc.zip {output[1]}
    """
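The name clash can be reproduced in plain Python. In the sketch below, the helper format_shell is a hypothetical, simplified model of how a shell string sees both module-level names and a wildcards object. It is not Snakemake's actual code. The sketch shows that a bare {smp} picks up the value the for loop leaves behind, while {wildcards.smp} picks up the job's own value.

```python
from types import SimpleNamespace

SAMPLES = ["UG1_S12", "UG2_S13", "UG3_S14", "UR1_S1", "UR2_S2",
           "UR3_S3", "UY1_S6", "UY2_S7", "UY3_S8"]

# Same shape as the loop in the question: after it finishes,
# the module-level name `smp` is still bound to the last sample.
for smp in SAMPLES:
    pass


def format_shell(template, wildcards, namespace):
    """Hypothetical simplified model: fill placeholders from module-level
    names plus a `wildcards` object for the current job."""
    return template.format(**namespace, wildcards=wildcards)


# The job that is actually running was created for sample UY2_S7.
job_wildcards = SimpleNamespace(smp="UY2_S7")
module_names = {"smp": smp}  # the leftover loop variable

buggy = format_shell("mv merged_s{smp}_L001_fastqc.html",
                     job_wildcards, module_names)
fixed = format_shell("mv merged_s{wildcards.smp}_L001_fastqc.html",
                     job_wildcards, module_names)
print(buggy)
print(fixed)
```

Renaming the loop variable or deleting it after the loop would also hide the symptom. {wildcards.smp} is still the robust choice, because it always refers to the current job's wildcard.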

I haven't checked the rest of the code yet.