Snakemake runs rule too many times using config.yaml
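I am trying to create a Snakemake workflow that assesses raw read quality with FastQC and builds a report with MultiQC. I used 4 input files and got the expected results, but I just noticed that each rule runs 4 times and takes all 4 inputs every time, and I am not sure how to fix this. Can anyone help me figure out how to:
- run a rule 4 times, but use only one input from config.yaml each time?
- run a rule 1 time, but use all 4 inputs?
I am trying to follow the Snakemake tutorial, but with no luck so far.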
Snakefile:
configfile: "config.yaml"

rule all:
    input:
        expand("outputs/multiqc_report_1/{sample}_multiqc_report_1.html", sample=config["samples"])

rule raw_fastqc:
    input:
        expand("data/samples/{sample}.fastq", sample=config["samples"])
    output:
        "outputs/fastqc_1/{sample}_fastqc.html",
        "outputs/fastqc_1/{sample}_fastqc.zip"
    shell:
        "fastqc {input} -o outputs/fastqc_1/"

rule raw_multiqc:
    input:
        expand("outputs/fastqc_1/{sample}_fastqc.html", sample=config["samples"]),
        expand("outputs/fastqc_1/{sample}_fastqc.zip", sample=config["samples"])
    output:
        "outputs/multiqc_report_1/{sample}_multiqc_report_1.html"
    shell:
        "multiqc ./outputs/fastqc_1/ -n {output}"
config.yaml file:
samples:
    Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R1_001: data/samples/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R1_001.fastq
    Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001: data/samples/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001.fastq
    KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R1_001: data/samples/KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R1_001.fastq
    KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R2_001: data/samples/KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R2_001.fastq
I run Snakemake with the command:
snakemake -s Snakefile --core 1
Each rule runs 4 times:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job            count    min threads    max threads
-----------  -------  -------------  -------------
all                1              1              1
raw_fastqc         4              1              1
raw_multiqc        4              1              1
total              9              1              1
But all 4 inputs are used every time:
[Sun May 15 23:06:22 2022]
rule raw_fastqc:
    input: data/samples/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R1_001.fastq, data/samples/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001.fastq, data/samples/KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R1_001.fastq, data/samples/KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R2_001.fastq
    output: outputs/fastqc_1/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001_fastqc.html, outputs/fastqc_1/Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001_fastqc.zip
    jobid: 3
    wildcards: sample=Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R2_001
    resources: tmpdir=/tmp
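Your problem is the use of expand() in the input of every rule. expand() fills in the wildcard with every sample at once, so each job's input lists all 4 files. You only need to do this in the all rule: Snakemake works out the wildcard value from each requested output file and passes it on to the upstream rules.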
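The difference between the two ways of writing an input can be sketched in plain Python. This is only an illustration: expand_sketch is a hypothetical stand-in that handles a single wildcard, not Snakemake's own expand(), and the short sample names are made up to keep the output readable.

```python
# Hypothetical stand-in for Snakemake's expand(); handles one wildcard only.
def expand_sketch(pattern, **values):
    """Return the pattern filled in once for every value of the wildcard."""
    name, items = next(iter(values.items()))
    return [pattern.replace("{" + name + "}", item) for item in items]


# Made-up short names standing in for the keys of config["samples"].
samples = ["A_R1", "A_R2", "B_R1", "B_R2"]

# Input written with expand(): the list is fixed up front, so every job
# receives all four files no matter which sample its output belongs to.
inputs_with_expand = expand_sketch("data/samples/{sample}.fastq", sample=samples)


def inputs_for_job(pattern, sample):
    """Input written as a bare pattern: filled in per job from its wildcard."""
    return [pattern.format(sample=sample)]


print(inputs_with_expand)
for s in samples:
    print(s, "->", inputs_for_job("data/samples/{sample}.fastq", s))
```

With the bare pattern, each job sees exactly one file, which is the "run 4 times, one input each" case from the question.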
Snakefile:
configfile: "config.yaml"

rule all:
    input:
        expand("outputs/multiqc_report_1/{sample}_multiqc_report_1.html", sample=config["samples"])

rule raw_fastqc:
    input:
        "data/samples/{sample}.fastq"
    output:
        "outputs/fastqc_1/{sample}_fastqc.html",
        "outputs/fastqc_1/{sample}_fastqc.zip"
    shell:
        "fastqc {input} -o outputs/fastqc_1/"

rule raw_multiqc:
    input:
        "outputs/fastqc_1/{sample}_fastqc.html",
        "outputs/fastqc_1/{sample}_fastqc.zip",
    output:
        "outputs/multiqc_report_1/{sample}_multiqc_report_1.html"
    shell:
        "multiqc ./outputs/fastqc_1/ -n {output}"
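Note that the Snakefile above still runs raw_multiqc once per sample, because its output keeps the {sample} wildcard and rule all requests one report per sample. For the other case in the question (run a rule once with all 4 inputs), a common pattern is to give the aggregating rule an output without any wildcard and keep expand() only in its input. That is an assumption on my part rather than something shown above, and the combined report file name below is hypothetical. The plain-Python sketch counts jobs under the simplification that one job is scheduled per distinct requested output file.

```python
# Made-up short names standing in for the keys of config["samples"].
samples = ["A_R1", "A_R2", "B_R1", "B_R2"]


def jobs_needed(targets):
    """Simplification: one job per distinct output file that is requested."""
    return len(set(targets))


# Output keeps {sample}: rule all asks for one report per sample.
per_sample_reports = [
    f"outputs/multiqc_report_1/{s}_multiqc_report_1.html" for s in samples
]

# Output has no wildcard (hypothetical file name): one report in total,
# while the rule's input would still list every FastQC archive via expand().
combined_report = ["outputs/multiqc_report_1/multiqc_report.html"]
combined_inputs = [f"outputs/fastqc_1/{s}_fastqc.zip" for s in samples]

print("per-sample reports:", jobs_needed(per_sample_reports), "jobs")
print("combined report:", jobs_needed(combined_report), "job,",
      len(combined_inputs), "inputs")
```

In Snakefile terms, that would mean rule all requesting the single combined report, and raw_multiqc keeping the expand() calls in its input as in the original question.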