snakemake - 为没有通配符的聚合规则定义输入
snakemake - define input for aggregate rule without wildcards
我正在编写一个 snakemake 来从纳米孔测序中生成 Sars-Cov-2 变体。我正在编写的管道基于 artic network,因此我使用 artic guppyplex
和 artic minion
.
我写的snakemake有以下步骤:
- 压缩所有条形码的所有
fastq
文件(规则 zipFq
)
- 使用
guppyplex
执行读取过滤(规则 guppyplex
)
- 调用
artic minion
管道(规则 minion
)
- 将 stderr 和 stdout 从 qsub 移动到工作目录下的文件夹(规则
mvQsubLogs
)
下面是我目前写的 snakemake,可以正常工作
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
上面的 snakemake 有效,现在我正在尝试添加另一个规则,但这里的区别是这个规则是一个聚合函数,即它不应该为每个 barcode
调用,但毕竟只调用一次所有条形码都调用了规则
我试图合并的规则 (catFasta) 会将 cat
所有 {barcode}.consensus.fasta
(由规则 minion
生成)放入单个文件中,如下所示(合并到上面的 snakemake):
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# catFasta
"catFasta/cat_consensus.fasta",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output,
# catFasta
rules.catFasta.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
然而,当我用
调用 snakemake 时
(artic-ncov2019) ngs@bngs05b:/nexusb/SC2/ONT/scripts/SnakeMake> snakemake -np -s Snakefile_v2 --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" all # LOGDIR parsed from config.yaml
我得到:
Building DAG of jobs...
MissingInputException in line 178 of /nexusb/SC2/ONT/scripts/SnakeMake/Snakefile_v2:
Missing input files for rule guppyplex:
/nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta
我觉得这不容易理解:snakemake 抱怨 /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta
而 /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/
是 FASTQ_PATH
而我没有在任何地方定义 f"{FASTQ_PATH}/{{barcode}}.consensus.fasta"
描述了一个完全相同的问题 here,尽管接受的答案中的策略(规则 catFasta
的输入将是 expand("nanopolish/{{barcode}}/{{barcode}}.consensus.fasta")
)对我不起作用。
有谁知道我该如何规避这个问题?
失败的规则是 rule guppyplex
,它查找 {FASTQ_PATH}/{{barcode}}
.
形式的输入
看起来通配符 {barcode}
填充了 barcode49/barcode49.consensus.fasta
,我认为这是由于两个原因造成的:
首先(也是最重要的):工作流没有找到更好的方法来产生最终输出。在 rule catFasta
中,您提供了一个输入文件,该文件在您的工作流程中从未被描述为输出。 rule minion
将目录作为输出,但没有文件,并且对于生成此输入文件的工作流程并不完全清楚。
因此推断 {barcode}
通配符必须以某种方式包含它以前从未见过的 .consensus.fasta
。这个通配符然后被移交给顶部,工作流在那里崩溃,因为它找不到匹配的输入文件。
其二:这个通配符的初始化with sth。你不想要的是唯一可能的,因为你没有正确约束通配符。例如,您可以禁止通配符包含 .
(参见 wildcard_constraints
here)
但是,主要的问题是catFasta
没有找到想要的输入。我建议将 minion
的输出更改为 "nanopolish/{barcode}/{barcode}.consensus.fasta"
,因为您已经从参数中获取了 OUTDIR,这不会影响您的规则。
编辑:虚拟测试示例:
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# catFasta
"catFasta/cat_consensus.fasta",
], barcode = barcodes)
rule guppyplex:
input:
FQ = f"fastq/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"touch {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"fasta/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
"nanopolish/{barcode}/{barcode}.consensus.fasta"
shell:
"""
touch {output} && echo {wildcards.barcode} > {output}
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"
我正在编写一个 snakemake 来从纳米孔测序中生成 Sars-Cov-2 变体。我正在编写的管道基于 artic network,因此我使用 artic guppyplex
和 artic minion
.
我写的snakemake有以下步骤:
- 压缩所有条形码的所有
fastq
文件(规则zipFq
) - 使用
guppyplex
执行读取过滤(规则guppyplex
) - 调用
artic minion
管道(规则minion
) - 将 stderr 和 stdout 从 qsub 移动到工作目录下的文件夹(规则
mvQsubLogs
)
下面是我目前写的 snakemake,可以正常工作
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
上面的 snakemake 有效,现在我正在尝试添加另一个规则,但这里的区别是这个规则是一个聚合函数,即它不应该为每个 barcode
调用,但毕竟只调用一次所有条形码都调用了规则
我试图合并的规则 (catFasta) 会将 cat
所有 {barcode}.consensus.fasta
(由规则 minion
生成)放入单个文件中,如下所示(合并到上面的 snakemake):
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# zip fq
"zipFastq/{barcode}/{barcode}.zip",
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# nanopolish
"nanopolish/{barcode}",
# catFasta
"catFasta/cat_consensus.fasta",
# directory where the logs will be moved to
"logs/{barcode}"
], barcode = barcodes)
rule zipFq:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}"
output:
"zipFastq/{barcode}/{barcode}.zip"
shell:
"zip {output} {input.FQ}/*"
rule guppyplex:
input:
FQ = f"{FASTQ_PATH}/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"/home/ngs/miniconda3/envs/artic-ncov2019/bin/artic guppyplex --skip-quality-check --min-length {MINLENGTHGUPPY} --max-length {MAXLENGTHGUPPY} --directory {input.FQ} --prefix {wildcards.barcode} --output {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"{FAST5_PATH}/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
directory("nanopolish/{barcode}")
shell:
"""
mkdir {params.OUTDIR};
cd {params.OUTDIR};
export PATH=/home/ngs/miniconda3/envs/artic-ncov2019/bin:$PATH;
artic minion --normalise {NANOPOLISH_NORMALISE} --threads {THREADS} --scheme-directory {PRIMERSDIR} --read-file ../../{input.INFQ} --sequencing-summary {Seq_Sum} --fast5-directory {input.FAST5} nCoV-2019/{PRIMERVERSION} {wildcards.barcode} # variables in CAPITALS are parsed from config.yaml
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"
rule mvQsubLogs:
input:
# zipFQ
rules.zipFq.output,
# guppyplex
rules.guppyplex.output,
# nanopolish
rules.minion.output,
# catFasta
rules.catFasta.output
output:
directory("logs/{barcode}")
shell:
"mkdir -p {output} \n"
"mv {LOGDIR}/{wildcards.barcode}* {output}/"
然而,当我用
调用 snakemake 时(artic-ncov2019) ngs@bngs05b:/nexusb/SC2/ONT/scripts/SnakeMake> snakemake -np -s Snakefile_v2 --cluster "qsub -q onlybngs05b -e {LOGDIR} -o {LOGDIR} -j y" -j 5 --jobname "{wildcards.barcode}.{rule}.{jobid}" all # LOGDIR parsed from config.yaml
我得到:
Building DAG of jobs...
MissingInputException in line 178 of /nexusb/SC2/ONT/scripts/SnakeMake/Snakefile_v2:
Missing input files for rule guppyplex:
/nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta
我觉得这不容易理解:snakemake 抱怨 /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/barcode49/barcode49.consensus.fasta
而 /nexus/Gridion/20210521_Covid7/Covid7/20210521_0926_X1_FAL11796_a5b62ac2/fastq_pass/
是 FASTQ_PATH
而我没有在任何地方定义 f"{FASTQ_PATH}/{{barcode}}.consensus.fasta"
描述了一个完全相同的问题 here,尽管接受的答案中的策略(规则 catFasta
的输入将是 expand("nanopolish/{{barcode}}/{{barcode}}.consensus.fasta")
)对我不起作用。
有谁知道我该如何规避这个问题?
失败的规则是 rule guppyplex
,它查找 {FASTQ_PATH}/{{barcode}}
.
看起来通配符 {barcode}
填充了 barcode49/barcode49.consensus.fasta
,我认为这是由于两个原因造成的:
首先(也是最重要的):工作流没有找到更好的方法来产生最终输出。在 rule catFasta
中,您提供了一个输入文件,该文件在您的工作流程中从未被描述为输出。 rule minion
将目录作为输出,但没有文件,并且对于生成此输入文件的工作流程并不完全清楚。
因此推断 {barcode}
通配符必须以某种方式包含它以前从未见过的 .consensus.fasta
。这个通配符然后被移交给顶部,工作流在那里崩溃,因为它找不到匹配的输入文件。
其二:这个通配符的初始化with sth。你不想要的是唯一可能的,因为你没有正确约束通配符。例如,您可以禁止通配符包含 .
(参见 wildcard_constraints
here)
但是,主要的问题是catFasta
没有找到想要的输入。我建议将 minion
的输出更改为 "nanopolish/{barcode}/{barcode}.consensus.fasta"
,因为您已经从参数中获取了 OUTDIR,这不会影响您的规则。
编辑:虚拟测试示例:
barcodes = ['barcode49', 'barcode50', 'barcode51']
rule all:
input:
expand([
# guppyplex
"guppyplex/{barcode}/{barcode}.fastq",
# catFasta
"catFasta/cat_consensus.fasta",
], barcode = barcodes)
rule guppyplex:
input:
FQ = f"fastq/{{barcode}}" # FASTQ_PATH is parsed from config.yaml
output:
"guppyplex/{barcode}/{barcode}.fastq"
shell:
"touch {output}" # variables in CAPITALS are parsed from config.yaml
rule minion:
input:
INFQ = rules.guppyplex.output,
FAST5 = f"fasta/{{barcode}}"
params:
OUTDIR = "nanopolish/{barcode}"
output:
"nanopolish/{barcode}/{barcode}.consensus.fasta"
shell:
"""
touch {output} && echo {wildcards.barcode} > {output}
"""
rule catFasta:
input:
expand("nanopolish/{barcode}/{barcode}.consensus.fasta", barcode = barcodes)
output:
"catFasta/cat_consensus.fasta"
shell:
"cat {input} > {output}"