Snakemake:MissingInputException 缺少输入文件
Snakemake: MissingInputException Missing input file
我是 Python 的新手,但我必须 运行 一个带有规则文件和配置文件的 snakemake 脚本。但是我被这个错误阻止了:
MissingInputException(job.rule, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule all:
Kmer/FS01377_plot.pdf
Kmer/FS070_summary.txt
Kmer/FS01255_summary.txt
Kmer/FS01245_summary.txt [...]
我想我遗漏了一些东西,但我找不到了。当我在没有 R1 的情况下编写 SAMPLES, = glob_wildcards(join(datadir, "{sample}_fastp.fastq.gz"))
时,它适用于“全部规则”但不适用于脚本的其余部分。这是我的规则文件:
#!/usr/bin/env python
from snakemake.shell import shell
from os.path import join
shell.executable("/bin/bash")
#set the workdir
workdir:config["workdir"]
datadir=config["datadir"]
SAMPLES, = glob_wildcards(join(datadir, "{sample}_R1_fastp.fastq.gz"))
# Patterns for the 1st mate and the 2nd mate using the 'sample' wildcard. Snakemake regular expression matching the forward mate FASTQ files.
PATTERN_R1 = '{sample}_R1_fastp.fastq.gz'
PATTERN_R2 = '{sample}_R2_fastp.fastq.gz'
## SNAKEMAKE PROCESS
#All
rule all:
input:
expand("Kmer/{sample}_histo_jellyfish.histo", sample=SAMPLES),
expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),
expand("Kmer/{sample}_plot.log.pdf", sample=SAMPLES),
expand("Kmer/{sample}_model.txt", sample=SAMPLES),
expand("Kmer/{sample}_progress.txt", sample=SAMPLES),
expand("Kmer/{sample}_summary.txt", sample=SAMPLES)
## K-mer process
rule merge_fastq:
input:
r1 = join(datadir, PATTERN_R1),
r2 = join(datadir, PATTERN_R2)
output:
merge_fastq="Kmer/{sample}_merge_fastp.fastq"
message:
"Merge fastq : {wildcards.sample}"
run:
shell(
"cat {input.fastp_R1} {input.fastp_R2} > Kmer/{wildcards.sample}_merge_fastp.fastq.gz "
"gzip -d {wildcards.sample}_merge_fastp.fastq.gz "
)
rule jellyfish_process:
input:
merge_fastq="Kmer/{sample}_merge_fastp.fastq"
output:
reads_jellyfish="Kmer/{sample}_reads_jellyfish"
message:
"Jellyfish processing - Counting k-mer : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
"jellyfish count -C -m 21 -s 1000000 -t 30 {input.merge_fastq} -o {output.reads_jellyfish} ; "
". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
)
rule jellyfish_histogram:
input:
reads_jellyfish="Kmer/{sample}_reads_jellyfish"
output:
histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
message:
"Jellyfish processing - Exporting the histogram : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
"jellyfish histo -t 10 {input.reads_jellyfish} > {output.histo_jellyfish} ; "
"rm Kmer/{wildcards.sample}_reads_jellyfish "
"rm Kmer/{wildcards.sample}_merge_fastp.fastq "
". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
)
rule GenomeScope:
input:
histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
output:
plot_genomescope="GenomeScope/{sample}_plot.pdf",
model_genomescope="GenomeScope/{sample}_model.txt",
plotlog_genomescope="GenomeScope/{sample}_plot.log.pdf",
progress_genomescope="GenomeScope/{sample}_progress.txt",
summary_genomescope="GenomeScope/{sample}_summary.txt"
message:
"GenomeScope processing - Fitting the model : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/genomescope2/2.0/env.sh ; "
"mv {input.histo_jellyfish} ~ "
"module load R "
"R --vanilla --slave --args "
"{wildcards.sample}_histo_jellyfish.histo "
"21 "
"150 "
"OutputGenomeScope "
"10000 "
"Summary "
"{wildcards.sample} "
"< ~/GenomeScope/genomescope_cluster.R "
"mv OutputGenomeScope/model.txt {output.model_genomescope} && "
"mv OutputGenomeScope/plot.log.pdf {output.plotlog_genomescope} && "
"mv OutputGenomeScope/plot.pdf {output.plot_genomescope} && "
"mv OutputGenomeScope/progress.txt {output.progress_genomescope} && "
"mv OutputGenomeScope/summary.txt {output.summary_genomescope} && "
"mv {wildcards.sample}_histo_jellyfish.histo fastp.fastq/{wildcards.species}/GenomeScope"
". /appli/bioinfo/genomescope2/2.0/delenv.sh"
)
感谢您的帮助。
可能还有其他事情发生,但我猜 rule all
:
expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),
应该是:
expand("GenomeScope/{sample}_plot.pdf", sample=SAMPLES),
为了与rule GenomeScope
的输出一致(rule all
中的其他文件也一样)。
(这是我真正喜欢的 snakemake 的一件事:你很早就发现了这些不一致,在你修复它们之前管道甚至没有启动)
顺便说一下,我不会在 run
指令中使用 shell()
函数,而是:
shell:
r"""
. /appli/bioinfo/genomescope2/2.0/env.sh
mv {input.histo_jellyfish} ~
..etc etc
"""
我是 Python 的新手,但我必须 运行 一个带有规则文件和配置文件的 snakemake 脚本。但是我被这个错误阻止了:
MissingInputException(job.rule, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule all:
Kmer/FS01377_plot.pdf
Kmer/FS070_summary.txt
Kmer/FS01255_summary.txt
Kmer/FS01245_summary.txt [...]
我想我遗漏了一些东西,但我找不到了。当我在没有 R1 的情况下编写 SAMPLES, = glob_wildcards(join(datadir, "{sample}_fastp.fastq.gz"))
时,它适用于“全部规则”但不适用于脚本的其余部分。这是我的规则文件:
#!/usr/bin/env python
from snakemake.shell import shell
from os.path import join
shell.executable("/bin/bash")
#set the workdir
workdir:config["workdir"]
datadir=config["datadir"]
SAMPLES, = glob_wildcards(join(datadir, "{sample}_R1_fastp.fastq.gz"))
# Patterns for the 1st mate and the 2nd mate using the 'sample' wildcard. Snakemake regular expression matching the forward mate FASTQ files.
PATTERN_R1 = '{sample}_R1_fastp.fastq.gz'
PATTERN_R2 = '{sample}_R2_fastp.fastq.gz'
## SNAKEMAKE PROCESS
#All
rule all:
input:
expand("Kmer/{sample}_histo_jellyfish.histo", sample=SAMPLES),
expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),
expand("Kmer/{sample}_plot.log.pdf", sample=SAMPLES),
expand("Kmer/{sample}_model.txt", sample=SAMPLES),
expand("Kmer/{sample}_progress.txt", sample=SAMPLES),
expand("Kmer/{sample}_summary.txt", sample=SAMPLES)
## K-mer process
rule merge_fastq:
input:
r1 = join(datadir, PATTERN_R1),
r2 = join(datadir, PATTERN_R2)
output:
merge_fastq="Kmer/{sample}_merge_fastp.fastq"
message:
"Merge fastq : {wildcards.sample}"
run:
shell(
"cat {input.fastp_R1} {input.fastp_R2} > Kmer/{wildcards.sample}_merge_fastp.fastq.gz "
"gzip -d {wildcards.sample}_merge_fastp.fastq.gz "
)
rule jellyfish_process:
input:
merge_fastq="Kmer/{sample}_merge_fastp.fastq"
output:
reads_jellyfish="Kmer/{sample}_reads_jellyfish"
message:
"Jellyfish processing - Counting k-mer : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
"jellyfish count -C -m 21 -s 1000000 -t 30 {input.merge_fastq} -o {output.reads_jellyfish} ; "
". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
)
rule jellyfish_histogram:
input:
reads_jellyfish="Kmer/{sample}_reads_jellyfish"
output:
histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
message:
"Jellyfish processing - Exporting the histogram : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/jellyfish/2.2.10/env.sh ; "
"jellyfish histo -t 10 {input.reads_jellyfish} > {output.histo_jellyfish} ; "
"rm Kmer/{wildcards.sample}_reads_jellyfish "
"rm Kmer/{wildcards.sample}_merge_fastp.fastq "
". /appli/bioinfo/jellyfish/2.2.10/delenv.sh"
)
rule GenomeScope:
input:
histo_jellyfish="Kmer/{sample}_histo_jellyfish.histo"
output:
plot_genomescope="GenomeScope/{sample}_plot.pdf",
model_genomescope="GenomeScope/{sample}_model.txt",
plotlog_genomescope="GenomeScope/{sample}_plot.log.pdf",
progress_genomescope="GenomeScope/{sample}_progress.txt",
summary_genomescope="GenomeScope/{sample}_summary.txt"
message:
"GenomeScope processing - Fitting the model : {wildcards.sample}"
run:
shell(
". /appli/bioinfo/genomescope2/2.0/env.sh ; "
"mv {input.histo_jellyfish} ~ "
"module load R "
"R --vanilla --slave --args "
"{wildcards.sample}_histo_jellyfish.histo "
"21 "
"150 "
"OutputGenomeScope "
"10000 "
"Summary "
"{wildcards.sample} "
"< ~/GenomeScope/genomescope_cluster.R "
"mv OutputGenomeScope/model.txt {output.model_genomescope} && "
"mv OutputGenomeScope/plot.log.pdf {output.plotlog_genomescope} && "
"mv OutputGenomeScope/plot.pdf {output.plot_genomescope} && "
"mv OutputGenomeScope/progress.txt {output.progress_genomescope} && "
"mv OutputGenomeScope/summary.txt {output.summary_genomescope} && "
"mv {wildcards.sample}_histo_jellyfish.histo fastp.fastq/{wildcards.species}/GenomeScope"
". /appli/bioinfo/genomescope2/2.0/delenv.sh"
)
感谢您的帮助。
可能还有其他事情发生,但我猜 rule all
:
expand("Kmer/{sample}_plot.pdf", sample=SAMPLES),
应该是:
expand("GenomeScope/{sample}_plot.pdf", sample=SAMPLES),
为了与rule GenomeScope
的输出一致(rule all
中的其他文件也一样)。
(这是我真正喜欢的 snakemake 的一件事:你很早就发现了这些不一致,在你修复它们之前管道甚至没有启动)
顺便说一下,我不会在 run
指令中使用 shell()
函数,而是:
shell:
r"""
. /appli/bioinfo/genomescope2/2.0/env.sh
mv {input.histo_jellyfish} ~
..etc etc
"""