Snakemake Error: No values given for wildcard
This is a follow-up to an earlier question about using a Python dictionary to generate a list of files to include as the input of a single step. In this case, I'm interested in combining the BAM files for a single sample, which were generated by mapping FASTQ files from multiple runs.
I'm getting an error in my rule combine_bams for just a single sample:
InputFunctionException in line 116 of /oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile:
Error:
WildcardError:
No values given for wildcard 'samp'.
Wildcards:
samp=10561-7352-culture_S24
mapper=bwa
ref=H37Rv
Traceback:
File "/oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile", line 118, in <lambda>
It seems samp is correctly defined in the wildcard list, so I'm not sure why the error is raised. Any suggestions would be great; my snakemake file is below. Thanks!
import re
import numpy as np

# Define samples:
RUNS, SAMPLES = glob_wildcards(config['fastq_dir'] + "{run}/{samp}_L001_R1_001.fastq.gz")

# Create a sample dictionary so that each sample (key) has the list of runs (values) associated with it.
sample_dict = {}
for key, val in zip(SAMPLES, RUNS):
    sample_dict.setdefault(key, []).append(val)
#print(sample_dict)
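# For illustration only (the run names here are hypothetical), sample_dict
# could look like: {'10561-7352-culture_S24': ['run1', 'run2']}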
# Constrain mapper and filter wildcards.
wildcard_constraints:
    mapper="[a-zA-Z2]+",
    filter="[a-zA-Z2]+",
    run='|'.join([re.escape(x) for x in RUNS]),
    samp='|'.join([re.escape(x) for x in SAMPLES]),
    ref='|'.join([re.escape(x) for x in config['ref']])
# Define a rule for running the complete pipeline.
rule all:
    input:
        trim = expand('results/{samp}/{run}/trim/{samp}_trim_1.fq.gz', zip, run=RUNS, samp=SAMPLES),
        kraken = expand('results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz', zip, run=RUNS, samp=SAMPLES),
        # When using zip, all wildcards need vectors of equal length.
        bams = expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, run=RUNS, samp=SAMPLES, ref=config['ref']*len(RUNS), mapper=config['mapper']*len(RUNS)),
        per_samp_run_stats = expand('results/{samp}/{run}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, run=RUNS, samp=SAMPLES, ref=config['ref']*len(RUNS), mapper=config['mapper']*len(RUNS)),
        combined_bams = expand('results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam', samp=np.unique(SAMPLES), ref=config['ref'], mapper=config['mapper'])
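# For example (hypothetical values), expand('{s}_{r}.txt', zip, s=['a','b'], r=[1,2])
# pairs the vectors element-wise and yields ['a_1.txt', 'b_2.txt'].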
# Trim reads for quality.
rule trim_reads:
    input:
        p1='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R1_001.fastq.gz',
        p2='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R2_001.fastq.gz'
    output:
        trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
        trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
    log:
        'results/{samp}/{run}/trim/{samp}_trim_reads.log'
    shell:
        'workflow/scripts/trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'
# Filter reads taxonomically with Kraken.
rule taxonomic_filter:
    input:
        trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
        trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
    output:
        kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
        kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz',
        kraken_report='results/{samp}/{run}/kraken/{samp}_kraken.report',
        kraken_stats='results/{samp}/{run}/kraken/{samp}_kraken_stats.csv'
    log:
        'results/{samp}/{run}/kraken/{samp}_kraken.log'
    threads: 8
    shell:
        'workflow/scripts/run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'
# Map reads.
rule map_reads:
    input:
        ref_path='/labs/jandr/walter/tb/data/refs/{ref}.fa',
        kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
        kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz'
    output:
        bam='results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam'
    params:
        mapper='{mapper}'
    log:
        'results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_map.log'
    threads: 8
    shell:
        "workflow/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"
# Combine reads and remove duplicates (per sample).
rule combine_bams:
    input:
        bams = lambda wildcards: expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', run=sample_dict[wildcards.samp])
    output:
        combined_bam = 'results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam'
    log:
        'results/{samp}/bams/{samp}_{mapper}_{ref}_merge_bams.log'
    threads: 8
    shell:
        "sambamba markdup -r -p -t {threads} {input.bams} {output.combined_bam}"
In rule combine_bams, when using a lambda expression you need to provide values for all of the {} wildcards; currently, only run is given. One way to fix this is to pass the kwarg allow_missing=True to expand:
bams = lambda wildcards: expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=sample_dict[wildcards.samp],
    allow_missing=True,
)
This will be equivalent to:
bams = lambda wildcards: expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=sample_dict[wildcards.samp],
    samp="{samp}",
    mapper="{mapper}",
    ref="{ref}",
)
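To see what allow_missing=True does, here is a minimal standalone sketch (the run names are made up) using expand from snakemake.io. Wildcards that are not given values are kept as literal {placeholder} text, so Snakemake can still fill them in later from the rule's output wildcards:

from snakemake.io import expand

# Hypothetical runs for one sample; samp, mapper, and ref are left unfilled.
files = expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=["run1", "run2"],
    allow_missing=True,
)
print(files)
# ['results/{samp}/run1/bams/{samp}_{mapper}_{ref}_sorted.bam',
#  'results/{samp}/run2/bams/{samp}_{mapper}_{ref}_sorted.bam']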