Snakemake Error: No values given for wildcard

This is a follow-up to a previous question, about using a Python dictionary to generate a list of files to include as the input of a single step. In this case, I'm interested in merging the BAM files for a single sample, where the BAMs were generated by mapping FASTQ files from multiple runs.

I'm running into an error in my rule combine_bams for only a single sample:

InputFunctionException in line 116 of /oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile:
Error:
  WildcardError:
    No values given for wildcard 'samp'.
Wildcards:
  samp=10561-7352-culture_S24
  mapper=bwa
  ref=H37Rv
Traceback:
  File "/oak/stanford/scg/lab_jandr/walter/tb/mtb/workflow/Snakefile", line 118, in <lambda>

It seems that samp is defined correctly in the wildcards list, so I'm not sure why the error is being raised. Any suggestions would be great; my Snakemake file is below. Thanks!

import re
import numpy as np

# Define samples:
RUNS, SAMPLES = glob_wildcards(config['fastq_dir'] + "{run}/{samp}_L001_R1_001.fastq.gz")

# Create sample dictionary so that each sample (key) has list of runs (values) associated with it.
sample_dict = {}
for key, val in zip(SAMPLES,RUNS):
  sample_dict.setdefault(key, []).append(val)
#print(sample_dict)
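# For example, if sample '10561-7352-culture_S24' was sequenced in runs
# 'run1' and 'run2' (hypothetical run names), then:
# sample_dict == {'10561-7352-culture_S24': ['run1', 'run2']}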

# Constrain mapper and filter wildcards. 
wildcard_constraints: 
  mapper="[a-zA-Z2]+",
  filter="[a-zA-Z2]+",
  run = '|'.join([re.escape(x) for x in RUNS]),
  samp = '|'.join([re.escape(x) for x in SAMPLES]),
  ref = '|'.join([re.escape(x) for x in config['ref']])
    
# Define a rule for running the complete pipeline. 
rule all:
  input:
     trim = expand(['results/{samp}/{run}/trim/{samp}_trim_1.fq.gz'], zip, run = RUNS, samp = SAMPLES),
     kraken=expand('results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz', zip, run = RUNS, samp = SAMPLES),
     bams=expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, run = RUNS, samp = SAMPLES, ref = config['ref']*len(RUNS), mapper = config['mapper']*len(RUNS)), # When using zip, need to use vectors of equal lengths for all wildcards.
     per_samp_run_stats = expand('results/{samp}/{run}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, run = RUNS, samp = SAMPLES, ref = config['ref']*len(RUNS), mapper = config['mapper']*len(RUNS)),          
     combined_bams=expand('results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam', samp = np.unique(SAMPLES),ref=config['ref'], mapper=config['mapper'])

# Trim reads for quality. 
rule trim_reads:  
  input: 
    p1='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R1_001.fastq.gz', 
    p2='/labs/jandr/walter/tb/data/Stanford/{run}/{samp}_L001_R2_001.fastq.gz'
  output: 
    trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
    trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
  log: 
    'results/{samp}/{run}/trim/{samp}_trim_reads.log'
  shell:
    'workflow/scripts/trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'

# Filter reads taxonomically with Kraken.
rule taxonomic_filter:
  input:
    trim1='results/{samp}/{run}/trim/{samp}_trim_1.fq.gz',
    trim2='results/{samp}/{run}/trim/{samp}_trim_2.fq.gz'
  output: 
    kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
    kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz',
    kraken_report='results/{samp}/{run}/kraken/{samp}_kraken.report',
    kraken_stats = 'results/{samp}/{run}/kraken/{samp}_kraken_stats.csv'
  log: 
    'results/{samp}/{run}/kraken/{samp}_kraken.log'
  threads: 8
  shell:
    'workflow/scripts/run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'

# Map reads.
rule map_reads:
  input:
    ref_path='/labs/jandr/walter/tb/data/refs/{ref}.fa',
    kr1='results/{samp}/{run}/kraken/{samp}_trim_kr_1.fq.gz',
    kr2='results/{samp}/{run}/kraken/{samp}_trim_kr_2.fq.gz'
  output:
    bam='results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam'
  params:
    mapper='{mapper}'
  log:
    'results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_map.log'
  threads: 8
  shell:
    "workflow/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"

# Combine reads and remove duplicates (per sample).
rule combine_bams:
  input:
    bams = lambda wildcards: expand('results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam', run=sample_dict[wildcards.samp])
  output: 
    combined_bam = 'results/{samp}/bams/{samp}_{mapper}_{ref}.merged.rmdup.bam'
  log: 
     'results/{samp}/bams/{samp}_{mapper}_{ref}_merge_bams.log'
  threads: 8
  shell:
    "sambamba markdup -r -p -t {threads} {input.bams} {output.combined_bam}"

In rule combine_bams, when using a lambda with expand, you need to provide values for all of the {} wildcards. Currently only run is provided. One way to fix this is to pass the kwarg allow_missing=True to expand:

bams = lambda wildcards: expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=sample_dict[wildcards.samp],
    allow_missing=True,
)
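
To see what allow_missing=True does, you can call expand directly in a Python session (run names here are hypothetical; expand is importable from snakemake.io). Any wildcard that is not given a value is left in the result as a {placeholder}, which Snakemake later fills in from the job's wildcards:

from snakemake.io import expand

files = expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=["run1", "run2"],
    allow_missing=True,
)
print(files)
# ['results/{samp}/run1/bams/{samp}_{mapper}_{ref}_sorted.bam',
#  'results/{samp}/run2/bams/{samp}_{mapper}_{ref}_sorted.bam']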

This is equivalent to:

bams = lambda wildcards: expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=sample_dict[wildcards.samp],
    samp="{samp}",
    mapper="{mapper}",
    ref="{ref}",
)
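
Alternatively, because the input function receives the full wildcards object, you can fill in the remaining wildcards with their concrete values rather than re-emitting placeholders. A sketch of the same input function under that approach (the rest of combine_bams stays unchanged):

bams = lambda wildcards: expand(
    "results/{samp}/{run}/bams/{samp}_{mapper}_{ref}_sorted.bam",
    run=sample_dict[wildcards.samp],
    samp=wildcards.samp,
    mapper=wildcards.mapper,
    ref=wildcards.ref,
)

Both variants resolve to the same list of per-run BAMs once Snakemake has determined samp, mapper, and ref from the requested output file.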