How to propagate Snakemake wildcards through a lambda function
I am trying to merge VCF files together with Snakemake, but I get this error:
Building DAG of jobs...
MissingInputException in line 21 of
Missing input files for rule all:
outputs/****.g.vcf.gz
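The goal is to call variants on separate chromosomes and then merge them back together. The sample wildcard does not seem to propagate through my lambda function. I have tried several different iterations but have not been able to crack it. I am sure the rest of the code is fine, because when I remove the merge step and only call variants on all chromosomes, the file works properly.
Any help would be appreciated.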
import glob

configfile: "config.json"

chroms = [1, 2, 3, 4, 5]
str_chroms = ["chr{}".format(chr) for chr in chroms]

def get_fq1(wildcards):
    # code that returns a list of fastq files for read 1 based on
    # *wildcards.sample* e.g.
    return sorted(glob.glob(wildcards.sample + '*_R1_001.fastq.gz'))

def get_fq2(wildcards):
    # code that returns a list of fastq files for read 2 based
    # on *wildcards.sample*, e.g.
    return sorted(glob.glob(wildcards.sample + '*_R2_001.fastq.gz'))

rule all:
    input:
        "outputs/" + config['sample'] + "_picard_alignment_metrics_output.txt",
        "outputs/" + config['sample'] + "_fastqc",
        "outputs/" + config['sample'] + "_analyze_covariates.pdf",
        "outputs/" + config['sample'] + ".g.vcf.gz",
        "outputs/" + config['sample'] + ".coverage"

rule bwa_map:
    input:
        config['reference_file'],
        get_fq1,
        get_fq2
    output:
        "outputs/{sample}_sorted.bam"
    shell:
        "bwa mem -t 16 {input} | samtools view -bS - | \
        samtools sort -@ 16 -m 7G - -o {output}"

# a bunch of intermediate steps that are not the issue

rule variant_calling:
    input:
        bam = "outputs/{sample}_recal_reads.bam",
        bai = "outputs/{sample}_recal_reads.bam.bai",
        reference_file = config['reference_file']
    output:
        "outputs/{sample}_{chr}.g.vcf.gz"
    shell:
        """gatk --java-options "-Xmx128g" HaplotypeCaller \
        -R {reference_file} -I {input.bam} -L {wildcards.chr} \
        -O {output} -ERC GVCF"""

rule merge_vcfs:
    input:
        lambda wildcards: expand("outputs/{sample}_{chr}.g.vcf.gz",
                                 chr=str_chroms,
                                 sample=wildcards.sample)
    output:
        "output/{sample}.g.vcf.gz"
    shell:
        "vcf-merge {input} | bgzip -c > {output}"
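There is a typo in the output of rule merge_vcfs. It should be outputs/{sample}.g.vcf.gz (that is, outputs rather than output). Because rule all requests a file under outputs/ and no rule declares an output matching that path, Snakemake reports it as a missing input.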
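To see why the one-letter difference produces the MissingInputException, here is a rough sketch of how a requested file gets matched against a rule's output pattern. This is not Snakemake's actual code: `pattern_to_regex` is a hypothetical helper, `sampleA` is a made-up sample name, and the sketch assumes each wildcard behaves like the regex `.+`.

```python
import re

def pattern_to_regex(pattern):
    # Hypothetical helper (not part of Snakemake): turn every {name}
    # into a named group that matches one or more characters.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text of the path
        else:
            regex += "(?P<%s>.+)" % part      # wildcard name
    return re.compile(regex)

# The file that rule all asks for ("sampleA" is an assumed sample name).
target = "outputs/sampleA.g.vcf.gz"

typo_rule = pattern_to_regex("output/{sample}.g.vcf.gz")
fixed_rule = pattern_to_regex("outputs/{sample}.g.vcf.gz")

typo_match = typo_rule.fullmatch(target)    # no rule can produce the target
fixed_match = fixed_rule.fullmatch(target)  # sample wildcard is inferred

print(typo_match)
print(fixed_match.group("sample"))
```

With the typo no pattern matches, so the target is treated as a plain input file that does not exist. With the corrected directory the `sample` wildcard is inferred from the requested filename and handed to the input function.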
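Yes, as @JeeYem mentioned, you have a typo in the output file of rule merge_vcfs.
Also, I don't see why the lambda is needed in rule merge_vcfs. You are passing the same set of chromosomes regardless of the sample: str_chroms is independent of the sample in your setup. You can therefore escape the sample wildcard with double braces, so that expand leaves it in place, and rewrite the rule as: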
rule merge_vcfs:
    input: expand("outputs/{{sample}}_{chr}.g.vcf.gz", chr=str_chroms)
    output: "outputs/{sample}.g.vcf.gz"
    shell: "vcf-merge {input} | bgzip -c > {output}"
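The double-brace trick can be checked in plain Python. The sketch below uses `str.format` as a stand-in for expand (it is not Snakemake's implementation); `expand_chroms` is a hypothetical helper and `sampleA` an assumed sample name.

```python
chroms = [1, 2, 3, 4, 5]
str_chroms = ["chr{}".format(c) for c in chroms]

def expand_chroms(pattern, chrom_names):
    # Hypothetical stand-in for expand(): fill in only the chr field.
    # "{{sample}}" is an escaped brace pair, so it comes out as "{sample}".
    return [pattern.format(chr=name) for name in chrom_names]

patterns = expand_chroms("outputs/{{sample}}_{chr}.g.vcf.gz", str_chroms)
print(patterns[0])   # outputs/{sample}_chr1.g.vcf.gz

# Later, once the sample wildcard is known from the requested output,
# it is substituted into each remaining pattern.
inputs = [p.format(sample="sampleA") for p in patterns]
print(inputs[0])     # outputs/sampleA_chr1.g.vcf.gz
```

The first pass fixes the chromosome and leaves `{sample}` as an ordinary wildcard, which is why no lambda is needed to carry the sample through.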