lambda 通配符在 BWA MEM(或 BWA-MEM2)Snakemake 包装器中不起作用
lambda wildcards not working in BWA MEM (or BWA-MEM2) Snakemake wrapper(s)
我正在将 Snakemake
Shell:
移植到 Snakemake Wrappers and have noticed that the lambda wildcards:
I've successfully used for other wrappers is failing for the BWA MEM wrapper。
我只设法让包装器工作,如果它是硬编码的,比如:
input:
reads=["trimming/trimmomatic/{sample}.1.fastq", "trimming/trimmomatic/{sample}.2.fastq"]
不过,我更喜欢使用lambda wildcards: getTrims(wildcards.sample)[0]
和lambda wildcards: getTrims(wildcards.sample)[1]
;类似于 trimming
规则(如下)的输入。
最小示例
蛇形文件
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{TRIM_DIR}/{TRIM_TOOL}/{sample}_{pair}_trim_{paired}.fq.gz', TRIM_DIR=dirs_dict["TRIM_DIR"], TRIM_TOOL=config["TRIM_TOOL"], sample=sample_names, pair=['R1', 'R2'], paired=['paired', 'unpaired']),
expand('{ALIGN_DIR}/{ALIGN_TOOL}/{sample}.bam', ALIGN_DIR=dirs_dict['ALIGN_DIR'], ALIGN_TOOL=config['ALIGN_TOOL'], sample=sample_names),
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule trimming:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0],
r2 = lambda wildcards: getHome(wildcards.sample)[1]
output:
r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fq.gz"),
r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_unpaired.fq.gz"),
r2 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_paired.fq.gz"),
r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_unpaired.fq.gz")
log: os.path.join(dirs_dict["LOG_DIR"],config["TRIM_TOOL"],"{sample}.log")
threads: 32
params:
# list of trimmers (see manual)
trimmer=["MINLEN:36"],
# optional parameters
extra="",
compression_level="-9"
resources:
mem = 1000,
time = 120
message: """--- Trimming FASTQ files with Trimmomatic."""
wrapper:
"0.64.0/bio/trimmomatic/pe"
trim_dir = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"])
trims_locations = [trim_dir] * len_samples
trims_dict = dict(zip(sample_names, trims_locations))
def getTrims(sample):
return(list(os.path.join(trims_dict[sample],"{0}_{1}_trim_paired.fq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule alignment:
input:
reads=["trimming/trimmomatic/{sample}_R1_trim_paired.fq.gz", "trimming/trimmomatic/{sample}_R2_trim_paired.fq.gz"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
message: """--- Alignment with BWA."""
threads: 8
resources:
mem = 2500,
time = 100
params:
index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
sort="none",
sort_order="queryname",
sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
Config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
samples.txt
MTG325
这个有效:
(snakemake)$ snakemake -n -r
Building DAG of jobs...
Job counts:
count jobs
1 alignment
1 all
2
[Wed Sep 2 08:17:16 2020]
Job 2: --- Alignment with BWA.
Reason: Missing output files: alignment/bwa/MTG325.bam
[Wed Sep 2 08:17:16 2020]
localrule all:
input: trimming/trimmomatic/MTG325_R1_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R1_trim_unpaired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_unpaired.fq.gz, alignment/bwa/MTG325.bam
jobid: 0
reason: Input files updated by another job: alignment/bwa/MTG325.bam
Job counts:
count jobs
1 alignment
1 all
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
这不是:
rule alignment:
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
message: """--- Alignment with BWA."""
threads: 8
resources:
mem = 2500,
time = 100
params:
index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
sort="none",
sort_order="queryname",
sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
结果:
(snakemake) [moldach@arc wrappers]$ snakemake -n -r
Building DAG of jobs...
MissingInputException in line 65 of /home/moldach/wrappers/Snakefile:
Missing input files for rule alignment:
lambda wildcards: getTrims(wildcards.sample)[1]
lambda wildcards: getTrims(wildcards.sample)[0]
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]
你在这里给出了一个字符串列表,所以 snakemake 实际上是在寻找 "lambda wildcards: getTrims(wildcards.sample)[0]"
作为输入文件,而不是将其视为输入函数。
您的 rule alingment
需要一个包含两个输入读取文件的列表,这应该适合您的 getTrims(sample)
函数的输出。
你试过了吗:
input:
reads=lambda wildcards: getTrims(wildcards.sample)
稍后将 R1
和 R2
放回同一个列表时,无需将它们分开。
我正在将 Snakemake
Shell:
移植到 Snakemake Wrappers and have noticed that the lambda wildcards:
I've successfully used for other wrappers is failing for the BWA MEM wrapper。
我只设法让包装器工作,如果它是硬编码的,比如:
input:
reads=["trimming/trimmomatic/{sample}.1.fastq", "trimming/trimmomatic/{sample}.2.fastq"]
不过,我更喜欢使用lambda wildcards: getTrims(wildcards.sample)[0]
和lambda wildcards: getTrims(wildcards.sample)[1]
;类似于 trimming
规则(如下)的输入。
最小示例
蛇形文件
# Directories------------------------------------------------------------------
configfile: "config.yaml"
# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))
import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)
# Rules -----------------------------------------------------------------------
rule all:
input:
expand('{TRIM_DIR}/{TRIM_TOOL}/{sample}_{pair}_trim_{paired}.fq.gz', TRIM_DIR=dirs_dict["TRIM_DIR"], TRIM_TOOL=config["TRIM_TOOL"], sample=sample_names, pair=['R1', 'R2'], paired=['paired', 'unpaired']),
expand('{ALIGN_DIR}/{ALIGN_TOOL}/{sample}.bam', ALIGN_DIR=dirs_dict['ALIGN_DIR'], ALIGN_TOOL=config['ALIGN_TOOL'], sample=sample_names),
def getHome(sample):
return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule trimming:
input:
r1 = lambda wildcards: getHome(wildcards.sample)[0],
r2 = lambda wildcards: getHome(wildcards.sample)[1]
output:
r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fq.gz"),
r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_unpaired.fq.gz"),
r2 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_paired.fq.gz"),
r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_unpaired.fq.gz")
log: os.path.join(dirs_dict["LOG_DIR"],config["TRIM_TOOL"],"{sample}.log")
threads: 32
params:
# list of trimmers (see manual)
trimmer=["MINLEN:36"],
# optional parameters
extra="",
compression_level="-9"
resources:
mem = 1000,
time = 120
message: """--- Trimming FASTQ files with Trimmomatic."""
wrapper:
"0.64.0/bio/trimmomatic/pe"
trim_dir = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"])
trims_locations = [trim_dir] * len_samples
trims_dict = dict(zip(sample_names, trims_locations))
def getTrims(sample):
return(list(os.path.join(trims_dict[sample],"{0}_{1}_trim_paired.fq.gz".format(sample,pair)) for pair in ['R1','R2']))
rule alignment:
input:
reads=["trimming/trimmomatic/{sample}_R1_trim_paired.fq.gz", "trimming/trimmomatic/{sample}_R2_trim_paired.fq.gz"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
message: """--- Alignment with BWA."""
threads: 8
resources:
mem = 2500,
time = 100
params:
index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
sort="none",
sort_order="queryname",
sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
Config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"
samples.txt
MTG325
这个有效:
(snakemake)$ snakemake -n -r
Building DAG of jobs...
Job counts:
count jobs
1 alignment
1 all
2
[Wed Sep 2 08:17:16 2020]
Job 2: --- Alignment with BWA.
Reason: Missing output files: alignment/bwa/MTG325.bam
[Wed Sep 2 08:17:16 2020]
localrule all:
input: trimming/trimmomatic/MTG325_R1_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R1_trim_unpaired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_unpaired.fq.gz, alignment/bwa/MTG325.bam
jobid: 0
reason: Input files updated by another job: alignment/bwa/MTG325.bam
Job counts:
count jobs
1 alignment
1 all
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
这不是:
rule alignment:
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]
output:
os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
message: """--- Alignment with BWA."""
threads: 8
resources:
mem = 2500,
time = 100
params:
index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
sort="none",
sort_order="queryname",
sort_extra=""
wrapper:
"0.64.0/bio/bwa/mem"
结果:
(snakemake) [moldach@arc wrappers]$ snakemake -n -r
Building DAG of jobs...
MissingInputException in line 65 of /home/moldach/wrappers/Snakefile:
Missing input files for rule alignment:
lambda wildcards: getTrims(wildcards.sample)[1]
lambda wildcards: getTrims(wildcards.sample)[0]
input:
reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]
你在这里给出了一个字符串列表,所以 snakemake 实际上是在寻找 "lambda wildcards: getTrims(wildcards.sample)[0]"
作为输入文件,而不是将其视为输入函数。
您的 rule alingment
需要一个包含两个输入读取文件的列表,这应该适合您的 getTrims(sample)
函数的输出。
你试过了吗:
input:
reads=lambda wildcards: getTrims(wildcards.sample)
稍后将 R1
和 R2
放回同一个列表时,无需将它们分开。