lambda 通配符在 BWA MEM(或 BWA-MEM2)Snakemake 包装器中不起作用

lambda wildcards not working in BWA MEM (or BWA-MEM2) Snakemake wrapper(s)

我正在将 Snakemake Shell: 移植到 Snakemake Wrappers and have noticed that the lambda wildcards: I've successfully used for other wrappers is failing for the BWA MEM wrapper

我只设法让包装器工作,如果它是硬编码的,比如:

input:
        reads=["trimming/trimmomatic/{sample}.1.fastq", "trimming/trimmomatic/{sample}.2.fastq"]

不过,我更喜欢使用lambda wildcards: getTrims(wildcards.sample)[0]lambda wildcards: getTrims(wildcards.sample)[1];类似于 trimming 规则(如下)的输入。

最小示例

蛇形文件

# Directories------------------------------------------------------------------
configfile: "config.yaml"

# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))

import os
import pandas as pd
# getting the samples information (names, path to r1 & r2) from samples.txt
samples_information = pd.read_csv("samples.txt", sep='\t', index_col=False)
# get a list of the sample names
sample_names = list(samples_information['sample'])
sample_locations = list(samples_information['location'])
samples_dict = dict(zip(sample_names, sample_locations))
# get number of samples
len_samples = len(sample_names)


# Rules -----------------------------------------------------------------------

rule all:
    input:
        expand('{TRIM_DIR}/{TRIM_TOOL}/{sample}_{pair}_trim_{paired}.fq.gz', TRIM_DIR=dirs_dict["TRIM_DIR"], TRIM_TOOL=config["TRIM_TOOL"], sample=sample_names, pair=['R1', 'R2'], paired=['paired', 'unpaired']),
        expand('{ALIGN_DIR}/{ALIGN_TOOL}/{sample}.bam', ALIGN_DIR=dirs_dict['ALIGN_DIR'], ALIGN_TOOL=config['ALIGN_TOOL'], sample=sample_names),

def getHome(sample):
  return(list(os.path.join(samples_dict[sample],"{0}_{1}.fastq.gz".format(sample,pair)) for pair in ['R1','R2']))

rule trimming:
    input:
        r1 = lambda wildcards: getHome(wildcards.sample)[0],
        r2 = lambda wildcards: getHome(wildcards.sample)[1]
    output:
        r1 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_paired.fq.gz"),
        r1_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R1_trim_unpaired.fq.gz"),        
        r2 = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_paired.fq.gz"),
        r2_unpaired = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"],"{sample}_R2_trim_unpaired.fq.gz")
    log: os.path.join(dirs_dict["LOG_DIR"],config["TRIM_TOOL"],"{sample}.log")
    threads: 32
    params:
        # list of trimmers (see manual)
        trimmer=["MINLEN:36"],
        # optional parameters
        extra="",
        compression_level="-9"
    resources:
        mem = 1000,
        time = 120
    message: """--- Trimming FASTQ files with Trimmomatic."""
    wrapper:
        "0.64.0/bio/trimmomatic/pe"

trim_dir = os.path.join(dirs_dict["TRIM_DIR"],config["TRIM_TOOL"])
trims_locations = [trim_dir] * len_samples
trims_dict = dict(zip(sample_names, trims_locations))

def getTrims(sample):
  return(list(os.path.join(trims_dict[sample],"{0}_{1}_trim_paired.fq.gz".format(sample,pair)) for pair in ['R1','R2']))

rule alignment:
    input:
        reads=["trimming/trimmomatic/{sample}_R1_trim_paired.fq.gz", "trimming/trimmomatic/{sample}_R2_trim_paired.fq.gz"]
    output:
        os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
    log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
    message: """--- Alignment with BWA."""
    threads: 8
    resources:
        mem = 2500,
        time = 100
    params:
        index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
        extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
        sort="none",             
        sort_order="queryname",  
        sort_extra=""            
    wrapper:
        "0.64.0/bio/bwa/mem"

Config.yaml

# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"

# Tools
QC_TOOL: "fastQC"
TRIM_TOOL: "trimmomatic"
ALIGN_TOOL: "bwa"
MARKDUP_TOOL: "picard"
CALLING_TOOL: "varscan"
ANNOT_TOOL: "vep"

samples.txt

MTG325

这个有效:

(snakemake)$ snakemake -n -r 
Building DAG of jobs...
Job counts:
    count   jobs
    1   alignment
    1   all
    2

[Wed Sep  2 08:17:16 2020]
Job 2: --- Alignment with BWA.
Reason: Missing output files: alignment/bwa/MTG325.bam

[Wed Sep  2 08:17:16 2020]
localrule all:
    input: trimming/trimmomatic/MTG325_R1_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R1_trim_unpaired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_paired.fq.gz, trimming/trimmomatic/MTG325_R2_trim_unpaired.fq.gz, alignment/bwa/MTG325.bam
    jobid: 0
    reason: Input files updated by another job: alignment/bwa/MTG325.bam

Job counts:
    count   jobs
    1   alignment
    1   all
    2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

这不是:

rule alignment:
    input:
        reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]
    output:
        os.path.join(dirs_dict["ALIGN_DIR"],config["ALIGN_TOOL"],"{sample}.bam")
    log: os.path.join(dirs_dict["LOG_DIR"],config["ALIGN_TOOL"],"{sample}.log")
    message: """--- Alignment with BWA."""
    threads: 8
    resources:
        mem = 2500,
        time = 100
    params:
        index=os.path.join(dirs_dict["REF_DIR"], config["REF_GENOME"]),
        extra=r"-R '@RG\tID:{sample}\tPL:ILLUMINA\tSM:{sample}'",
        sort="none",
        sort_order="queryname",
        sort_extra=""
    wrapper:
        "0.64.0/bio/bwa/mem"

结果:

(snakemake) [moldach@arc wrappers]$ snakemake -n -r
Building DAG of jobs...
MissingInputException in line 65 of /home/moldach/wrappers/Snakefile:
Missing input files for rule alignment:
lambda wildcards: getTrims(wildcards.sample)[1]
lambda wildcards: getTrims(wildcards.sample)[0]
input:
    reads=["lambda wildcards: getTrims(wildcards.sample)[0]", "lambda wildcards: getTrims(wildcards.sample)[1]"]

你在这里给出了一个字符串列表,所以 snakemake 实际上是在寻找 "lambda wildcards: getTrims(wildcards.sample)[0]" 作为输入文件,而不是将其视为输入函数。

您的 rule alingment 需要一个包含两个输入读取文件的列表,这应该适合您的 getTrims(sample) 函数的输出。

你试过了吗:

input:
    reads=lambda wildcards: getTrims(wildcards.sample)

稍后将 R1R2 放回同一个列表时,无需将它们分开。