snakemake：根据样本名称或其他输入定义参数

Question

提前感谢您在此提供的所有帮助！我有一个 snakemake 文件，它定义了处理短读数据、映射和变体调用的步骤。我希望对不同的样本使用不同的参考序列，我想知道您会如何建议根据输入样本名称定义参考？

例如，我使用通配符定义了我的运行和示例名称。我希望根据样本（或运行）名称定义我的 ref，以便将样本映射到正确的参考。我的规则 map_reads 如下。

提前感谢您的帮助！

# Define samples: 
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict  = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)

# Map reads.
rule map_reads:
  input:
    ref_path='/xyz/refs/{ref}.fasta',
    kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
    kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
  output:
    bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
  params:
    mapper='{mapper}'
  log:
    'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
  threads: 8
  shell:
    "/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"

Answer 1

您可以创建一个将样本和参考基因组相关联的文件，然后将其读入字典（或 pandas 数据框）。

然后可以在输入中访问 dictionary/dataframe 以确定给定样本的正确参考。

这是一个字典示例。

给定一个制表符分隔文件 samples.txt 将示例与参考相关联，如下所示：

sample_A   ref_A
sample_B   ref_B
sample_C   ref_C

然后，使用 lambda 函数，我们可以访问输入中的通配符对象，并使用 samp 通配符在我们的字典中找到相应的引用。

# Define samples: 
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict  = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)

# Read samples.txt into dictionary.
sample_to_ref = {}
with open("samples.txt") as f:
    for line in f:
        line = line.strip().split("\t")
        sample_to_ref[line[0]] = line[1]  # sample_to_ref[sample] = reference

# Map reads.
rule map_reads:
  input:
    ref_path= lambda wildcards: expand('/xyz/refs/{ref}.fasta', ref=sample_to_ref[wildcards.samp]),  # lambda allows access to wildcards, to then access dictionary.
    kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
    kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
  output:
    bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
  params:
    mapper='{mapper}'
  log:
    'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
  threads: 8
  shell:
    "/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"

snakemake：根据样本名称或其他输入定义参数

snakemake: define parameter based on sample name or other input

snakemake