Snakemake and Pandas syntax: Getting sample specific parameters from the sample table

Question

First of all, this may be a duplicate of "Snakemake and pandas syntax". I am still confused, though, so let me explain again.

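In Snakemake, I load a sample table with several columns. One of the columns is called 'Read1' and contains the sample-specific read length. I would like to get this value separately for each sample, since it may differ.

What I expected to work is this: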

rule mismatch_profile:
    input:
        rseqc_input_bam
    output:
        os.path.join(rseqc_dir, '{sample}.mismatch_profile.xls')
    conda:
        "../envs/rseqc.yaml"  
    params:
        read_length = samples.loc['{sample}']['Read1']
    shell:
        '''
        #!/bin/bash
        mismatch_profile.py -i {input} -o {rseqc_dir}/{wildcards.sample} -l {params.read_length}
        '''

However, this does not work. For some reason I am not allowed to use {sample} inside standard Pandas syntax, and I get this error:

KeyError in line 41 of /rst1/2017-0205_illuminaseq/scratch/swo-406/test_snakemake_full/rules/rseqc.smk:
'the label [{sample}] is not in the [index]'

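I do not understand why this does not work. I read that I can also use lambda functions, but I do not really understand how, since they still need {sample} as input.

Can anyone help me?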

Answer 1

You can use a lambda function:

params:
    read_length = lambda wildcards: samples.loc[wildcards.sample, 'Read1']
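To see why the lambda helps, it may be useful to reproduce the lookup outside Snakemake. The expression samples.loc['{sample}'] is ordinary Python that runs when the rule is parsed, so pandas receives the literal string '{sample}' as a label. A function in params is instead called later, once the wildcard values for a job are known. The sketch below is only an illustration: the sample names, the read lengths, and the SimpleNamespace stand-in for Snakemake's wildcards object are assumptions, not taken from the question.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical sample table; in the question it is loaded from a file.
samples = pd.DataFrame(
    {"sample": ["A", "B"], "Read1": [75, 150]}
).set_index("sample")

# What the original rule does: the literal string '{sample}' is used
# as the row label, which pandas cannot find in the index.
try:
    samples.loc["{sample}"]["Read1"]
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True

# What the answer does: defer the lookup until the wildcard value is known.
read_length = lambda wildcards: samples.loc[wildcards.sample, "Read1"]

# Stand-in for the wildcards object that Snakemake passes to the function.
wildcards = SimpleNamespace(sample="B")

print(literal_lookup_failed)   # True
print(read_length(wildcards))  # 150
```

Inside the lambda, the wildcard is accessed as the attribute wildcards.sample rather than written as {sample}, which is what removes the need for string substitution.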
