snakemake:根据样本名称或其他输入定义参数
snakemake: define parameter based on sample name or other input
提前感谢您在此提供的所有帮助!
我有一个 snakemake 文件,它定义了处理短读数据、映射和变体调用的步骤。我希望对不同的样本使用不同的参考序列,我想知道您会如何建议根据输入样本名称定义参考?
例如,我使用通配符定义了我的 运行 和示例名称。我希望根据样本(或 运行)名称定义我的 ref,以便将样本映射到正确的参考。我的规则 map_reads 如下。
提前感谢您的帮助!
# Define samples:
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)
# Map reads.
rule map_reads:
input:
ref_path='/xyz/refs/{ref}.fasta',
kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
output:
bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
params:
mapper='{mapper}'
log:
'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
threads: 8
shell:
"/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"
您可以创建一个将样本和参考基因组相关联的文件,然后将其读入字典(或 pandas 数据框)。
然后可以在输入中访问 dictionary/dataframe 以确定给定样本的正确参考。
这是一个字典示例。
给定一个制表符分隔文件 samples.txt 将示例与参考相关联,如下所示:
sample_A ref_A
sample_B ref_B
sample_C ref_C
然后,使用 lambda 函数,我们可以访问输入中的通配符对象,并使用 samp 通配符在我们的字典中找到相应的引用。
# Define samples:
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)
# Read samples.txt into dictionary.
sample_to_ref = {}
with open("samples.txt") as f:
for line in f:
line = line.strip().split("\t")
sample_to_ref[line[0]] = line[1] # sample_to_ref[sample] = reference
# Map reads.
rule map_reads:
input:
ref_path= lambda wildcards: expand('/xyz/refs/{ref}.fasta', ref=sample_to_ref[wildcards.samp]), # lambda allows access to wildcards, to then access dictionary.
kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
output:
bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
params:
mapper='{mapper}'
log:
'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
threads: 8
shell:
"/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"
提前感谢您在此提供的所有帮助! 我有一个 snakemake 文件,它定义了处理短读数据、映射和变体调用的步骤。我希望对不同的样本使用不同的参考序列,我想知道您会如何建议根据输入样本名称定义参考?
例如,我使用通配符定义了我的 运行 和示例名称。我希望根据样本(或 运行)名称定义我的 ref,以便将样本映射到正确的参考。我的规则 map_reads 如下。
提前感谢您的帮助!
# Define samples:
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)
# Map reads.
rule map_reads:
input:
ref_path='/xyz/refs/{ref}.fasta',
kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
output:
bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
params:
mapper='{mapper}'
log:
'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
threads: 8
shell:
"/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"
您可以创建一个将样本和参考基因组相关联的文件,然后将其读入字典(或 pandas 数据框)。
然后可以在输入中访问 dictionary/dataframe 以确定给定样本的正确参考。
这是一个字典示例。
给定一个制表符分隔文件 samples.txt 将示例与参考相关联,如下所示:
sample_A ref_A
sample_B ref_B
sample_C ref_C
然后,使用 lambda 函数,我们可以访问输入中的通配符对象,并使用 samp 通配符在我们的字典中找到相应的引用。
# Define samples:
RUNS, SAMPLES = glob_wildcards("/xyz/{run}/{samp}_L001_R1_001.fastq.gz")
sample_dict = dict(zip(SAMPLES,RUNS))
print("runs are: ", RUNS)
print("samples are: ", SAMPLES)
# Read samples.txt into dictionary.
sample_to_ref = {}
with open("samples.txt") as f:
for line in f:
line = line.strip().split("\t")
sample_to_ref[line[0]] = line[1] # sample_to_ref[sample] = reference
# Map reads.
rule map_reads:
input:
ref_path= lambda wildcards: expand('/xyz/refs/{ref}.fasta', ref=sample_to_ref[wildcards.samp]), # lambda allows access to wildcards, to then access dictionary.
kr1='process/trim/{run}_{samp}_trim_kr_1.fq.gz',
kr2='process/trim/{run}_{samp}_trim_kr_2.fq.gz'
output:
bam='process/bams/{run}_{samp}_{mapper}_{ref}_rg_sorted.bam'
params:
mapper='{mapper}'
log:
'process/bams/{run}_{samp}_{mapper}_{ref}_map.log'
threads: 8
shell:
"/xyz/scripts/map_reads.sh {input.ref_path} {params.mapper} {input.kr1} {input.kr2} {output.bam} &>> {log}"