使用 Snakemake 的分组样本每组单个输出文件
Single output file per group for grouped samples using Snakemake
我正在尝试编写一个 snakemake 管道来处理分成多个组的样本,每个组需要一个摘要文件。例如样本划分如下:
Group 1:
Sample 1
Sample 2
Group 2:
Sample 3
Sample 4
每个样本都使用 bedtools 进行处理,为每个样本生成一个输出文件。接下来,我需要对每个Group收集的样本进行分组层面的总结。缩写的 snakemake 文件如下所示:
rule intersect:
input:
bam = join('{group}','{sample}.bam'),
reg_bed = join('{group}', 'region.bed')
output:
reg_intersect = join('{group}', '{sample}.intersect.bed')
shell:
'bedtools intersect -wa -wb -a {input.reg_bed} -b {input.bam} > {output.reg_intersect}'
rule aggregate:
input:
rules.interesect.output
output:
join('{group}','summary_stats.csv')
shell:
touch(join('{group}','summary_stats.csv'))
#this would be a call to a python function that operates on all the output files to generate a single file
我收到关于通配符在输入和输出之间不一致的投诉(输入包含 {group} 和 {sample} 而输出只有 {group}。我试过使用 expand() 但我必须包括两者的值组和样本,样本是不可能的,因为它依赖于组。
欢迎提出任何建议。
正如@Maarten-vd-Sande 所说,最好的方法是使用输入函数获取 aggregate
规则中某个组的所有样本。
samplesInGroups = {"group1":["sample1","sample2"],"group2":["sample3","sample4"]}
def getSamplesInGroup(wildcards):
return [join(wildcards.group,s+".intersect.bed") for s in samplesInGroups[wildcards.group]]
rule all:
input: expand("{group}/summary_stats.csv",group=list(samplesInGroups.keys()))
rule intersect:
input:
bam = join('{group}','{sample}.bam'),
reg_bed = join('{group}', 'region.bed')
output:
reg_intersect = join('{group}', '{sample}.intersect.bed')
shell:
'bedtools intersect -wa -wb -a {input.reg_bed} -b {input.bam} > {output.reg_intersect}'
rule aggregate:
input:
getSamplesInGroup
output:
join('{group}','summary_stats.csv')
shell:
"python ... {input}"
我正在尝试编写一个 snakemake 管道来处理分成多个组的样本,每个组需要一个摘要文件。例如样本划分如下:
Group 1:
Sample 1
Sample 2
Group 2:
Sample 3
Sample 4
每个样本都使用 bedtools 进行处理,为每个样本生成一个输出文件。接下来,我需要对每个Group收集的样本进行分组层面的总结。缩写的 snakemake 文件如下所示:
rule intersect:
input:
bam = join('{group}','{sample}.bam'),
reg_bed = join('{group}', 'region.bed')
output:
reg_intersect = join('{group}', '{sample}.intersect.bed')
shell:
'bedtools intersect -wa -wb -a {input.reg_bed} -b {input.bam} > {output.reg_intersect}'
rule aggregate:
input:
rules.interesect.output
output:
join('{group}','summary_stats.csv')
shell:
touch(join('{group}','summary_stats.csv'))
#this would be a call to a python function that operates on all the output files to generate a single file
我收到关于通配符在输入和输出之间不一致的投诉(输入包含 {group} 和 {sample} 而输出只有 {group}。我试过使用 expand() 但我必须包括两者的值组和样本,样本是不可能的,因为它依赖于组。
欢迎提出任何建议。
正如@Maarten-vd-Sande 所说,最好的方法是使用输入函数获取 aggregate
规则中某个组的所有样本。
samplesInGroups = {"group1":["sample1","sample2"],"group2":["sample3","sample4"]}
def getSamplesInGroup(wildcards):
return [join(wildcards.group,s+".intersect.bed") for s in samplesInGroups[wildcards.group]]
rule all:
input: expand("{group}/summary_stats.csv",group=list(samplesInGroups.keys()))
rule intersect:
input:
bam = join('{group}','{sample}.bam'),
reg_bed = join('{group}', 'region.bed')
output:
reg_intersect = join('{group}', '{sample}.intersect.bed')
shell:
'bedtools intersect -wa -wb -a {input.reg_bed} -b {input.bam} > {output.reg_intersect}'
rule aggregate:
input:
getSamplesInGroup
output:
join('{group}','summary_stats.csv')
shell:
"python ... {input}"