Executing checkpoint intermediate commands in Snakemake
I am currently running into some issues getting Snakemake to execute the intermediate rules required by a checkpoint. After trying to debug this, I believe the problem is in the expand call inside my aggregate_input function, but I cannot figure out why it behaves this way.

The Snakefile is modeled on the current checkpoint documentation from Snakemake:
rule all:
    input:
        expand("string_tie_assembly/{sample}.gtf", sample=sample),
        expand("combined_fasta/{sample}.fa", sample=sample),
        "aggregated_fasta/all_fastas_combined.fa"
checkpoint clustering:
    input:
        "string_tie_assembly_merged/merged_{sample}.gtf"
    output:
        clusters = directory("split_gtf_file/{sample}")
    shell:
        """
        mkdir -p split_gtf_file/{wildcards.sample} ;
        collapse_gtf_file.py -gtf {input} -o split_gtf_file/{wildcards.sample}/{wildcards.sample}
        """
rule gtf_to_fasta:
    input:
        "split_gtf_file/{sample}/{sample}_{i}.gtf"
    output:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    shell:
        "gffread -w {output} -g {reference} {input}"
rule rename_fasta_files:
    input:
        "lncRNA_fasta/{sample}/canidate_{sample}_{i}.fa"
    output:
        "lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa"
    shell:
        "seqtk rename {input} {wildcards.sample}_{i} > {output}"
# Gather N number of output files from the GTF split
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x
# Aggregate fasta from split GTF files together
rule combine_fasta_file:
    input:
        aggregate_input
    output:
        "combined_fasta/{sample}.fa"
    shell:
        "cat {input} > {output}"
# Aggregate the aggregated fasta files
def gather_files(wildcards):
    files = expand("combined_fasta/{sample}.fa", sample=sample)
    return files

rule aggregate_fasta_files:
    input:
        gather_files
    output:
        "aggregated_fasta/all_fastas_combined.fa"
    shell:
        "cat {input} > {output}"
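For reference, here is a rough sketch (not Snakemake's actual implementation) of what expand() does with the wildcard value lists it is given: it takes the cross product of every supplied list and fills the template once per combination. The sample names "A" and "B" below are made up for illustration.

```python
# Simplified re-implementation of snakemake's expand(), to show how the
# file list in aggregate_input is built: a cross product over all the
# wildcard value lists passed in as keyword arguments.
from itertools import product

def expand(template, **wildcards):
    # Normalize every value to a list, then take the cross product.
    lists = {k: v if isinstance(v, (list, tuple)) else [v]
             for k, v in wildcards.items()}
    keys = list(lists)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(lists[k] for k in keys))]

# Two samples crossed with two split indices -> four paths,
# including mixed combinations like lncRNA_fasta_renamed/A/A_2.fa
print(expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
             sample=["A", "B"], i=["1", "2"]))

# A single sample keeps the list restricted to that sample's files.
print(expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
             sample="A", i=["1", "2"]))
```

Note that if either list is empty, the cross product is empty and expand() returns [], which is exactly the symptom described below.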
The issue I keep running into is that when I run this Snakefile, the combine_fasta_file rule never runs. After spending more time on this error, I realized the problem is that aggregate_input does not expand anything and returns an empty list [] instead of what I expected: a list of all the files in the directory, expanded as lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa.

This is strange, especially considering that checkpoint clustering executes correctly and its downstream output files are listed in rule all. Does anyone know why this happens, or what might be causing it?

Command used to run Snakemake: snakemake -rs Assemble_regions.snake --configfile snake_config_files/annotated_group_config.yaml
Just figured it out. The problem was that my aggregate function was globbing the wrong files. Previously I had written it as:
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fa")).i)
    print(x)
    return x
However, this was matching against the wrong files. Instead of globbing {i}.fa, it should glob the files that checkpoint clustering actually produces, which are .gtf files. So I changed the code to:
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    print(checkpoint_output)
    x = expand("lncRNA_fasta_renamed/{sample}/{sample}_{i}.fa",
               sample=wildcards.sample,
               i=glob_wildcards(os.path.join(checkpoint_output, "{sample}_{i}.gtf")).i)
    print(x)
    return x
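To see why the original pattern returned an empty list, the matching behind glob_wildcards can be sketched roughly as follows (a simplified stand-in, not Snakemake's implementation; the filenames and sample name "A" are hypothetical). The checkpoint directory contains .gtf files, so a pattern ending in .fa matches nothing:

```python
# Simplified stand-in for snakemake's glob_wildcards: turn a pattern with
# a single {i} wildcard into a regex and collect the captured values from
# a list of filenames (instead of globbing a real directory).
import re

def glob_i(pattern, filenames):
    regex = re.compile(
        "^" + re.escape(pattern).replace(re.escape("{i}"), "(?P<i>.+)") + "$"
    )
    return [m.group("i") for f in filenames if (m := regex.match(f))]

# What checkpoint clustering actually writes for a hypothetical sample "A":
files = ["A_1.gtf", "A_2.gtf", "A_3.gtf"]

print(glob_i("{i}.fa", files))     # the original, wrong pattern: no matches
print(glob_i("A_{i}.gtf", files))  # the corrected pattern: one value per file
```

With the wrong pattern, i is empty, so expand() over an empty list yields [] and combine_fasta_file is given no inputs; with the corrected pattern, each split .gtf contributes one index.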
Problem solved.