How to pass a function under the snakemake run directive
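I am building a workflow in snakemake and would like to reuse one of the rules for two different input sources. The input source can be either source1 or source1+source2, and the output directory also differs depending on the input. Doing this within a single rule is quite complicated, and I don't want to duplicate the whole rule, so I would like to create two rules with different input/output that run the same command.
Is it possible to make this work? The DAG is resolved correctly, but the jobs do not go through on the cluster (ERROR : bamcov_cmd not defined).
Example below (both rules use the same command at the end):
Here is the command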
def bamcov_cmd():
    return( (deepTools_path+"bamCoverage " +
             "-b {input.bam} " +
             "-o {output} " +
             "--binSize {params.bw_binsize} " +
             "-p {threads} " +
             "--normalizeTo1x {params.genome_size} " +
             "{params.read_extension} " +
             "&> {log}") )
Here is the rule
rule bamCoverage:
    input:
        bam = file1+"/{sample}.bam",
        bai = file1+"/{sample}.bam.bai"
    output:
        "bamCoverage/{sample}.filter.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        bamcov_cmd()
Here is the optional rule 2
rule bamCoverage2:
    input:
        bam = file2+"/{sample}.filter.bam",
        bai = file2+"/{sample}.filter.bam.bai"
    output:
        "bamCoverage/{sample}.filter.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        bamcov_cmd()
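What you are asking for is possible in Python.
How to do it depends on whether the file contains only Python code, or both Python and Snakemake.
I will answer that question first and then follow up, because I would like to suggest a different setup so that you don't have to do this at all.
Python only: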
from fileContainingMyBamCovCmdFunction import bamcov_cmd

rule bamCoverage:
    ...
    run:
        bamcov_cmd()
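For a concrete example, see how I do it in this file to get access to buildHeader and buildSample. These files are called by the Snakefile. It should work for you.
https://github.com/LCR-BCCRC/workflow_exploration/blob/master/Snakemake/modules/py_buildFile/buildFile.py
EDIT 2017-07-23 - Updated the snippet below to reflect user comments
Snakemake and Python: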
include: "fileContainingMyBamCovCmdFunction.suffix"

rule bamCoverage:
    ...
    run:
        shell(bamcov_cmd())
END OF EDIT
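The reason for wrapping the call in shell() is that bamcov_cmd() only returns a command template; calling it on its own inside a run block executes nothing. shell() fills the {...} placeholders from the rule's variables and then runs the result. The sketch below mimics that behaviour in plain Python, outside Snakemake. The render helper, the SimpleNamespace objects and the sample values are hypothetical stand-ins, not Snakemake's API, and echo is put in front of bamCoverage so the example runs without deepTools installed.

```python
import subprocess
from types import SimpleNamespace


def bamcov_cmd():
    # Returns a command *template*; nothing is executed here.
    return ("echo bamCoverage "
            "-b {input.bam} "
            "-o {output} "
            "--binSize {params.bw_binsize} "
            "-p {threads}")


def render(template, **context):
    # Hypothetical stand-in for the placeholder substitution shell() performs.
    return template.format(**context)


# Hypothetical values standing in for what a rule would provide.
context = dict(
    input=SimpleNamespace(bam="file1/sampleA.bam"),
    output="bamCoverage/sampleA.filter.bw",
    params=SimpleNamespace(bw_binsize=25),
    threads=16,
)

command = render(bamcov_cmd(), **context)

# Only this step actually runs something.
result = subprocess.run(command, shell=True, capture_output=True,
                        text=True, check=True)
print(result.stdout.strip())
```

Because the template is filled in at call time, the same function can serve both rules: each rule supplies its own input, output and params.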
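If the function really is specific to the bamCoverage call, you can put it back into the rule if you like. That implies it is not called anywhere else, which may well be the case.
Be careful about using the '.' symbol when naming files; I use '_' because I find it easier to avoid creating circular dependencies that way.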
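Also, if you end up keeping the two rules separate, you will most likely run into an ambiguity error.
http://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=ruleorder#handling-ambiguous-rules
Where possible, it is best practice to have each rule generate unique outputs.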
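To illustrate why the two rules from the question collide, here is a toy check in plain Python. It is not Snakemake's resolver: pattern_to_regex and producers are hypothetical helpers that simply treat each {wildcard} as "one or more characters", and the bamCoverage_filtered directory is an assumed name used only to show a non-overlapping layout.

```python
import re


def pattern_to_regex(pattern):
    # Split "dir/{sample}.bw" into literal text and wildcard names;
    # odd positions in the split result are the wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    return re.compile(regex)


def producers(target, rules):
    # Names of all rules whose output pattern matches the whole target path.
    return [name for name, pattern in rules.items()
            if pattern_to_regex(pattern).fullmatch(target)]


# Output patterns as written in the question: both rules are identical.
same_output = {
    "bamCoverage": "bamCoverage/{sample}.filter.bw",
    "bamCoverage2": "bamCoverage/{sample}.filter.bw",
}

# Assumed alternative layout: one output directory per input source.
unique_output = {
    "bamCoverage": "bamCoverage/{sample}.filter.bw",
    "bamCoverage2": "bamCoverage_filtered/{sample}.filter.bw",
}

ambiguous = producers("bamCoverage/sampleA.filter.bw", same_output)
resolved = producers("bamCoverage_filtered/sampleA.filter.bw", unique_output)
print(ambiguous)
print(resolved)
```

With identical output patterns, both rules are candidates for the same file. If the outputs really must coincide, the ruleorder directive described at the link above is the documented way to state which rule takes precedence.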
As for an alternative, have you considered setting up the code like this?
from subprocess import call

rule all:
    input:
        "path/to/file/mySample.bw"
        # OR
        # "path/to/file/mySample_filtered.bw"

rule bamCoverage:
    input:
        bam = file1+"/{sample}.bam",
        bai = file1+"/{sample}.bam.bai"
    output:
        "bamCoverage/{sample}.bw"
    params:
        bw_binsize = bw_binsize,
        genome_size = int(genome_size),
        read_extension = "--extendReads"
    log:
        "bamCoverage/logs/bamCoverage.{sample}.log"
    benchmark:
        "bamCoverage/.benchmark/bamCoverage.{sample}.benchmark"
    threads: 16
    run:
        # Inside a run block, input, output, params, threads and log are
        # available directly; they are not attributes of wildcards.
        callString = deepTools_path + "bamCoverage " \
            + "-b " + str(input.bam) \
            + " -o " + str(output[0]) \
            + " --binSize " + str(params.bw_binsize) \
            + " -p " + str(threads) \
            + " --normalizeTo1x " + str(params.genome_size) \
            + " " + str(params.read_extension) \
            + " > " + str(log[0]) + " 2>&1"  # portable spelling of "&> log"
        call(callString, shell=True)

rule filterBam:
    input:
        "{pathFB}/{sample}.bam"
    output:
        "{pathFB}/{sample}_filtered.bam"
    run:
        # Keep only reads that do not have the QC-fail flag (512) set.
        callString = "samtools view -bh -F 512 " + str(input[0]) \
            + " > " + str(output[0])
        call(callString, shell=True)
Thoughts?