Snakemake: how to use one integer from a list per call as input to a script?

Question

I am trying to practice writing workflows in snakemake.

The contents of my Snakefile:

configfile: "config.yaml"

rule get_col:
  input:
   expand("data/{file}.csv",file=config["datname"])
  output:
   expand("output/{file}_col{param}.csv",file=config["datname"],param=config["cols"])
  params:
   col=config["cols"]
  script:
   "scripts/getCols.R"

The contents of config.yaml:

cols:
  [2,4]
datname:
  "GSE3790_expression_data"

My R script:

getCols=function(input,output,col) {
  dat=read.csv(input)
  dat=dat[,col]
  write.csv(dat,output,row.names=F)
}

getCols(snakemake@input[[1]],snakemake@output[[1]],snakemake@params[['col']])

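It seems that both columns are being passed to the script at once. What I am trying to accomplish is to have one column from the list used per output file.

Since the second output never gets a chance to be created (both columns are used to create the first output), snakemake throws an error: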

Waiting at most 5 seconds for missing files.
MissingOutputException in line 3 of /Users/rebecca/Desktop/snakemake-tutorial/practice/Snakefile:
Job completed successfully, but some output files are missing.

As a slightly unrelated point, I thought I could write the input as "data/{file}.csv", but that returns:

WildcardError in line 4 of /Users/rebecca/Desktop/snakemake-tutorial/practice/Snakefile:
Wildcards in input files cannot be determined from output files:
'file'

Any help would be much appreciated!

Answer 1

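It looks like you want to run your R script twice per file, once for every value of col. In that case, the rule needs to be called twice as well. The use of expand is also a bit too much here, in my opinion. expand fills the wildcards with all possible values and returns a list of the resulting file names. So the output of this rule would be every combination of files and cols, which the simple script cannot create in one run. This is also why file cannot be inferred from the output: it has already been expanded there, so no wildcard is left to match.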
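To make the effect of expand concrete, here is a minimal Python sketch of what it produces for the config in the question. The helper expand_sketch is a hypothetical stand-in written for illustration, not Snakemake's own implementation; it only assumes that expand fills the pattern with every combination of the given values.

```python
from itertools import product

# Values copied from the config.yaml in the question.
config = {"datname": "GSE3790_expression_data", "cols": [2, 4]}


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every combination."""
    # Treat a bare string as a one-element list of values.
    values = [v if isinstance(v, (list, tuple)) else [v] for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(wildcards.keys(), combo)))
        for combo in product(*values)
    ]


outputs = expand_sketch(
    "output/{file}_col{param}.csv",
    file=config["datname"],
    param=config["cols"],
)
print(outputs)
```

The result is a list of two concrete file names with no wildcards left in them, so a rule that declares this list as its output promises to create both files in a single job.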

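Instead, write the rule for a single file and a single column, and call expand on its output in whichever rule needs those files as input. If these files are the final output of your workflow, list them as the input of a rule all so the workflow knows what the ultimate target is.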

rule all:
  input:
    expand("output/{file}_col{param}.csv",
    file=config["datname"], param=config["cols"])

rule get_col:
  input:
    "data/{file}.csv"
  output:
    "output/{file}_col{param}.csv"
  params:
    col=lambda wc: int(wc.param)  # wildcard values are strings; the R script needs an integer column index
  script:
    "scripts/getCols.R"

Snakemake will infer from rule all (or from any other rule that uses the output further) what needs to be done, and will call rule get_col once per requested file.
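The inference step can be pictured with a short Python sketch. This is a simplified illustration and not Snakemake's actual code: it assumes each wildcard behaves like a regex group matching one or more characters, and the names output_pattern and infer_wildcards are made up for the example.

```python
import re

# The output pattern of rule get_col, with each wildcard written as a named group.
output_pattern = r"output/(?P<file>.+)_col(?P<param>.+)\.csv"


def infer_wildcards(target):
    """Return the wildcard values that make the pattern equal to target, or None."""
    match = re.fullmatch(output_pattern, target)
    return match.groupdict() if match else None


# One of the files requested by rule all.
wc = infer_wildcards("output/GSE3790_expression_data_col2.csv")

# With the wildcards known, the input path and the column follow directly.
input_file = "data/{file}.csv".format(**wc)
col = int(wc["param"])  # matched values are strings, so convert before indexing

print(wc, input_file, col)
```

Each file requested by rule all is matched this way on its own, which is why get_col runs as a separate job per column.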

