Snakemake: How do I get a shell command running with different arguments (integer) in a rule?

Question

I am trying to find the best hyperparameters for training my boosted decision tree (BDT). Here is the code for two instances:

user = '/home/.../BDT/'

nestimators = [1, 2]

rule all:
        input: user + 'AUC_score.pdf'

rule testing:
        output: user + 'AUC_score.csv'
        shell: 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))

rule plotting:
        input: user + 'AUC_score.csv'
        output: user + 'AUC_score.pdf'
        shell: 'python opti.py'

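The plan is as follows: I want to train my BDT in parallel with a bunch of different hyperparameters (to begin with, I only want to vary nestimators). So I tried to use a shell command to train the BDT. bdt.py takes the training parameters, trains, and saves the hyperparameters plus the training score in a CSV file. In the CSV file I can then look up which hyperparameters give the best score. Yay!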

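Sadly, it does not work like that. I tried using an input function, but since I want to pass an integer it does not work. I then tried it the way you see above, but now I get an 'error message': 'python bdt.py --nestimators <generator object at 0x7f5981a9d150>'. I understand why this does not work either, but I don't know where to go from here.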

Answer 1

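The problem in your code is that the expression nestimators[i] for i in range(2) is not a list (as you might think). It is a generator, which does not produce any values until you explicitly consume it. For example, this code: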

'python bdt.py --nestimators {}'.format(list(nestimators[i] for i in range(2)))

produces the result 'python bdt.py --nestimators [1, 2]'

Actually, you don't need a generator at all, since this code produces exactly the same output:

'python bdt.py --nestimators {}'.format(nestimators)

This format may not be what your script expects. For example, if you want a command line like python bdt.py --nestimators 1,2, you can use this expression:

'python bdt.py --nestimators {}'.format(",".join(map(str, nestimators)))

If you can use f-strings, the last expression can be shortened:

f'python bdt.py --nestimators {",".join(map(str, nestimators))}'
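To make the difference concrete, here is a minimal sketch in plain Python (outside Snakemake) that builds the command string in the ways discussed above. The variable names gen_cmd, joined_cmd and per_value_cmds are only for illustration, and whether bdt.py accepts a comma-separated value is an assumption about your script.

```python
nestimators = [1, 2]

# A bare generator expression is formatted via its repr, not its values,
# which is where the "<generator object ...>" text in the command comes from.
gen_cmd = 'python bdt.py --nestimators {}'.format(nestimators[i] for i in range(2))

# Joining the values as strings gives a single comma-separated argument
# (assumes bdt.py can parse "1,2").
joined_cmd = 'python bdt.py --nestimators {}'.format(','.join(map(str, nestimators)))

# One command per value, if each hyperparameter should get its own run.
per_value_cmds = ['python bdt.py --nestimators {}'.format(n) for n in nestimators]

print(gen_cmd)
print(joined_cmd)
print(per_value_cmds)
```

Note that all of these still yield a single rule invocation in Snakemake; they only change what the command string looks like.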

Answer 2

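The error occurs because {} is replaced by the generator object itself. That is, it is not replaced first by 1 and then by 2 but, so to speak, by the iterator over nestimators.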

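Even if you correct the Python expression in rule testing, there may be a more fundamental problem, if I understand your goal correctly. Snakemake workflows are defined in terms of rules that describe how to create output files from input files. As written, rule testing would therefore be run only once, whereas you probably want the rule to be invoked separately for each hyperparameter.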

The solution is to add the hyperparameter to the output file name. Like this:

user = '/home/.../BDT/'

nestimators = [1, 2]

rule all:
        input: user + 'AUC_score.pdf'

rule testing:
        output: user + 'AUC_score_{hyper}.csv'
        shell: 'python bdt.py --nestimators {wildcards.hyper}'

rule plotting:
        input: expand(user + 'AUC_score_{hyper}.csv', hyper=nestimators)
        output: user + 'AUC_score.pdf'
        shell: 'python opti.py'
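To see why this triggers one run of rule testing per hyperparameter, here is a rough sketch in plain Python of the file list that the expand(...) call in rule plotting asks for. It uses str.format as a stand-in for Snakemake's expand in this single-wildcard case, and the name targets is only illustrative.

```python
user = '/home/.../BDT/'
nestimators = [1, 2]

# Stand-in for expand(user + 'AUC_score_{hyper}.csv', hyper=nestimators):
# one concrete file name per value of the hyper wildcard.
pattern = user + 'AUC_score_{hyper}.csv'
targets = [pattern.format(hyper=n) for n in nestimators]

print(targets)
```

Each of these files matches the output pattern of rule testing, so Snakemake schedules that rule once per file, with wildcards.hyper set to the matching value, and the resulting jobs can run in parallel.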

Finally, you do not need shell: to call a Python script. You can use script: directly, as described in the documentation: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts
