Understanding the -t option in qsub

Question

The documentation is somewhat unclear about what exactly the -t option does when submitting a job with qsub.

http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm

From the documentation:

-t Specifies the task ids of a job array. Single task arrays are allowed. The array_request argument is an integer id or a range of integers. Multiple ids or id ranges can be combined in a comma delimited list. Examples: -t 1-100 or -t 1,10,50-100

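Here is an example where things go wrong. I requested 2 nodes with 8 processes per node and an array of 16 jobs. I had hoped the jobs would be distributed naturally across the 2 nodes, but the 16 tasks ended up spread ad hoc across more than 2 nodes.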

$ echo 'hostname' | qsub -q gpu -l nodes=2:ppn=8 -t 1-16
52727[]
$ cat STDIN.o52727-* | sort
gpu-3.local
gpu-3.local
gpu-3.local
gpu-3.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-5.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local
gpu-7.local

Answer 1

I suspect this won't completely answer your question, but it isn't clear what you are hoping to accomplish.

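Specifying an array with qsub -t simply creates individual jobs, all sharing the same primary ID. Submitting the way you did creates 16 jobs, each of which requests 16 cores in total (2 nodes with 8 processors per node). The syntax only makes it easier to submit a large number of jobs at once, without having to script a submission loop.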
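If the intent was 16 independent single-core tasks, one option is to request a single core per array element and let each sub-job identify itself through the PBS_ARRAYID environment variable that Torque sets. The job script below is only a sketch under that assumption: the describe_task helper is a hypothetical name introduced for illustration, and where the 16 sub-jobs actually land is still decided by the scheduler.

```shell
#!/bin/bash
#PBS -q gpu
#PBS -l nodes=1:ppn=1
#PBS -t 1-16

# Torque sets PBS_ARRAYID to the index of this sub-job (1..16 here).
# Default to 1 so the script can also be run by hand outside the batch system.
task_id=${PBS_ARRAYID:-1}

# Hypothetical helper: report which array element ran on which host.
describe_task() {
    echo "task $1 on $(hostname)"
}

describe_task "$task_id"
```

Each of the 16 sub-jobs then asks for one core rather than 2 nodes with 8 cores each, and its output file shows which host it ran on.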

With Torque alone (that is, disregarding the scheduler), you can force a job onto specific nodes by saying something like:

qsub -l nodes=gpu-node01:ppn=8+gpu-node02:ppn=8

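More advanced schedulers give you greater flexibility. For example, Moab or Maui allow "-l nodes=2:ppn=8,nallocpolicy=exactnode", which applies NODEALLOCATIONPOLICY EXACTNODE to the job at scheduling time and gives you 8 cores on each of exactly two nodes (any two nodes, in this case).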

qsub

pbs

torque