"Max jobs to run" does not equal the number of jobs specified when using GNU Parallel on a remote server?
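I am trying to run many small serial jobs with GNU Parallel on a PBS cluster where each compute node has 16 cores. Because I plan to use several compute nodes, I pass the option -S $SERVERNAME to GNU Parallel. What confuses me is that, with -S $SERVERNAME, the number of jobs started on the node does not equal the number I specified once I try to spawn more than 9 jobs. Here are my observations: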
[fchen14@shelob001 ~]$ parallel --version
GNU parallel 20160922
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: http://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
[fchen14@shelob001 ~]$ hostname # this shows my hostname
shelob001
When GNU Parallel runs on the local host without -S $SERVERNAME, there is no problem: I intend to spawn 10 jobs, and GNU Parallel starts 10 jobs:
[fchen14@shelob001 ~]$ parallel --progress echo ::: `seq 1 10`
Computers / CPU cores / Max jobs to run
1:local / 16 / 10 # 10 jobs spawned, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:10/0/100%/0.0s 1
local:9/1/100%/0.0s 2
local:8/2/100%/0.0s 3
local:7/3/100%/0.0s 4
local:6/4/100%/0.0s 5
local:5/5/100%/0.0s 6
local:4/6/100%/0.0s 7
local:3/7/100%/0.0s 8
local:2/8/100%/0.0s 9
local:1/9/100%/0.0s 10
local:0/10/100%/0.0s
When I use GNU Parallel with -S $SERVERNAME to spawn fewer than 10 jobs, there is still no problem:
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 1`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 1 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:1/0/100%/0.0s 1
shelob001:0/1/100%/1.0s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 8`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 8 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:8/0/100%/0.0s 1
shelob001:7/1/100%/1.0s 7
shelob001:6/2/100%/0.5s 3
shelob001:5/3/100%/0.3s 8
shelob001:4/4/100%/0.2s 5
shelob001:3/5/100%/0.2s 2
shelob001:2/6/100%/0.2s 6
shelob001:1/7/100%/0.1s 4
shelob001:0/8/100%/0.1s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 9`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 # When the number of jobs is less than 10, no problem
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 1
shelob001:8/1/100%/1.0s 5
shelob001:7/2/100%/0.5s 8
shelob001:6/3/100%/0.3s 2
shelob001:5/4/100%/0.2s 6
shelob001:4/5/100%/0.2s 9
shelob001:3/6/100%/0.2s 3
shelob001:2/7/100%/0.1s 4
shelob001:1/8/100%/0.1s 7
shelob001:0/9/100%/0.1s
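What confuses me is that when I ask for 10 or more jobs, the number of jobs started is always one fewer than I wanted. Here I want to spawn 10, but only 9 jobs are started: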
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 10` # I want to start 10 jobs
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 9 #why here "Max jobs to run" is 9?
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:9/0/100%/0.0s 2
shelob001:9/1/100%/3.0s 1
shelob001:8/2/100%/1.5s 7
shelob001:7/3/100%/1.0s 4
shelob001:6/4/100%/0.8s 9
shelob001:5/5/100%/0.6s 8
shelob001:4/6/100%/0.5s 3
shelob001:3/7/100%/0.4s 5
shelob001:2/8/100%/0.4s 6
shelob001:1/9/100%/0.4s 10
shelob001:0/10/100%/0.4s
[fchen14@shelob001 ~]$ parallel -S shelob001 --progress echo ::: `seq 1 11`
Computers / CPU cores / Max jobs to run
1:shelob001 / 16 / 10 # it seems the jobs started is one less than I specified
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
shelob001:10/0/100%/0.0s 1
shelob001:10/1/100%/3.0s 2
shelob001:9/2/100%/1.5s 8
shelob001:8/3/100%/1.0s 3
shelob001:7/4/100%/0.8s 4
shelob001:6/5/100%/0.6s 5
shelob001:5/6/100%/0.5s 7
shelob001:4/7/100%/0.4s 10
shelob001:3/8/100%/0.4s 9
shelob001:2/9/100%/0.3s 6
shelob001:1/10/100%/0.4s 11
shelob001:0/11/100%/0.4s
[fchen14@shelob001 ~]$
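I checked the compute node with "top", and it does show that only 9 CPUs are in use when I run seq 1 10. I hope I have made my problem clear. Can anyone point out the possible cause? Any suggestions are welcome.
Thanks a lot!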
You seem to have found a bug. Workaround: -j+1
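A minimal sketch of how the workaround might be applied to the failing command from the question. As far as I understand the GNU Parallel documentation, -j+N adds N to the detected number of CPU cores, so -j+1 requests one extra job slot. The helper name parallel_cmd is hypothetical, and it only prints the command line so you can inspect it before running it on a host that has GNU Parallel and SSH access to the node.

```shell
# Hypothetical helper: print the question's command with the -j+1 workaround.
#   $1 = server name passed to -S (the question used shelob001)
#   $2 = number of inputs, i.e. the upper bound given to seq
parallel_cmd() {
    # $(seq ...) is left unquoted on purpose so each number becomes its own argument.
    echo parallel -j+1 -S "$1" --progress echo ::: $(seq 1 "$2")
}

# Print the command for the failing case from the question (10 inputs).
parallel_cmd shelob001 10
```

To actually run it, pipe the printed line to a shell, for example `parallel_cmd shelob001 10 | sh`, and check whether the "Max jobs to run" column now reports 10.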