PBS jobs stay queued ('Q' state) but run with qrun
In my fully local Torque installation (torque-6.1.1), all my submitted jobs stay in the 'Q' state, and I have to force them to start with qrun.
>qstat -f 141
Job Id: 141.localhost
Job_Name = script.pbs
Job_Owner = michael@localhost
job_state = Q
queue = batch
server = localhost
Checkpoint = u
ctime = Wed Aug 23 16:45:25 2017
Error_Path = localhost:/var/spool/torque/script.pbs.e141
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = bae
mtime = Wed Aug 23 16:45:25 2017
Output_Path = localhost:/var/spool/torque/script.pbs.o141
Priority = 0
qtime = Wed Aug 23 16:45:25 2017
Rerunable = True
Resource_List.walltime = 01:00:00
Resource_List.nodes = 1
Resource_List.nodect = 1
Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/michael,
PBS_O_LOGNAME=michael,
PBS_O_PATH=/home/michael/bin:/home/michael/.local/bin:/usr/local/bin:
/usr/local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbi
n:/bin:/usr/games:/usr/local/games:/snap/bin,PBS_O_SHELL=/bin/bash,
PBS_O_LANG=fr_FR.UTF-8,PBS_O_WORKDIR=/var/spool/torque,
PBS_O_HOST=localhost,PBS_O_SERVER=localhost
euser = michael
egroup = michael
queue_type = E
etime = Wed Aug 23 16:45:25 2017
submit_args = /home/michael/cnes-sowt/script.pbs
fault_tolerant = False
job_radix = 0
submit_host = localhost
init_work_dir = /var/spool/torque
request_version = 1
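When there are many jobs to check, reading the whole `qstat -f` dump gets tedious. The sketch below pulls out `job_state` and the other attributes programmatically. It assumes the `Attribute = value` layout shown above, and it treats any line without ` = ` as a continuation of the previous value, as with the wrapped `Variable_List`. The function name `parse_qstat_f` is hypothetical and is not part of Torque.

```python
from __future__ import annotations


def parse_qstat_f(text: str) -> dict[str, str]:
    """Parse one job's `qstat -f` output into an attribute dict (hypothetical helper).

    Assumes a 'Job Id: ...' header followed by 'Key = value' lines; a line
    without ' = ' is treated as a continuation of the previous value.
    """
    attrs: dict[str, str] = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Job Id:"):
            attrs["Job Id"] = line.split(":", 1)[1].strip()
            last_key = None
        elif " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
        elif last_key is not None:
            # qstat wraps long values at a fixed width (e.g. "/sbi" + "n:/bin"),
            # so continuation lines are joined back without a separator.
            attrs[last_key] += line
    return attrs


sample = """Job Id: 141.localhost
job_state = Q
queue = batch"""
print(parse_qstat_f(sample)["job_state"])  # Q
```

Joining continuation lines without a separator is deliberate. The wrap can fall in the middle of a word, as in `/sbi` / `n:/bin` above.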
>sudo tracejob 141
/var/spool/torque/mom_logs/20170823: No matching job records located
/var/spool/torque/sched_logs/20170823: No matching job records located
Job: 141.localhost
08/23/2017 16:45:25.323 S enqueuing into batch, state 1 hop 1
08/23/2017 16:45:25 A queue=batch
Could this be because I can qsub without being root, but have to use sudo to qrun?
Thanks a lot for your help.
The solution, described at https://cmayes.wordpress.com/2012/12/15/single-host-torque-pbs/ , is to add an entry to /etc/hosts.
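The thread does not spell out which /etc/hosts entry was needed. Before and after editing the file, it can help to see which addresses a hostname currently maps to. The following is a minimal sketch that assumes standard hosts(5) syntax: an address, then a canonical name and optional aliases, with `#` starting a comment. `addresses_for` is a hypothetical helper, and the hostname `myhost` is illustrative only. The `127.0.1.1` line mirrors the entry Debian/Ubuntu installers typically add for the machine's own name.

```python
from __future__ import annotations


def addresses_for(hosts_text: str, name: str) -> list[str]:
    """Return every address mapped to `name` in hosts(5)-format text (hypothetical helper)."""
    wanted = name.lower()  # hostnames compare case-insensitively
    found: list[str] = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop any comment, then split on whitespace
        if len(fields) >= 2 and wanted in (h.lower() for h in fields[1:]):
            found.append(fields[0])
    return found


sample = """127.0.0.1   localhost
127.0.1.1   myhost    # hypothetical Debian/Ubuntu-style entry
"""
print(addresses_for(sample, "myhost"))  # ['127.0.1.1']
```

On the Torque host itself, you could pass the contents of `/etc/hosts` and the result of Python's `socket.gethostname()` instead of the sample values.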