Grep qstat output and copy files once done
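I am using the PBS job scheduler on a cluster. In bash, I want to monitor a job's status and, once the job is done, copy the results to a specific location (/data/myfolder/).
My qstat output looks like this: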
JobID Username Queue Jobname SessID NDS TSK Memory Time Status
----------------------------------------------------------------
717.XXXXXX user XXXX SS 2323283 1 24 122gb -- E
Thanks in advance.
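You could simply grep the qstat output for " C ", but you could also use -o [hostname:]path to stream the output straight to its final destination, as long as you have ssh keys set up from the nodes for your POSIX account.
If you do end up grepping, be a good citizen and limit the checks to once or twice a minute so that you don't spam the server, which hurts performance.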
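The polling approach could be sketched roughly as follows. This is a minimal sketch, not a tested PBS recipe: it assumes the state is the last column of the qstat line (as in the output shown in the question), that a finished job either shows state C or is no longer listed, and that a failing qstat call is acceptable to treat as "job gone". The function names `job_state` and `wait_for_job`, the 30-second default, and the `./results` directory are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: poll qstat at a polite interval instead of every second.

# Print the state (last column) of the given job ID; print nothing if the
# job is not listed. Assumes the first column looks like "<id>.<server>".
job_state () {
    local job_id="$1"
    qstat | awk -v id="$job_id" '$1 == id || index($1, id ".") == 1 { print $NF; exit }'
}

# Block until the job shows state C or disappears from the qstat listing.
wait_for_job () {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between checks (assumed default)
    local state
    while true; do
        state="$(job_state "$job_id")"
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            return 0
        fi
        sleep "$interval"
    done
}

# Hypothetical usage; ./results is a placeholder for wherever the job writes:
# wait_for_job "717" 30 && cp -r ./results /data/myfolder/
```

Parsing the last field with awk rather than grepping for " C " avoids false matches on a job name or queue that happens to contain a lone C.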
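There is a script here that does this (for SGE). I started to excerpt just the relevant parts for you, but it is probably easier to start from the full script: insert your qsub command into the submit_job function, then put the code that copies your results after the wait_job_finish call in the script. You can strip out the log printing at the end if you like.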
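The copy step that goes after wait_job_finish might look something like the sketch below. The helper name `copy_results` and the source directory are assumptions for illustration; only the destination /data/myfolder/ comes from the question.

```shell
#!/bin/bash
# Sketch of a copy step to run once the job has finished.

# Copy everything inside src_dir into dest_dir, creating dest_dir if needed.
copy_results () {
    local src_dir="$1"
    local dest_dir="$2"
    if [ ! -d "$src_dir" ]; then
        echo "source directory not found: $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp -r "$src_dir"/. "$dest_dir"/
}

# Hypothetical placement in the script's RUN section:
# wait_job_finish "$job_id"
# copy_results "$PWD/results" /data/myfolder/
```

Returning a non-zero status when the source directory is missing makes it easy to chain the call with `&&` or to log a failure instead of silently copying nothing.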
#!/bin/bash
# this script will submit a qsub job and check on host information for the cluster
# node which it ends up running on
# ~~~~~ CUSTOM FUNCTIONS ~~~~~ #
submit_job () {
local job_name="$1"
qsub -j y -N "$job_name" -o :${PWD}/ -e :${PWD}/ <<E0F
set -x
hostname
cat /etc/hosts
python -c "import socket; print(socket.gethostbyname(socket.gethostname()))"
# sleep 5000
E0F
}
wait_job_start () {
local job_id="$1"
printf "waiting for job to start"
while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
do
printf "."
sleep 1
done
printf "\n\n"
local node_name="$(get_node_name "$job_id")"
printf "Job is running on node $node_name \n\n"
}
wait_job_finish () {
local job_id="$1"
printf "waiting for job to finish"
while qstat | grep -q "$job_id"
do
printf "."
sleep 1
done
printf "\n\n"
}
check_for_job_submission () {
local job_id="$1"
# the job is "there" when qstat lists it
if qstat | grep -q "$job_id" ; then
echo "its there"
else
echo "not there"
fi
}
get_node_name () {
local job_id="$1"
# keep only the queue@node field of the matching qstat line
qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*@[^ ]*\).*$|\1|g'
}
# ~~~~~ RUN ~~~~~ #
printf "Submitting cluster job to get node hostname and IP\n\n"
job_name="get_node_hostnames"
job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted
job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g' )"
job_stdout_log="${job_name}.o${job_id}"
printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name"
wait_job_start "$job_id"
wait_job_finish "$job_id"
printf "\n\nReading log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && cat "$job_stdout_log"
printf "\n\nRemoving log file ${job_stdout_log}\n\n"
[ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"
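Side note: if you prefer Python, there is a more robust equivalent here.
You may need to make some small adjustments to both to fit your PBS system, since they were written for SGE.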
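One adjustment that is likely needed is the job ID parsing: SGE's qsub prints a sentence such as `Your job 832606 ("get_node_hostnames") has been submitted`, whereas PBS's qsub typically prints just the identifier, in the same `717.XXXXXX` form seen in the qstat output above. A rough sketch that handles both shapes follows; the function name `extract_job_id` is an assumption, and the exact output of your site's qsub should be checked before relying on it.

```shell
#!/bin/bash
# Sketch: extract the numeric job ID from either SGE-style or PBS-style
# qsub output. Both output shapes are assumptions to verify locally.

extract_job_id () {
    local qsub_output="$1"
    case "$qsub_output" in
        "Your job "*)
            # SGE style: Your job 832606 ("name") has been submitted
            printf '%s\n' "$qsub_output" | sed -e 's|^Your job \([0-9][0-9]*\) .*$|\1|'
            ;;
        *)
            # PBS style: <number>.<server>; keep the part before the first dot
            printf '%s\n' "${qsub_output%%.*}"
            ;;
    esac
}

# Hypothetical usage:
# job_id="$(extract_job_id "$(submit_job "$job_name")")"
```

The state letters may differ as well: the script's wait_job_start looks for a lowercase `r`, while the qstat output in the question uses uppercase letters in its Status column, so that pattern probably needs to match `R` on PBS.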