Monit does not clear the pid file or restart a process when the process becomes a zombie
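I am running monit inside a docker container, where it monitors a bunch of processes such as vault, nginx, mongodb and so on. I have created a wrapper script with start and stop functions for each of these processes, and these scripts are fed to monit. Here is the wrapper script for vault: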
#!/bin/sh
# vault service script
VAULT_DIR="/tmp/vault"
VAULT_USER="myuser"
USER=$(whoami)
if [ $USER != "root" ]
then
echo "Only root can run vault-server service"
exit 1
fi
usage() {
echo "Usage: `basename $0`: <start|stop|status|restart>"
exit 1
}
start() {
status
if [ $PID -gt 0 ]
then
echo "vault server daemon was already started. PID: $PID"
return $PID
fi
echo "Starting vault server daemon..."
rm -f /var/run/vault.pid
VAULT_OPTIONS=""
VAULT_OPTIONS="-dev"
su $VAULT_USER -c "/usr/bin/nohup vault server $VAULT_OPTIONS 1>/var/log/vault/vault.log 2>/var/log/vault/vault.err &"
status
if [ $PID -gt 0 ]
then
echo $PID >> /var/run/vault.pid
fi
sleep 5
su $VAULT_USER /opt/vault/setup-vault.sh
}
stop() {
status
if [ $PID -eq 0 ]
then
echo "vault server daemon is already not running"
return 0
fi
echo "Stopping vault server daemon..."
rm -f /var/run/vault.pid
kill $PID
}
status() {
PID=`ps -ef | grep "vault server" | grep -v grep | grep -v "\[" | awk '{print $2}'`
if [ "x$PID" = "x" ]
then
PID=0
fi
# if PID is greater than 0 then vault server is running, else it is not
return $PID
}
if [ "x$1" = "xstart" ]
then
start
exit 0
fi
if [ "x$1" = "xstop" ]
then
stop
exit 0
fi
if [ "x$1" = "xrestart" ]
then
stop
start
exit 0
fi
if [ "x$1" = "xstatus" ]
then
status
if [ $PID -gt 0 ]
then
echo "vault server daemon is running with PID: $PID"
else
echo "vault server daemon is NOT running"
fi
exit $PID
fi
usage
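For some reason, when the process crashes and becomes a zombie, monit does not clear the pid file and restart the process. Also, so that my status function does not pick up zombie processes, I added a grep -v "\[" clause to the ps -ef statement. Is there anything else I need to do? Or has anyone run into this problem before?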
If your application is producing zombies, add tini to your stack. Your entrypoint/cmd becomes tini, which calls your existing entrypoint, and tini handles the zombie reaping.
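As a rough sketch of what "your entrypoint becomes tini" could look like in an entrypoint script: the helper below hands the process over to tini, which then starts the real command. The function name run_under_tini and the TINI_BIN override are hypothetical, and the sketch assumes tini is installed in the image.

```shell
#!/bin/sh
# Hypothetical helper for an entrypoint script (not part of the original setup).
# Replaces the current shell with tini, so tini takes over this process
# (PID 1 when called from the container's entrypoint) and supervises "$@".
run_under_tini() {
    # TINI_BIN is an assumed override for images where tini is not on PATH.
    exec "${TINI_BIN:-tini}" -- "$@"
}
```

The last line of the entrypoint would then be something like run_under_tini monit -I, where -I keeps monit in the foreground. If changing the image is not an option, docker run --init asks Docker to supply its bundled init as PID 1 instead.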
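This is a consequence of the namespaced container jail: zombie processes inside the container are not reaped by the host's init process. So you need a namespaced pid 1 to reap your zombies.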