Monit not clearing the pid file and restarting a process when the process becomes a zombie

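I am running monit inside a docker container, where it monitors a bunch of processes such as vault, nginx, mongodb, and so on. For each process I have created a wrapper script with start and stop functionality, and these scripts are fed into monit. Here is the one for vault: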

#!/bin/sh
# vault service script

VAULT_DIR="/tmp/vault"
VAULT_USER="myuser"
USER=$(whoami)
if [ $USER != "root" ]
then
     echo "Only root can run vault-server service"
     exit 1
fi


usage() {
     echo "Usage: `basename $0`: <start|stop|status|restart>"
     exit 1
}

start() {
     status
     if [ $PID -gt 0 ]
     then
        echo "vault server daemon was already started. PID: $PID"
        return $PID
     fi
     echo "Starting vault server daemon..."
     rm -f /var/run/vault.pid
     VAULT_OPTIONS=""
     VAULT_OPTIONS="-dev"
     su $VAULT_USER -c "/usr/bin/nohup vault server $VAULT_OPTIONS 1>/var/log/vault/vault.log 2>/var/log/vault/vault.err &"
     status
     if [ $PID -gt 0 ]
     then
        echo $PID >> /var/run/vault.pid
     fi
     sleep 5
     su $VAULT_USER /opt/vault/setup-vault.sh
}

stop() {

     status
     if [ $PID -eq 0 ]
     then
        echo "vault server daemon is already not running"
        return 0
     fi
     echo "Stopping vault server daemon..."
     rm -f /var/run/vault.pid
     kill $PID
}

status() {
     PID=`ps -ef | grep "vault server" | grep -v grep | grep -v "\[" | awk '{print $2}'`
     if [ "x$PID" = "x" ]
     then
        PID=0
     fi

     # if PID is greater than 0 then vault server is running, else it is not
     return $PID
}

if [ "x$1" = "xstart" ]
then
  start
  exit 0
fi

if [ "x$1" = "xstop" ]
then
  stop
  exit 0
fi

if [ "x$1" = "xrestart" ]
then
  stop
  start
  exit 0
fi

if [ "x$1" = "xstatus" ]
then
   status
   if [ $PID -gt 0 ]
   then
      echo "vault server daemon is running with PID: $PID"
   else
      echo "vault server daemon is NOT running"
   fi
   exit $PID
fi

usage

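For some reason, when a process crashes and becomes a zombie, monit does not clear the pid file and restart the process. Also, so that my status function does not pick up zombie processes, I added the grep -v "\[" clause to the ps -ef statement. Is there anything else I need to do? Or has anyone run into this problem before?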

If your application is spawning zombies, add tini to your stack. Your entrypoint/cmd becomes tini, which calls your existing entrypoint, and tini handles the zombie reaping.
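As a rough sketch of what that could look like, assuming tini is installed in the image and on the PATH, and that the existing entrypoint simply runs monit in the foreground (the script name is hypothetical; adapt the last line to whatever your entrypoint does today):

```shell
#!/bin/sh
# docker-entrypoint.sh (hypothetical name)
# Assumption: tini is installed in the image and monit is the existing entrypoint.
# tini becomes PID 1 in the container, forwards signals to its child,
# and reaps any zombies that get re-parented to it.
# monit -I keeps monit in the foreground so tini has a child to supervise.
exec tini -- monit -I
```

If changing the image is not an option, Docker 1.13 and later offer `docker run --init`, which starts a bundled tini as PID 1 for you.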

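This is a consequence of the namespaced container jail: zombie processes inside it are not reaped by the host's init process. So you need a pid 1 inside the namespace to reap your zombies.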
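To confirm that unreaped zombies are what you are seeing, one option is to read the process state from ps instead of filtering on "[". This is only a sketch: it assumes a procps-style ps that accepts `-o stat=` and `-p` (BusyBox ps may not), and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original script.

# Classify a ps STAT string: a state starting with Z means zombie (defunct),
# an empty string means ps found no such process.
classify_state() {
    case "$1" in
        "") echo "absent" ;;
        Z*) echo "zombie" ;;
        *)  echo "alive" ;;
    esac
}

# Look up the STAT column for one PID and classify it.
# Assumption: ps supports "-o stat=" and "-p" (procps or BSD style).
pid_state() {
    stat=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    classify_state "$stat"
}
```

A status function could then treat anything other than "alive" from `pid_state "$PID"` as not running, so a defunct vault process is reported as down.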