Automatic process monitoring/management with Python
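I have a Python process that runs continuously, possibly under Supervisor. What is the best way to implement the following kinds of monitoring?
- If the process crashes, send an alert and restart it. I want to be notified automatically every time the process crashes and have it restarted automatically.
- If the process goes stale, i.e. it has not processed anything for 1 minute, send an alert and restart it.
- Restart on demand.
I would like to do all of the above from Python. I know Supervisord covers most of this, but I want to see whether it can be done in Python itself.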
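I think what you are looking for is Supervisor Events: http://supervisord.org/events.html
Also have a look at Superlance, a package of plugin utilities for monitoring and controlling processes that run under Supervisor.
[https://superlance.readthedocs.org/en/latest/]
You can configure things such as crash emails, crash SMS messages, memory consumption alerts, and HTTP hooks.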
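As a rough illustration of the Supervisor Events approach, here is a minimal event listener sketch that speaks the listener protocol over stdin/stdout by hand. The names `parse_tokens`, `describe_crash`, `run_listener`, and the `alert` callback are hypothetical and not part of Supervisor. The sketch only raises the alert; restarting would be left to Supervisor itself (for example through the `autorestart` program option).

```python
def parse_tokens(text):
    """Parse the 'key:value key:value' token format used by Supervisor events."""
    return dict(token.split(":", 1) for token in text.split())


def describe_crash(headers, payload):
    """Return an alert message for an unexpected exit, or None otherwise."""
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return None
    data = parse_tokens(payload)
    if data.get("expected") == "1":
        # The exit code was one Supervisor was told to expect: not a crash.
        return None
    return "process %s exited unexpectedly (pid %s)" % (
        data.get("processname"), data.get("pid"))


def run_listener(stdin, stdout, alert):
    """Serve events until stdin is closed. `alert` is any callable taking a string."""
    while True:
        # Tell Supervisor we are ready for the next event.
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break  # Supervisor closed the pipe.
        headers = parse_tokens(header_line)
        # 'len' is the payload size; payloads are assumed to be ASCII here.
        payload = stdin.read(int(headers["len"]))
        message = describe_crash(headers, payload)
        if message:
            alert(message)
        # Acknowledge the event: "RESULT <length>\n" followed by the body "OK".
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```

To try it, a script would call `run_listener(sys.stdin, sys.stdout, alert)` and be registered in an `[eventlistener:...]` section of the Supervisor configuration subscribed to `PROCESS_STATE_EXITED`. The `alert` callable must not print to stdout, since stdout is reserved for the protocol.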
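Well, if you want a homegrown solution, this is what I could come up with.
Maintain the actual and expected state of each process in Redis. You can monitor it however you like, for example by building a web interface that shows the actual state and changes the expected state.
Run a Python script from crontab that checks the state and takes the appropriate action when needed. Here I check every 3 seconds and use SES to alert the admins by email.
DISCLAIMER: The code has not been run or tested. I wrote it just now, so it is prone to errors.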
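Open the crontab file:
    $ crontab -e
Add this line at the end of it to make run_monitor.sh run every minute.
    #Runs this process every 1 minute.
    */1 * * * * bash ~/path/to/run_monitor.sh
run_monitor.sh runs the Python script. It runs it in a for loop, once every 3 seconds.
This is done because the smallest interval crontab offers is 1 minute. We want to check the process every 3 seconds, 20 times (3 sec * 20 = 1 minute). So the loop runs for about a minute, and then crontab starts it again.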
run_monitor.sh
    #!/bin/bash
    # {1..20} gives exactly 20 iterations ({0..20} would give 21).
    for count in {1..20}
    do
        cd '/path/to/check_status'
        /usr/local/bin/python check_status.py "myprocessname" "python startcommand.py"
        sleep 3 #check every 3 seconds.
    done
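Since the goal is to do as much as possible in Python, the same polling loop could also live in Python instead of bash. The sketch below is only one possible way to do it: `run_checks` and its parameters are hypothetical names. It schedules each check relative to the start time, so the time spent inside a check does not push the loop past the one-minute cron window.

```python
import time


def run_checks(check, interval=3.0, budget=60.0, clock=time.time, sleep=time.sleep):
    """Call check() every `interval` seconds until `budget` seconds are used up.

    Returns the number of times check() was called. `clock` and `sleep`
    are injectable so the loop can be exercised without real waiting.
    """
    start = clock()
    calls = 0
    while calls * interval < budget:
        # Each check is due at a fixed offset from the start, not from the
        # end of the previous check, so slow checks do not accumulate drift.
        delay = (start + calls * interval) - clock()
        if delay > 0:
            sleep(delay)
        check()
        calls += 1
    return calls
```

With the defaults, the checks are due at 0, 3, ..., 57 seconds after the start, which is 20 checks per cron run. In the setup above, `check` would be a function that calls `check_process("myprocessname", "python startcommand.py")`.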
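Here I have assumed:
*state 0 = stop or stopped (expected vs actual)
*state -1 = restart
*state 1 = run or running
You can add more states as convenient; a stale process can also be a state.
I have used the process name to kill, start, or check processes; you can easily modify this to read a specific PID file.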
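With these states, the on-demand restart from the question comes down to writing -1 into the expected state and letting the next monitor run pick it up. A minimal sketch, assuming the key layout `<process_name>_<state_type>_state` that check_status.py uses. The helper names and constants are hypothetical, and `client` is anything with redis-py style `get`/`set` methods:

```python
STOPPED, RUNNING, RESTART = 0, 1, -1


def state_key(process_name, state_type):
    """Build the Redis key; state_type is 'expected' or 'actual'."""
    return "{0}_{1}_state".format(process_name, state_type)


def request_state(client, process_name, state):
    """Ask the monitor to stop, run, or restart the process on its next pass."""
    if state not in (STOPPED, RUNNING, RESTART):
        raise ValueError("unknown state: %r" % (state,))
    client.set(state_key(process_name, "expected"), state)


def read_state(client, process_name, state_type):
    """Return the stored state as an int, or None if it was never set."""
    raw = client.get(state_key(process_name, state_type))
    # Redis hands values back as bytes/strings, so convert before comparing.
    return None if raw is None else int(raw)
```

A web interface or a one-off script could then call `request_state(redis.Redis('xxxredis_hostxxx'), "myprocessname", RESTART)`; the monitor restarts the process and sets the expected state back to 1.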
check_status.py
    import os
    import subprocess
    import sys
    import time

    import boto.ses
    import redis


    def send_mail(recipients, message_subject, message_body):
        """
        Uses AWS SES to send mail.
        """
        SENDER_MAIL = 'xxx@yyy.com'
        AWS_KEY = 'xxxxxxxxxxxxxxxxxxx'
        AWS_SECRET = 'xxxxxxxxxxxxxxxxxxx'
        AWS_REGION = 'xx-xxxx-x'
        mail_conn = boto.ses.connect_to_region(AWS_REGION,
                                               aws_access_key_id=AWS_KEY,
                                               aws_secret_access_key=AWS_SECRET)
        mail_conn.send_email(SENDER_MAIL, message_subject, message_body, recipients, format='html')
        return True


    class Shell(object):
        '''
        Convenient wrapper over subprocess.
        '''
        def __init__(self, command, raise_on_error=True):
            self.command = command
            self.raise_on_error = raise_on_error
            self.output = None
            self.error = None
            self.return_code = None

        def run(self):
            process = subprocess.Popen(self.command, shell=True, universal_newlines=True,
                                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            # communicate() drains both pipes and waits for exit;
            # calling wait() first can deadlock once a pipe buffer fills up.
            self.output, self.error = process.communicate()
            self.return_code = process.returncode
            if self.return_code and self.raise_on_error:
                print(self.error)
                raise Exception("Error while executing %s::%s" % (self.command, self.error))


    redis_client = redis.Redis('xxxredis_hostxxx')


    def get_state(process_name, state_type):  # state_type will be expected or actual.
        state = redis_client.get('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type))
        # Redis returns a string (None if the key is unset); convert it so it compares with 0, 1, -1.
        return int(state) if state is not None else None


    def set_state(process_name, state_type, state):  # state_type will be expected or actual.
        return redis_client.set('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type), state)


    def get_stale_state(process_name):
        state = redis_client.get('{process_name}_stale_state'.format(process_name=process_name))  # value could be 0 or 1
        return int(state) if state is not None else None


    def find_process_ids(process_name):
        # Filter out the grep commands and this monitor itself: their command
        # lines also contain process_name. Column 2 of `ps -ef` is the PID.
        command = ("ps -ef | grep {process_name} | grep -v grep | grep -v check_status.py"
                   " | awk '{{print $2}}'").format(process_name=process_name)
        shell = Shell(command=command, raise_on_error=False)
        shell.run()
        return shell.output.split()


    def check_running_status(process_name):
        return bool(find_process_ids(process_name))


    def start_process(start_command):
        # Launch in the background without pipes and without waiting,
        # so the monitor does not block on a long-running process.
        with open(os.devnull, 'w') as devnull:
            subprocess.Popen(start_command, shell=True, stdout=devnull, stderr=devnull)


    def stop_process(process_name):
        for process_id in find_process_ids(process_name):
            command = 'kill {process_id}'.format(process_id=process_id)
            shell = Shell(command=command, raise_on_error=False)
            shell.run()


    def check_process(process_name, start_command):
        expected_state = get_state(process_name, 'expected')
        if expected_state == 0:  # stop
            stop_process(process_name)
            set_state(process_name, 'actual', 0)
        elif expected_state == -1:  # restart
            stop_process(process_name)
            set_state(process_name, 'actual', 0)
            start_process(start_command)
            set_state(process_name, 'actual', 1)
            set_state(process_name, 'expected', 1)  # set expected back to 1 so we don't keep on restarting.
        elif expected_state == 1:
            running = check_running_status(process_name)
            if not running:
                set_state(process_name, 'actual', 0)
                send_mail(recipients=["abc@admin.com", "xyz@admin.com"], message_subject="Alert", message_body="Your process is down. Trying to restart.")
                start_process(start_command)
                time.sleep(1)  # give the process a moment to show up in the process list
                running = check_running_status(process_name)
                if running:
                    send_mail(recipients=["abc@admin.com", "xyz@admin.com"], message_subject="Alert", message_body="Your process was restarted.")
                    set_state(process_name, 'actual', 1)
                else:
                    send_mail(recipients=["abc@admin.com", "xyz@admin.com"], message_subject="Alert", message_body="Your process could not be restarted.")


    if __name__ == '__main__':
        args = sys.argv[1:]
        process_name = args[0]
        start_command = args[1]
        check_process(process_name, start_command)
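The script above does not yet cover the stale case from the question (nothing processed for 1 minute). One way to add it, sketched here under assumptions: the monitored process records a heartbeat timestamp in Redis each time it finishes a unit of work, and the monitor treats a missing or old heartbeat as stale. The key name `<process_name>_last_heartbeat` and the functions `record_heartbeat` and `is_stale` are hypothetical; `client` is anything with redis-py style `get`/`set` methods.

```python
import time

STALE_AFTER_SECONDS = 60  # "has not processed anything for 1 minute"


def heartbeat_key(process_name):
    return "{0}_last_heartbeat".format(process_name)


def record_heartbeat(client, process_name, now=None):
    """Called by the monitored process after each unit of work."""
    client.set(heartbeat_key(process_name), time.time() if now is None else now)


def is_stale(client, process_name, now=None, max_age=STALE_AFTER_SECONDS):
    """Return True if no heartbeat exists or the last one is older than max_age."""
    raw = client.get(heartbeat_key(process_name))
    if raw is None:
        return True
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")  # Redis may hand the value back as bytes
    now = time.time() if now is None else now
    return now - float(raw) > max_age
```

In `check_process`, the running branch could then call `is_stale(redis_client, process_name)` and, when it returns True, send an alert and go through the same stop and start steps as the restart state.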