Redis Sentinel for monitoring purposes? Notification script fires off too many times

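For my purposes, I only want one running Redis instance and one Redis Sentinel. I am running Redis 3.0.6. My sentinel.conf is all defaults, except that quorum is 1 and the notification script line is uncommented: sentinel notification-script mymaster /etc/redis/notify_me.sh. In notify_me.sh, I execute a python script which, for testing purposes, simply does print "HEY SOMETHING IS UP WITH REDIS"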

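I only want to use Redis Sentinel for monitoring. Later, I will write something in the python script that will email/text me when Redis goes down. As it stands, however, it fires far too often. I only want to get a message once, when Sentinel determines that Redis is dead. Right now, when I start it, the statement prints once at startup and then several more times after failover-state-select-slave: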
23863:X 06 Jan 15:26:18.422 # Sentinel runid is db267af1b9257ced70eee9cbd076291db31f9335
23863:X 06 Jan 15:26:18.422 # +monitor master mymaster 127.0.0.1 6380 quorum 1
HEY SOMETHING IS UP WITH REDIS
23863:X 06 Jan 15:27:07.602 # +sdown master mymaster 127.0.0.1 6380
23863:X 06 Jan 15:27:07.602 # +odown master mymaster 127.0.0.1 6380 #quorum 1/1
23863:X 06 Jan 15:27:07.602 # +new-epoch 1
23863:X 06 Jan 15:27:07.602 # +try-failover master mymaster 127.0.0.1 6380
23863:X 06 Jan 15:27:07.604 # +vote-for-leader db267af1b9257ced70eee9cbd076291db31f9335 1
23863:X 06 Jan 15:27:07.604 # +elected-leader master mymaster 127.0.0.1 6380
23863:X 06 Jan 15:27:07.604 # +failover-state-select-slave master mymaster 127.0.0.1 6380
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
HEY SOMETHING IS UP WITH REDIS
23863:X 06 Jan 15:27:07.682 # -failover-abort-no-good-slave master mymaster 127.0.0.1 6380

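I don't want it to print at startup. I only want it to print once, when the server dies, so that later I get just one email/text. Does anyone have suggestions on what I can do? Thanks!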

Not sure, but it may be related to the retry rules mentioned in the sentinel.conf comments:

The scripts are executed with the following rules for error handling:

If script exits with "1" the execution is retried later (up to a maximum number of times currently set to 10).

If script exits with "2" (or an higher value) the script execution is not retried.

If script terminates because it receives a signal the behavior is the same as exit code 1.

A script has a maximum running time of 60 seconds. After this limit is reached the script is terminated with a SIGKILL and the execution retried.
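
If the retry rules quoted above do turn out to matter, one way to keep them from multiplying alerts is to have the script choose its exit code deliberately. The following is only a minimal sketch: run, send_alert, and the EXIT_* constants are hypothetical names, and the exit-code meanings are taken solely from the sentinel.conf comments quoted above.

```python
# Exit codes as described in the quoted sentinel.conf comments.
EXIT_OK = 0
EXIT_RETRY = 1     # Sentinel would retry later (up to 10 times)
EXIT_NO_RETRY = 2  # Sentinel would not retry


def send_alert(event_type, description):
    """Hypothetical placeholder for the real email/text logic."""
    print("ALERT: %s %s" % (event_type, description))


def run(argv):
    """Return the exit code for one invocation of the notification script.

    argv is the argument list without the script name.
    """
    if len(argv) < 2:
        # Malformed call; retrying would not help.
        return EXIT_NO_RETRY
    event_type, description = argv[0], " ".join(argv[1:])
    try:
        send_alert(event_type, description)
    except Exception:
        # Give up instead of asking Sentinel to run the script again.
        return EXIT_NO_RETRY
    return EXIT_OK
```

In a real script this would be wired up with sys.exit(run(sys.argv[1:])) at the bottom. Returning 2 rather than 1 on failure trades a possibly lost alert for never getting the same alert up to 10 times.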

OK, I figured it out with help from #redis on freenode. In my notify_me.sh, echo $* will show you something like:

+odown master mymaster 127.0.0.1 6379 #quorum 1/1

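The first item is a pubsub message, like the ones listed here: http://redis.io/topics/sentinel#pubsub-messages. +odown is emitted when Sentinel considers the server objectively down, and that is when I want to do my python thing. notify_me.sh fires every time there is a message, which is why I was getting so many HEY SOMETHING IS UP WITH REDIS lines, so I just wrote this: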

notify_me.sh:

#!/bin/sh
# Forward the event type and description from Sentinel to the Python script.
python notify_redis.py "$@"

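Then in notify_redis.py:

import sys


def email_text_or_whatever_thing_you_wanna_do():
    # Placeholder: put the real email/text alert here.
    pass


def main(args):
    # Sentinel runs the script for every event; act only on +odown.
    for arg in args:
        if arg == "+odown":
            print("HEY SOMETHING IS UP WITH REDIS")
            email_text_or_whatever_thing_you_wanna_do()


main(sys.argv)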
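
If the alert should also say which instance went down, the same arguments can be parsed a little further. This is just a sketch that assumes the event looks like the +odown line shown earlier (event type, role, name, ip, port, then optional extra text); parse_event and should_alert are hypothetical helper names, not part of Redis.

```python
def parse_event(args):
    """Parse Sentinel notification arguments into a small dict.

    Accepts either pre-split words or [event_type, "description string"].
    Returns None when the event does not have the assumed shape.
    """
    words = " ".join(args).split()
    if len(words) < 5 or not words[4].isdigit():
        return None
    event_type, role, name, ip, port = words[:5]
    return {
        "event": event_type,
        "role": role,
        "name": name,
        "ip": ip,
        "port": int(port),
        "extra": " ".join(words[5:]),
    }


def should_alert(event):
    """Alert only when a master is reported objectively down."""
    return (
        event is not None
        and event["event"] == "+odown"
        and event["role"] == "master"
    )
```

For example, parse_event(sys.argv[1:]) inside notify_redis.py would give the name, ip, and port to include in the email/text, and should_alert would replace the plain string comparison.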

Hope this helps someone!