Redis Sentinel node cannot sync after failover
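We have set up Redis with Sentinel high availability on 3 nodes. Say the first node is the master. When we restart the first node, a failover happens and the second node becomes the master; up to this point everything works fine. But when the first node comes back up, it cannot sync with the new master, and we see that "masterauth" is not set in its configuration.
Here is the error log, followed by the configuration generated by CONFIG REWRITE: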
1182:S 29 May 2021 13:49:42.713 * Reconnecting to MASTER 192.168.1.2:6379 after failure
1182:S 29 May 2021 13:49:42.716 * MASTER <-> REPLICA sync started
1182:S 29 May 2021 13:49:42.716 * Non blocking connect for SYNC fired the event.
1182:S 29 May 2021 13:49:42.717 * Master replied to PING, replication can continue...
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.717 * Partial resynchronization not possible (no cached master)
1182:S 29 May 2021 13:49:42.718 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.718 * Retrying with SYNC...
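Every step of the replication handshake after PING (REPLCONF listening-port, REPLCONF capa, PSYNC) is rejected with `-NOAUTH`, which means the replica is connecting to the master without sending a password. A minimal sketch for pulling the rejected steps out of a log, assuming the message wording shown above (other Redis versions may phrase these lines differently); `noauth_steps` is a hypothetical helper, not part of any Redis tooling:

```python
import re

# Matches handshake lines where the master answered -NOAUTH.
# Pattern based on the log lines in this question only (assumption).
NOAUTH_STEP = re.compile(r"(REPLCONF [\w-]+|PSYNC).*-NOAUTH")


def noauth_steps(log_text: str) -> list[str]:
    """Return the handshake steps rejected with NOAUTH, in log order."""
    steps = []
    for line in log_text.splitlines():
        match = NOAUTH_STEP.search(line)
        if match:
            steps.append(match.group(1))
    return steps


log = """\
1182:S 29 May 2021 13:49:42.717 * Master replied to PING, replication can continue...
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.717 * (Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
1182:S 29 May 2021 13:49:42.718 # Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
"""

print(noauth_steps(log))  # ['REPLCONF listening-port', 'REPLCONF capa', 'PSYNC']
```

The PING line is not flagged because the master answers PING before authentication; the "(Non critical)" wording in the REPLCONF lines is misleading here, since the same NOAUTH error is what makes PSYNC fail.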
# Generated by CONFIG REWRITE
save 3600 1
save 300 100
save 60 10000
user default on #eb5fbb922a75775721db681c49600c069cf686765eeebaa6e18fad195812140d ~* &* +@all
replicaof 192.168.1.2 6379
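The rewritten file above has a `replicaof` line but no `masterauth`, so after the restart the node has no password to present to the new master. A minimal lint sketch for a single redis.conf, assuming the simple one-directive-per-line format shown here; `parse_redis_conf` and `replica_auth_problems` are hypothetical helpers written for illustration:

```python
import shlex


def parse_redis_conf(text: str) -> dict[str, list[list[str]]]:
    """Map each directive name to the argument lists of all its occurrences."""
    conf: dict[str, list[list[str]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments such as "# Generated by CONFIG REWRITE"
        # shlex handles quoted values like "redis"; '#' inside a line is not a comment
        name, *args = shlex.split(line)
        conf.setdefault(name.lower(), []).append(args)
    return conf


def replica_auth_problems(text: str) -> list[str]:
    """Flag configs that could not authenticate to a password-protected master."""
    conf = parse_redis_conf(text)
    problems = []
    if "masterauth" not in conf:
        if "replicaof" in conf:
            problems.append("replicaof is set but masterauth is missing")
        if "requirepass" in conf:
            problems.append("requirepass is set but masterauth is missing")
    return problems


generated = """\
# Generated by CONFIG REWRITE
save 3600 1
save 300 100
save 60 10000
user default on #eb5fbb922a75775721db681c49600c069cf686765eeebaa6e18fad195812140d ~* &* +@all
replicaof 192.168.1.2 6379
"""

print(replica_auth_problems(generated))  # ['replicaof is set but masterauth is missing']
```

The `requirepass` check reflects how Sentinel works: a node that is master today can be demoted to a replica after a failover, so it needs `masterauth` even while it is not replicating.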
What is the problem?
Sample configuration:
bind 127.0.0.1 -::1 192.168.1.3
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised systemd
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile ""
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
dir "/"
replicaof 192.168.1.2 6379
masterauth "redis"
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
requirepass "redis"
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4kb
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes
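For anyone who runs into the same issue: the cause was a misconfiguration of Redis. On our third deployment we set the parameters carefully and the problem did not occur again.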
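One way to catch this kind of misconfiguration before a failover is to compare all three nodes at once: with Sentinel, any node may end up as master or replica, so each one needs both `requirepass` and `masterauth`, and the values should match across nodes. A minimal sketch under that assumption; `check_sentinel_auth`, `directive`, and the node names `node1`..`node3` are hypothetical:

```python
import shlex
from typing import Optional


def directive(text: str, name: str) -> Optional[str]:
    """Return the first argument of the last `name` line in a redis.conf, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        if tokens[0].lower() == name and len(tokens) > 1:
            value = tokens[1]  # later lines override earlier ones
    return value


def check_sentinel_auth(nodes: dict[str, str]) -> list[str]:
    """Check every node can both accept and present one shared password."""
    problems = []
    passwords = set()
    for node, text in nodes.items():
        requirepass = directive(text, "requirepass")
        masterauth = directive(text, "masterauth")
        if requirepass is None:
            problems.append(f"{node}: requirepass missing")
        if masterauth is None:
            problems.append(f"{node}: masterauth missing")
        passwords.update(p for p in (requirepass, masterauth) if p is not None)
    if len(passwords) > 1:
        problems.append("nodes do not share a single password")
    return problems


nodes = {
    "node1": 'requirepass "redis"\n',  # hypothetical: masterauth forgotten
    "node2": 'requirepass "redis"\nmasterauth "redis"\n',
    "node3": 'requirepass "redis"\nmasterauth "redis"\n',
}

print(check_sentinel_auth(nodes))  # ['node1: masterauth missing']
```

This only inspects the config files; it does not check runtime values set with CONFIG SET, which CONFIG REWRITE may later persist.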