redis-cluster

Question

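I am trying to join a set of 8 replicas at one address to an existing cluster at another address.

The replica servers are all running in cluster mode.

When I try either of the following: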

./redis-trib.rb add-node --slave REPLICA_IP:6380 MASTER_IP:6380

or

./redis-cli --cluster add-node REPLICA_IP:6380 MASTER_IP:6380 --cluster-slave

I get the same result:

Waiting for the cluster to join...........................

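which hangs indefinitely.

The two servers can definitely see each other, and I can connect from either server to any of the relevant Redis nodes (replica or master). The cluster bus ports (16380 etc., i.e. the data port plus 10000) are all open and reachable. The output of these commands also shows that the cluster was found, since it lists every node with its correct node ID.

Here is the full output of either add-node command: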

>>> Adding node REPLICA_IP:6380 to cluster MASTER_IP:6380
>>> Performing Cluster Check (using node MASTER_IP:6380)
M: 043a5fa4fdca929d3d87f953906dc7c1f030926c MASTER_IP:6380
   slots:[0-2047] (2048 slots) master
M: e104777d31630eef11a01e41c7d3a6c98e14ab64 MASTER_IP:6386
   slots:[12288-14335] (2048 slots) master
M: 9c807d6f57a9634adcdf75fa1943c32c985bda1c MASTER_IP:6384
   slots:[8192-10239] (2048 slots) master
M: 0f7ec07deff97ca23fe67109da2365d916ff1a67 MASTER_IP:6383
   slots:[6144-8191] (2048 slots) master
M: 974e8b4051b7a8e33db62ba7ad62c7e54abe699d MASTER_IP:6382
   slots:[4096-6143] (2048 slots) master
M: b647bb9d732ff2ee83b097ffb8b49fb2bccd366f MASTER_IP:6387
   slots:[14336-16383] (2048 slots) master
M: a86ac1d5e783bed133b153e471fdd970c17c6af5 MASTER_IP:6381
   slots:[2048-4095] (2048 slots) master
M: 6f859b03f86eded0188ba493063c5c2114d7c11f MASTER_IP:6385
   slots:[10240-12287] (2048 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master MASTER_IP:6380
>>> Send CLUSTER MEET to node REPLICA_IP:6380 to make it join the cluster.
Waiting for the cluster to join
............................

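If I run CLUSTER MEET manually and then CLUSTER NODES, I can briefly see the other node with the 'handshake' flag and link state 'disconnected', and then it disappears. The node ID it shows is different from the node's actual ID.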
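
For anyone repeating this manual check, here is a minimal sketch that filters CLUSTER NODES output for entries still stuck in handshake. `handshake_nodes` is a hypothetical helper name, not part of Redis. It assumes the standard CLUSTER NODES line layout, where the third field is the comma-separated flag list and the eighth field is the link state.

```shell
# Hypothetical helper: reads CLUSTER NODES output on stdin and prints
# "<node-id> <address> <link-state>" for every node flagged "handshake".
handshake_nodes() {
  awk '{
    n = split($3, flags, ",")          # field 3 holds the flags, e.g. "myself,master"
    for (i = 1; i <= n; i++) {
      if (flags[i] == "handshake") {
        print $1, $2, $8               # node ID, address, link state
        break
      }
    }
  }'
}

# Usage (MASTER_IP and REPLICA_IP are the placeholders from the question):
#   redis-cli -h MASTER_IP -p 6380 CLUSTER MEET REPLICA_IP 6380
#   redis-cli -h MASTER_IP -p 6380 CLUSTER NODES | handshake_nodes
```

Running the second command a few times in a row should show whether the entry ever leaves handshake or simply vanishes, as described above.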

Answer 1

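I figured it out:

Using tcpdump, I confirmed that the two servers were repeatedly talking to each other on both the Redis server port and the handshake port while the add-node command hung forever.

However, the Redis config on every node had: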

bind 0.0.0.0

But on both the master and the replica servers, the config must be:

bind SERVER_IP

in order for CLUSTER MEET to work properly.
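
A quick way to audit this across machines is to read the bind line out of each config file. The sketch below is only an illustration: `bind_addresses` and `binds_wildcard` are hypothetical helper names, and the config path in the usage comment is an assumption about where redis.conf lives on your system.

```shell
# Hypothetical helper: print every address listed on the active "bind"
# line(s) of a redis.conf, one per line. Commented-out lines are skipped
# because their first field is "#" or "#bind", not "bind".
bind_addresses() {
  awk '$1 == "bind" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Hypothetical check: succeeds (exit status 0) if the file binds the
# wildcard address 0.0.0.0, the setting this answer had to change.
binds_wildcard() {
  bind_addresses "$1" | grep -qx '0\.0\.0\.0'
}

# Usage (the path is an assumption):
#   binds_wildcard /etc/redis/redis.conf && echo "bind 0.0.0.0 found"
# The value a running server actually uses can be read with:
#   redis-cli -h SERVER_IP -p 6380 CONFIG GET bind
```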

Answer 2

In my case, every node had the same node ID, so it kept waiting forever.

Here is what I had done: I built an EC2 AMI, launched 3 servers from it, and used a user-data shell script to reconfigure the Redis cluster and restart the server. As a result, each server came up with the same node ID as the server the AMI had been created from.

M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: b29aff425cdfa94272cdce1816939a9692c71e12 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
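
Duplicate IDs like these are easy to miss by eye. As a rough sketch, the listing can be piped through a small filter that prints any node ID appearing more than once. `duplicate_node_ids` is a hypothetical helper name, and it assumes the "M:" / "S:" line format shown in the output above.

```shell
# Hypothetical helper: reads a redis-cli --cluster node listing on stdin
# (lines such as "M: <node-id> <ip:port>") and prints each node ID that
# occurs more than once.
duplicate_node_ids() {
  awk '$1 == "M:" || $1 == "S:" { print $2 }' | sort | uniq -d
}

# Usage: save the listing to a file (the file name is an assumption), then
#   duplicate_node_ids < create-output.txt
# Alternatively, ask each node for its own ID and look for repeats:
#   for h in 10.0.134.109 10.0.175.235 10.0.155.10; do
#     redis-cli -h "$h" -p 6379 CLUSTER MYID
#   done | sort | uniq -d
```

Any output at all means at least two nodes share an ID.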

So I ran CLUSTER RESET HARD on every node, and that worked:

https://redis.io/commands/cluster-reset
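
With more than a handful of nodes, the reset can be scripted. This is a minimal sketch, not a tested procedure: `reset_nodes` and the `REDIS_CLI` variable are names made up for illustration. Per the linked documentation, a HARD reset makes the node generate a new node ID, and the command does not work on a master that still holds keys, so run it only on empty nodes.

```shell
# Assumption: REDIS_CLI names the client binary; override it for a dry run.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Hypothetical helper: send CLUSTER RESET HARD to each "host:port" argument,
# stopping at the first node where the command fails.
reset_nodes() {
  for node in "$@"; do
    host=${node%:*}      # text before the last colon
    port=${node##*:}     # text after the last colon
    "$REDIS_CLI" -h "$host" -p "$port" CLUSTER RESET HARD || return 1
  done
}

# Usage with the three nodes from this answer:
#   reset_nodes 10.0.134.109:6379 10.0.175.235:6379 10.0.155.10:6379
```

After the reset, creating the cluster again gave each node its own ID: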

Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
..
>>> Performing Cluster Check (using node 10.0.134.109:6379)
M: 36a129fab85d2aed310bfd7cc141035de420fa92 10.0.134.109:6379
   slots:[0-5460] (5461 slots) master
M: 773bc76e903da27efbd965bca26366fa20878397 10.0.175.235:6379
   slots:[5461-10922] (5462 slots) master
M: 10a79173d1f7a9c568bdfa3b955b6e133d2dceaa 10.0.155.10:6379
   slots:[10923-16383] (5461 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Answer 3

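If there is no firewall problem between the nodes, check the bind setting in redis.conf.

You should bind the Redis service to the LAN IP, and there is one more point:

Remove 127.0.0.1, or move 127.0.0.1 after the LAN IP!

Like this: bind 10.2.1.x 127.0.0.1 or bind 10.2.1.x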

source

redis-cluster - add-node slave to existing cluster from remote machine hanging forever

redis