Why doesn't clustering on k8s through redis-ha work?

I'm trying to set up a Redis cluster to use with Node.js (ioredis/cluster), but it doesn't seem to work.

This is on GKE, v1.11.8-gke.6.

I'm doing exactly what the redis-ha documentation says:

 ~  helm install --set replicas=3 --name redis-test stable/redis-ha  
NAME:   redis-test
LAST DEPLOYED: Fri Apr 26 00:13:31 2019
NAMESPACE: yt
STATUS: DEPLOYED

RESOURCES:
==> v1/ConfigMap
NAME                           DATA  AGE
redis-test-redis-ha-configmap  3     0s
redis-test-redis-ha-probes     2     0s

==> v1/Pod(related)
NAME                          READY  STATUS    RESTARTS  AGE
redis-test-redis-ha-server-0  0/2    Init:0/1  0         0s

==> v1/Role
NAME                 AGE
redis-test-redis-ha  0s

==> v1/RoleBinding
NAME                 AGE
redis-test-redis-ha  0s

==> v1/Service
NAME                            TYPE       CLUSTER-IP   EXTERNAL-IP  PORT(S)             AGE
redis-test-redis-ha             ClusterIP  None         <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-0  ClusterIP  10.7.244.34  <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-1  ClusterIP  10.7.251.35  <none>       6379/TCP,26379/TCP  0s
redis-test-redis-ha-announce-2  ClusterIP  10.7.252.94  <none>       6379/TCP,26379/TCP  0s

==> v1/ServiceAccount
NAME                 SECRETS  AGE
redis-test-redis-ha  1        0s

==> v1/StatefulSet
NAME                        READY  AGE
redis-test-redis-ha-server  0/3    0s


NOTES:
Redis can be accessed via port 6379 and Sentinel can be accessed via port 26379 on the following DNS name from within your cluster:
redis-test-redis-ha.yt.svc.cluster.local

To connect to your Redis server:
1. Run a Redis pod that you can use as a client:

   kubectl exec -it redis-test-redis-ha-server-0 sh -n yt

2. Connect using the Redis CLI:

  redis-cli -h redis-test-redis-ha.yt.svc.cluster.local

 ~  k get pods | grep redis-test                                         
redis-test-redis-ha-server-0           2/2       Running   0          1m
redis-test-redis-ha-server-1           2/2       Running   0          1m
redis-test-redis-ha-server-2           2/2       Running   0          54s
 ~  kubectl exec -it redis-test-redis-ha-server-0 sh -n yt
Defaulting container name to redis.
Use 'kubectl describe pod/redis-test-redis-ha-server-0 -n yt' to see all of the containers in this pod.
/data $ redis-cli -h redis-test-redis-ha.yt.svc.cluster.local
redis-test-redis-ha.yt.svc.cluster.local:6379> set test key
(error) READONLY You can't write against a read only replica.

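But in the end, only one of the pods I happen to connect to is writable. I checked the logs on a few containers and everything seems to be fine there. I tried to run cluster info in redis-cli, but I get ERR This instance has cluster support disabled everywhere.

Logs: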

 ~  k logs pod/redis-test-redis-ha-server-0  redis
1:C 25 Apr 2019 20:13:43.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:13:43.604 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:13:43.604 # Configuration loaded
1:M 25 Apr 2019 20:13:43.606 * Running mode=standalone, port=6379.
1:M 25 Apr 2019 20:13:43.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 25 Apr 2019 20:13:43.606 # Server initialized
1:M 25 Apr 2019 20:13:43.606 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 25 Apr 2019 20:13:43.627 * DB loaded from disk: 0.021 seconds
1:M 25 Apr 2019 20:13:43.627 * Ready to accept connections
1:M 25 Apr 2019 20:14:11.801 * Replica 10.7.251.35:6379 asks for synchronization
1:M 25 Apr 2019 20:14:11.801 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are '8311a1ca896e97d5487c07f2adfd7d4ef924f36b' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:11.802 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:17.825 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:17.825 * Background RDB transfer started by pid 55
55:C 25 Apr 2019 20:14:17.826 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:17.926 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:17.926 # Slave 10.7.251.35:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:17.926 * Streamed RDB transfer with replica 10.7.251.35:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:18.828 * Synchronization with replica 10.7.251.35:6379 succeeded
1:M 25 Apr 2019 20:14:42.711 * Replica 10.7.252.94:6379 asks for synchronization
1:M 25 Apr 2019 20:14:42.711 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c2827ffe011d774db005a44165bac67a7e7f7d85', my replication IDs are 'af453adde824b2280ba66adb40cc765bf390e237' and '0000000000000000000000000000000000000000')
1:M 25 Apr 2019 20:14:42.711 * Delay next BGSAVE for diskless SYNC
1:M 25 Apr 2019 20:14:48.976 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 25 Apr 2019 20:14:48.977 * Background RDB transfer started by pid 125
125:C 25 Apr 2019 20:14:48.978 * RDB: 0 MB of memory used by copy-on-write
1:M 25 Apr 2019 20:14:49.077 * Background RDB transfer terminated with success
1:M 25 Apr 2019 20:14:49.077 # Slave 10.7.252.94:6379 correctly received the streamed RDB file.
1:M 25 Apr 2019 20:14:49.077 * Streamed RDB transfer with replica 10.7.252.94:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 25 Apr 2019 20:14:49.761 * Synchronization with replica 10.7.252.94:6379 succeeded
 ~  k logs pod/redis-test-redis-ha-server-1 redis 
1:C 25 Apr 2019 20:14:11.780 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Apr 2019 20:14:11.781 # Redis version=5.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Apr 2019 20:14:11.781 # Configuration loaded
1:S 25 Apr 2019 20:14:11.786 * Running mode=standalone, port=6379.
1:S 25 Apr 2019 20:14:11.791 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 25 Apr 2019 20:14:11.791 # Server initialized
1:S 25 Apr 2019 20:14:11.791 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 25 Apr 2019 20:14:11.792 * DB loaded from disk: 0.001 seconds
1:S 25 Apr 2019 20:14:11.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 25 Apr 2019 20:14:11.792 * Ready to accept connections
1:S 25 Apr 2019 20:14:11.792 * Connecting to MASTER 10.7.244.34:6379
1:S 25 Apr 2019 20:14:11.792 * MASTER <-> REPLICA sync started
1:S 25 Apr 2019 20:14:11.792 * Non blocking connect for SYNC fired the event.
1:S 25 Apr 2019 20:14:11.793 * Master replied to PING, replication can continue...
1:S 25 Apr 2019 20:14:11.799 * Trying a partial resynchronization (request c2827ffe011d774db005a44165bac67a7e7f7d85:6006176).
1:S 25 Apr 2019 20:14:17.824 * Full resync from master: af453adde824b2280ba66adb40cc765bf390e237:722
1:S 25 Apr 2019 20:14:17.824 * Discarding previously cached master state.
1:S 25 Apr 2019 20:14:17.852 * MASTER <-> REPLICA sync: receiving streamed RDB from master
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Flushing old data
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 25 Apr 2019 20:14:17.853 * MASTER <-> REPLICA sync: Finished with success

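Am I missing something, or is there a better way to do clustering?

Not the best solution, but I figured I can just use Sentinel instead of looking for another way (or maybe there is no other way). Most languages have client support for it, so it shouldn't be very hard (except for redis-cli, where I couldn't figure out how to query the Sentinel server).

This is how I got it working with ioredis (Node.js; sorry if you are not familiar with ES6 syntax):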
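For reference, Sentinel speaks the ordinary Redis protocol on its own port (26379 in this chart), and a client finds the writable node by sending `SENTINEL get-master-addr-by-name <master-name>` to the Sentinels one after another until one of them answers with an `[ip, port]` pair. The sketch below only illustrates that lookup loop. `Endpoint`, `SendCommand` and `findMaster` are hypothetical names, and the transport is passed in as a function so that no particular client library API is assumed.

```typescript
// Address of a Sentinel or a Redis server.
interface Endpoint {
  host: string;
  port: number;
}

// Hypothetical transport: sends one command to an endpoint and resolves with the reply.
// A real client would do this over a TCP connection using the Redis protocol.
type SendCommand = (target: Endpoint, args: string[]) => Promise<unknown>;

// Ask each Sentinel in turn which node is currently the master for `masterName`.
async function findMaster(
  sentinels: Endpoint[],
  masterName: string,
  send: SendCommand,
): Promise<Endpoint> {
  for (const sentinel of sentinels) {
    try {
      const reply = await send(sentinel, [
        'SENTINEL',
        'get-master-addr-by-name',
        masterName,
      ]);
      // A Sentinel that knows the master replies with a two-element array: [ip, port].
      if (Array.isArray(reply) && reply.length === 2) {
        return { host: String(reply[0]), port: Number(reply[1]) };
      }
      // A nil reply means this Sentinel does not know the master name: try the next one.
    } catch {
      // This Sentinel is unreachable: try the next one.
    }
  }
  throw new Error(`no sentinel returned a master named "${masterName}"`);
}
```

ioredis performs this kind of lookup internally when it is given `sentinels` and `name`, so application code normally does not need to do it by hand.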

import * as IORedis from 'ioredis';
import Redis from 'ioredis';
import { redisHost, redisPassword, redisPort } from './config';

export function getRedisConfig(): IORedis.RedisOptions {
  // I'm not sure how to set this up properly.
  // ioredis/cluster automatically resolves all pods by hostname, but Sentinel mode does not,
  // so I have to list all pods explicitly
  // (or resolve them all by hostname).
  return {
    sentinels: process.env.REDIS_CLUSTER.split(',').map(d => {
      const [host, port = 26379] = d.split(':');

      return { host, port: Number(port) };
    }),
    name: process.env.REDIS_MASTER_NAME || 'mymaster',
    ...(redisPassword ? { password: redisPassword } : {}),
  };
}

export async function initializeRedis() {
  if (process.env.REDIS_CLUSTER) {
    const cluster = new Redis(getRedisConfig());

    return cluster;
  }

  // For dev environment
  const client = new Redis(redisPort, redisHost);

  if (redisPassword) {
    await client.auth(redisPassword);
  }

  return client;
}

In the environment:

env:
  - name: REDIS_CLUSTER
    value: redis-redis-ha-server-1.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-0.redis-redis-ha.yt.svc.cluster.local:26379,redis-redis-ha-server-2.redis-redis-ha.yt.svc.cluster.local:26379
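
A transposed digit in one of the ports is easy to miss in a long comma-separated value like this, and the client would then simply fail to reach that Sentinel. Below is a minimal sketch of a stricter parser for the variable, assuming the same `host:port` format with 26379 as the default port. `parseSentinelList`, `unexpectedPorts` and `DEFAULT_SENTINEL_PORT` are hypothetical helpers, not part of ioredis.

```typescript
interface SentinelAddress {
  host: string;
  port: number;
}

// Default Sentinel port, as reported in the chart's install notes.
const DEFAULT_SENTINEL_PORT = 26379;

// Parse "host1:port1,host2,..." into a list of addresses.
// Entries without a port get the default; malformed entries throw.
function parseSentinelList(raw: string): SentinelAddress[] {
  return raw
    .split(',')
    .map(entry => entry.trim())
    .filter(entry => entry.length > 0)
    .map(entry => {
      const [host, portText] = entry.split(':');
      const port =
        portText === undefined ? DEFAULT_SENTINEL_PORT : Number(portText);
      const isValidPort =
        Math.floor(port) === port && port >= 1 && port <= 65535;
      if (!host || !isValidPort) {
        throw new Error(`invalid sentinel address: "${entry}"`);
      }
      return { host, port };
    });
}

// Return the entries whose port differs from the expected Sentinel port,
// so that a likely typo can be logged at startup.
function unexpectedPorts(
  list: SentinelAddress[],
  expected: number = DEFAULT_SENTINEL_PORT,
): SentinelAddress[] {
  return list.filter(address => address.port !== expected);
}
```

The result of `parseSentinelList(process.env.REDIS_CLUSTER)` has the same shape as the `sentinels` array built in `getRedisConfig`, so it could stand in for the inline `split`/`map` there.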

You will probably also want to protect it with a password.