如何获得多节点 Keycloak 集群 运行 docker 容器(无 k8/swarm/etc)?

How do I get a multi-node Keycloak cluster running with docker containers (no k8/swarm/etc)?

我在 AWS 中有三个 EC2 实例:

全部在同一个 VPC 中。实例B和C在不同子网(不同可用区),但可以通过80和7600端口相互通信。

docker 实例使用以下命令启动时没有问题:

  docker run \
  --name test-node-1 \
  -e DB_PORT=3306 \
  -e PROXY_ADDRESS_FORWARDING=true \
  -e DB_VENDOR=mysql \
  -e DB_DATABASE=keycloak \
  -e DB_ADDR=foo.us-east-1.rds.amazonaws.com \
  -e KEYCLOAK_STATISTICS=all \
  -e DB_USER=keycloak \
  -e KEYCLOAK_USER=kcuser \
  -e DB_PASSWORD=... \
  -e KEYCLOAK_PASSWORD=... \
  -p 80:8080 \
  -p 7600:7600 \
  jboss/keycloak:16.1.0

两个容器都可以正常启动,但它们没有相互通信。

添加以下三个环境变量:

  -e JGROUPS_DISCOVERY_EXTERNAL_IP=1.2.3.5 \
  -e JGROUPS_DISCOVERY_PROTOCOL=TCPPING \
  -e JGROUPS_DISCOVERY_PROPERTIES='1.2.3.5[7600],1.2.3.6[7600]' \

导致 Keycloak 在启动时崩溃:

=========================================================================

  Using MySQL database

=========================================================================

17:01:35,028 INFO  [org.jboss.modules] (CLI command executor) JBoss Modules version 2.0.0.Final
17:01:35,124 INFO  [org.jboss.msc] (CLI command executor) JBoss MSC version 1.4.13.Final
17:01:35,134 INFO  [org.jboss.threads] (CLI command executor) JBoss Threads version 2.4.0.Final
17:01:35,267 INFO  [org.jboss.as] (MSC service thread 1-2) WFLYSRV0049: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) starting
...
17:01:43,320 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
17:01:43,322 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) started in 3261ms - Started 49 of 79 services (31 services are lazy, passive or on-demand)
The batch executed successfully
17:01:43,560 INFO  [org.jboss.as] (MSC service thread 1-1) WFLYSRV0050: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) stopped in 21ms
Setting JGroups discovery to TCPPING with properties {1.2.3.5[7600],1.2.3.6[7600]}

最后一行日志挂起几秒钟,然后进程崩溃。请注意,崩溃的是 FIRST 实例(我从来没有启动过第二个实例),所以我认为这不是 communication/firewall/etc 的问题,而是端口 80 和 7600已开放。

我正在使用 jboss/Keycloak docker image v16.1 from Docker Hub

容器将需要一个 TCPPING.cli 脚本,或者对独立的 ha.xml 进行适当的修改。以下 TCPPING.cli 文件对我有用(使用 -v $(pwd)/TCPPING.cli:/opt/jboss/tools/cli/jgroups/discovery/TCPPING.cli 安装到 docker 容器中):

embed-server --server-config=standalone-ha.xml --std-out=echo
batch

/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})

/subsystem=jgroups/stack=udp:remove()

/subsystem=jgroups/stack=tcp/protocol=MPING:remove()
/subsystem=jgroups/stack=tcp/protocol=$keycloak_jgroups_discovery_protocol:add(add-index=0, properties={"initial_hosts"=>$keycloak_jgroups_discovery_protocol_properties})

/subsystem=jgroups/channel=ee:write-attribute(name=stack, value="tcp")

/subsystem=jgroups/stack=tcp/transport=TCP/property=external_addr/:add(value=${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1})

run-batch
stop-embedded-server

请注意,这与 https://www.keycloak.org/2019/05/keycloak-cluster-setup 中的建议不同 - 特别是行

/subsystem=jgroups/stack=tcp/protocol=$keycloak_jgroups_discovery_protocol:add(add-index=0, properties={"initial_hosts"=>$keycloak_jgroups_discovery_protocol_properties})

我还将 JGROUPS_DISCOVERY_PROPERTIES env var 更改为仅作为第一个服务器(例如 -e JGROUPS_DISCOVERY_PROPERTIES=1.2.3.5[7600])- 集群中的每个服务器只需要与主服务器核实才能加入。