How do I get a multi-node Keycloak cluster running with docker containers (no k8/swarm/etc)?
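I have three EC2 instances in AWS:
- Instance A - docker with an nginx container - private IP address 1.2.3.4
- Instances B and C - docker with keycloak containers - private IP addresses 1.2.3.5 and 1.2.3.6
- An RDS instance running MySQL 8 - host foo.us-east-1.rds.amazonaws.com
All of them are in the same VPC. Instances B and C are in different subnets (different availability zones), but they can reach each other on ports 80 and 7600.
The docker containers start without problems using the following command: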
docker run \
--name test-node-1 \
-e DB_PORT=3306 \
-e PROXY_ADDRESS_FORWARDING=true \
-e DB_VENDOR=mysql \
-e DB_DATABASE=keycloak \
-e DB_ADDR=foo.us-east-1.rds.amazonaws.com \
-e KEYCLOAK_STATISTICS=all \
-e DB_USER=keycloak \
-e KEYCLOAK_USER=kcuser \
-e DB_PASSWORD=... \
-e KEYCLOAK_PASSWORD=... \
-p 80:8080 \
-p 7600:7600 \
jboss/keycloak:16.1.0
Both containers start fine, but they do not communicate with each other.
Adding the following three environment variables:
-e JGROUPS_DISCOVERY_EXTERNAL_IP=1.2.3.5 \
-e JGROUPS_DISCOVERY_PROTOCOL=TCPPING \
-e JGROUPS_DISCOVERY_PROPERTIES='1.2.3.5[7600],1.2.3.6[7600]' \
causes Keycloak to crash on startup:
=========================================================================
Using MySQL database
=========================================================================
17:01:35,028 INFO [org.jboss.modules] (CLI command executor) JBoss Modules version 2.0.0.Final
17:01:35,124 INFO [org.jboss.msc] (CLI command executor) JBoss MSC version 1.4.13.Final
17:01:35,134 INFO [org.jboss.threads] (CLI command executor) JBoss Threads version 2.4.0.Final
17:01:35,267 INFO [org.jboss.as] (MSC service thread 1-2) WFLYSRV0049: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) starting
...
17:01:43,320 INFO [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
17:01:43,322 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) started in 3261ms - Started 49 of 79 services (31 services are lazy, passive or on-demand)
The batch executed successfully
17:01:43,560 INFO [org.jboss.as] (MSC service thread 1-1) WFLYSRV0050: Keycloak 16.1.0 (WildFly Core 18.0.0.Final) stopped in 21ms
Setting JGroups discovery to TCPPING with properties {1.2.3.5[7600],1.2.3.6[7600]}
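The last log line hangs for a few seconds, and then the process crashes. Note that it is the FIRST instance that crashes (I never got as far as starting the second one), so I don't think this is a communication/firewall/etc. problem; in any case, ports 80 and 7600 are open.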
The container needs a TCPPING.cli script, or suitable modifications to standalone-ha.xml. The following TCPPING.cli file worked for me (mounted into the docker container with -v $(pwd)/TCPPING.cli:/opt/jboss/tools/cli/jgroups/discovery/TCPPING.cli):
embed-server --server-config=standalone-ha.xml --std-out=echo
batch
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=authenticationSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=infinispan/cache-container=keycloak/distributed-cache=loginFailures:write-attribute(name=owners, value=${env.CACHE_OWNERS:2})
/subsystem=jgroups/stack=udp:remove()
/subsystem=jgroups/stack=tcp/protocol=MPING:remove()
/subsystem=jgroups/stack=tcp/protocol=$keycloak_jgroups_discovery_protocol:add(add-index=0, properties={"initial_hosts"=>$keycloak_jgroups_discovery_protocol_properties})
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value="tcp")
/subsystem=jgroups/stack=tcp/transport=TCP/property=external_addr/:add(value=${env.JGROUPS_DISCOVERY_EXTERNAL_IP:127.0.0.1})
run-batch
stop-embedded-server
Note that this differs from what is suggested at https://www.keycloak.org/2019/05/keycloak-cluster-setup - specifically in the line
/subsystem=jgroups/stack=tcp/protocol=$keycloak_jgroups_discovery_protocol:add(add-index=0, properties={"initial_hosts"=>$keycloak_jgroups_discovery_protocol_properties})
I also changed the JGROUPS_DISCOVERY_PROPERTIES env var to list only the first server (e.g. -e JGROUPS_DISCOVERY_PROPERTIES=1.2.3.5[7600]) - each server in the cluster only needs to contact the first server in order to join.
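For reference, here is a rough sketch of how the pieces above might fit together into one launch command per node. It only prints the command rather than running it. The helper name keycloak_run_cmd and the container name test-node-2 are made up for illustration; everything else (image tag, env vars, mount path, IPs) is taken from the question and the answer above. It assumes every Keycloak node mounts the same TCPPING.cli, and the elided passwords are left elided.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not execute) the docker run command for one node.
# keycloak_run_cmd is a hypothetical helper, not part of the jboss/keycloak image.
#   $1 = container name
#   $2 = private IP of the host this container runs on
#   $3 = private IP of the first node, which the others contact to join
keycloak_run_cmd() {
  local name="$1" self_ip="$2" seed_ip="$3"
  cat <<EOF
docker run \\
  --name ${name} \\
  -v \$(pwd)/TCPPING.cli:/opt/jboss/tools/cli/jgroups/discovery/TCPPING.cli \\
  -e DB_VENDOR=mysql \\
  -e DB_ADDR=foo.us-east-1.rds.amazonaws.com \\
  -e DB_PORT=3306 \\
  -e DB_DATABASE=keycloak \\
  -e DB_USER=keycloak \\
  -e DB_PASSWORD=... \\
  -e KEYCLOAK_USER=kcuser \\
  -e KEYCLOAK_PASSWORD=... \\
  -e KEYCLOAK_STATISTICS=all \\
  -e PROXY_ADDRESS_FORWARDING=true \\
  -e JGROUPS_DISCOVERY_EXTERNAL_IP=${self_ip} \\
  -e JGROUPS_DISCOVERY_PROTOCOL=TCPPING \\
  -e JGROUPS_DISCOVERY_PROPERTIES='${seed_ip}[7600]' \\
  -p 80:8080 \\
  -p 7600:7600 \\
  jboss/keycloak:16.1.0
EOF
}

# Instance B (the first node) points at itself; instance C points at B.
keycloak_run_cmd test-node-1 1.2.3.5 1.2.3.5
keycloak_run_cmd test-node-2 1.2.3.6 1.2.3.5
```

The only values that differ between nodes are the container name and JGROUPS_DISCOVERY_EXTERNAL_IP; JGROUPS_DISCOVERY_PROPERTIES stays pointed at the first server on both. Once both containers are up, I would expect docker logs to show a cluster view listing two members, though the exact wording of that message depends on the version.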