Hazelcast members in 2 different docker hosts

The scenario consists of 2 virtual machines, each running Docker with Hazelcast members as containers.

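Following this guide, https://hazelcast.com/blog/configuring-hazelcast-in-non-orchestrated-docker-environments/, I can get scenario 3 (public IP address, port mapping, and TCP discovery) working with one member per node. However, if I add a second member to one of the nodes, it either replaces the other member in the cluster or logs connection problems. So I can't get the cluster to work with multiple members per node.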

The configuration on both nodes is:

hazelcast:
  network:
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.132.0.2:5701
          - 10.128.0.3:5701
          - 10.128.0.3:5702

The container on node 10.132.0.2 is run with:

docker run -v `pwd`:/mnt --rm --name member1   -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml"   -p 5701:5701 hazelcast/hazelcast:4.0.1

########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.132.0.2 -Dhazelcast.config=/mnt/hazelcast.yml com.hazelcast.core.server.HazelcastMemberStarter
Sep 29, 2020 6:35:23 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:35:23 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.2]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:35:24 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.132.0.2]:5701
Sep 29, 2020 6:35:24 AM com.hazelcast.system
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.132.0.2]:5701
Sep 29, 2020 6:35:24 AM com.hazelcast.system
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:35:24 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:35:25 AM com.hazelcast.instance.impl.Node
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:35:25 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.132.0.2]:5701 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:35:26 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:35:26 AM com.hazelcast.core.LifecycleService
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5701 is STARTING
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:26 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5702 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.132.0.2]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.128.0.3]:5702 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is added to the blacklist.
Sep 29, 2020 6:35:36 AM com.hazelcast.internal.cluster.impl.TcpIpJoiner
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5703 is added to the blacklist.
Sep 29, 2020 6:35:37 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.132.0.2]:5701 [dev] [4.0.1] 

Members {size:1, ver:1} [
        Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed this
]

Sep 29, 2020 6:35:37 AM com.hazelcast.core.LifecycleService
INFO: [10.132.0.2]:5701 [dev] [4.0.1] [10.132.0.2]:5701 is STARTED

The container on node 10.128.0.3 is run with:

docker run -v `pwd`:/mnt --rm --name member2   -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.128.0.3:5701 -Dhazelcast.config=/mnt/hazelcast.yml"   -p 5701:5701 hazelcast/hazelcast:4.0.1

########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5701 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
Sep 29, 2020 6:36:54 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:36:54 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.2]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:36:55 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.128.0.3]:5701
Sep 29, 2020 6:36:55 AM com.hazelcast.system
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.128.0.3]:5701
Sep 29, 2020 6:36:55 AM com.hazelcast.system
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:36:56 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:36:56 AM com.hazelcast.instance.impl.Node
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:36:56 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.128.0.3]:5701 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:36:58 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:36:58 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is STARTING
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.128.0.3:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Connecting to /10.132.0.2:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:36:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnection
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Initialized new cluster connection between /172.17.0.2:56429 and /10.132.0.2:5701
Sep 29, 2020 6:37:05 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.128.0.3]:5701 [dev] [4.0.1] 

Members {size:2, ver:2} [
        Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed
        Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e this
]

Sep 29, 2020 6:37:06 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5701 [dev] [4.0.1] [10.128.0.3]:5701 is STARTED
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.128.0.3:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:37:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5701 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]

So far everything works, but when I start member 3:

docker run -v `pwd`:/mnt --rm --name member3   -e "JAVA_OPTS=-Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml"   -p 5702:5701 hazelcast/hazelcast:4.0.1

########################################
# JAVA_OPTS=-Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml
# CLASSPATH=/opt/hazelcast/*:/opt/hazelcast/lib/*
# starting now....
########################################
+ exec java -server -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/opt/hazelcast/logging.properties -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED -Dhazelcast.local.publicAddress=10.128.0.3:5702 -Dhazelcast.config=/mnt/hazelcast.yml com.hazelcast.core.server.HazelcastMemberStarter
Sep 29, 2020 6:38:26 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Loading configuration '/mnt/hazelcast.yml' from System property 'hazelcast.config'
Sep 29, 2020 6:38:26 AM com.hazelcast.internal.config.AbstractConfigLocator
INFO: Using configuration file at /mnt/hazelcast.yml
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.128.0.3, 10.132.0.2]
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Prefer IPv4 stack is true, prefer IPv6 addresses is false
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
WARNING: [LOCAL] [dev] [4.0.1] Could not find a matching address to start with! Picking one of non-loopback addresses.
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Picked [172.17.0.3]:5701, using socket ServerSocket[addr=/0.0.0.0,localport=5701], bind any local is true
Sep 29, 2020 6:38:26 AM com.hazelcast.instance.AddressPicker
INFO: [LOCAL] [dev] [4.0.1] Using public address: [10.128.0.3]:5702
Sep 29, 2020 6:38:26 AM com.hazelcast.system
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Hazelcast 4.0.1 (20200409 - e086b9c) starting at [10.128.0.3]:5702
Sep 29, 2020 6:38:26 AM com.hazelcast.system
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
Sep 29, 2020 6:38:27 AM com.hazelcast.spi.impl.operationservice.impl.BackpressureRegulator
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Backpressure is disabled
Sep 29, 2020 6:38:27 AM com.hazelcast.instance.impl.Node
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Creating TcpIpJoiner
Sep 29, 2020 6:38:27 AM com.hazelcast.cp.CPSubsystem
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Sep 29, 2020 6:38:27 AM com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
Sep 29, 2020 6:38:27 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Sep 29, 2020 6:38:27 AM com.hazelcast.core.LifecycleService
INFO: [10.128.0.3]:5702 [dev] [4.0.1] [10.128.0.3]:5702 is STARTING
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5703, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5702, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.132.0.2:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnection
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Initialized new cluster connection between /172.17.0.3:52951 and /10.132.0.2:5701
Sep 29, 2020 6:38:35 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.128.0.3]:5702 [dev] [4.0.1] 
Members {size:3, ver:3} [
        Member [10.132.0.2]:5701 - 69284e57-ce61-405c-87d3-1e9ea46b2bed
        Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
        Member [10.128.0.3]:5702 - 0dd31ea2-db2e-4e43-941a-98592e222817 this
]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.132.0.2:5703. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.132.0.2:5702. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:38:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:38:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:08 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 5
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.cluster.impl.MembershipManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e is suspected to be dead for reason: No connection
Sep 29, 2020 6:39:18 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:27 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:28 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 6
Sep 29, 2020 6:39:32 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:37 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:38 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:42 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:47 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 7
Sep 29, 2020 6:39:48 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Connecting to /10.128.0.3:5701, timeout: 10000, bind-any: true
Sep 29, 2020 6:39:52 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:57 AM com.hazelcast.internal.cluster.impl.ClusterHeartbeatManager
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] This node does not have a connection to Member [10.128.0.3]:5701 - aa22f242-cc82-44ff-9dc1-06678d14420e
Sep 29, 2020 6:39:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnector
INFO: [10.128.0.3]:5702 [dev] [4.0.1] Could not connect to: /10.128.0.3:5701. Reason: SocketTimeoutException[null]
Sep 29, 2020 6:39:58 AM com.hazelcast.internal.nio.tcp.TcpIpConnectionErrorHandler
WARNING: [10.128.0.3]:5702 [dev] [4.0.1] Removing connection to endpoint [10.128.0.3]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 8

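There seems to be a communication problem between members on the same node. In another test, member3 replaced member2 in the cluster and marked the connection attempts from node2 as suspect.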
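To make the "which connections fail" question easier to answer than by reading the raw log, the failure lines can be tallied per source member and target address. The following is only a minimal sketch in Python: it assumes the log format printed above (a `[ip]:port` member prefix followed by `Could not connect to: /ip:port`), and `failed_targets` is a hypothetical helper name, not part of Hazelcast.

```python
import re
from collections import Counter

# Matches the TcpIpConnector failure lines as they appear in the logs above.
# Assumption: the member prefix is "[ip]:port" and the target is "/ip:port".
FAILED = re.compile(
    r"\[(?P<src>[\d.]+)\]:(?P<sport>\d+) .*"
    r"Could not connect to: /(?P<dst>[\d.]+):(?P<dport>\d+)"
)


def failed_targets(log_text):
    """Count failed connection attempts per (source member, target address)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            source = f"{match['src']}:{match['sport']}"
            target = f"{match['dst']}:{match['dport']}"
            counts[(source, target)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python failed_targets.py < member3.log
    for (source, target), n in failed_targets(sys.stdin.read()).most_common():
        print(f"{source} -> {target}: {n} failed attempts")
```

Run against the member3 log, a tally like this would show at a glance whether the repeated timeouts are concentrated on the member that shares the same host.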

The virtual machines are brand new on GCP and on the same network. I used this image:

Google, Container-Optimized OS, 85-13310.1041.9 stable, Kernel: ChromiumOS-5.4.49 Kubernetes: 1.18.9 Docker: 19.03.9 Family: cos-stable, supports Shielded VM features, supports Confidential VM features on N2D

Your scenario should work, so I guess this is some environment issue. Could you try the following two things?

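Set the port explicitly on member1 too

Use :5701 for member1 as well, both in the member-list (hazelcast.yml) and in the hazelcast.local.publicAddress property (command line).

This step probably won't change anything about your problem, but it should at least avoid the unrelated warnings in the log.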

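Check whether member2 is reachable from member3

After starting the nodes, open an interactive shell in member3 and try sending a protocol challenge (HZC) to member2's socket address. If you see the protocol response (HZC) in the terminal, communication between the containers works. If you don't see a response, check your Docker and firewall configuration to find out what is causing the problem.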

docker exec -it member3 /bin/bash

# Following command runs in container.
# The first HZC line is the one you type (followed by Enter).
# The second is the reply from member2.

bash-5.0# nc 10.128.0.3 5701
HZC
HZC

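If you see the correct protocol response in the terminal, we will need to investigate the Hazelcast behavior further. I was not able to reproduce the problem in my environment.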
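The manual `nc` check can also be scripted so it is easy to repeat for every address in the member-list. The sketch below assumes only what the exchange above shows: a member answers a 3-byte `HZC` challenge with `HZC`. The function name `hzc_probe` is hypothetical, and the script needs Python wherever it runs (the Hazelcast image is not assumed to ship it).

```python
import socket


def hzc_probe(host, port, timeout=5.0):
    """Send the HZC challenge and return True if the peer answers with HZC."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"HZC")
            reply = b""
            while len(reply) < 3:
                chunk = sock.recv(3 - len(reply))
                if not chunk:  # peer closed before sending a full reply
                    break
                reply += chunk
            return reply == b"HZC"
    except OSError:
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        return False


if __name__ == "__main__":
    # Addresses taken from the member-list in the question.
    for host, port in [("10.132.0.2", 5701), ("10.128.0.3", 5701), ("10.128.0.3", 5702)]:
        status = "reachable" if hzc_probe(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```

Running it from each Docker host, and from inside a container if Python is available there, would help separate a firewall problem between the VMs from a problem between containers on the same host.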

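The problem was caused by member3 believing it was on port 5701 instead of 5702. The solution is to specify in the configuration the port on which the member will listen on the docker host.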

The configuration for member3 is:

hazelcast:
  network:
    port:
      port: 5702
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - 10.132.0.2:5701
          - 10.128.0.3:5701
          - 10.128.0.3:5702

The cluster works this way, and every member can communicate with the others.
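One way to keep the three places a port appears (the configured `network.port`, the public address, and the member-list entry) from drifting apart is to derive them from a single value per member. The Python sketch below only illustrates that idea. It rests on assumptions not stated above: each member gets its own config file (the file names are hypothetical), and once the member binds to 5702 inside the container, the published container port would presumably have to be 5702 as well rather than 5701.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str         # container name, e.g. "member3"
    host_ip: str      # IP of the Docker host (VM)
    port: int         # single source of truth for this member's port
    config_file: str  # hypothetical per-member config file under /mnt


def docker_run(member, image="hazelcast/hazelcast:4.0.1"):
    """Build a docker run command whose ports all agree with member.port."""
    java_opts = (
        f"-Dhazelcast.local.publicAddress={member.host_ip}:{member.port} "
        f"-Dhazelcast.config=/mnt/{member.config_file}"
    )
    # Assumption: the config file sets network.port to member.port, so the
    # container port in -p matches it instead of the default 5701.
    return (
        f'docker run -v "$(pwd)":/mnt --rm --name {member.name} '
        f'-e "JAVA_OPTS={java_opts}" -p {member.port}:{member.port} {image}'
    )


def member_list(members):
    """Entries for tcp-ip.member-list, always with an explicit port."""
    return [f"{m.host_ip}:{m.port}" for m in members]


if __name__ == "__main__":
    members = [
        Member("member1", "10.132.0.2", 5701, "hazelcast-member1.yml"),
        Member("member2", "10.128.0.3", 5701, "hazelcast-member2.yml"),
        Member("member3", "10.128.0.3", 5702, "hazelcast-member3.yml"),
    ]
    print("\n".join(member_list(members)))
    print(docker_run(members[2]))
```

If the port mapping in your setup differs from this assumption, only `docker_run` needs adjusting; the member-list entries still come from the same `port` field.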

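If you want to take this further, Hazelcast supports lifecycle calls for containers and orchestration managers (i.e. K8s and OpenShift); for this you can use the Hazelcast network discovery module for that specific platform. This frees you from hard-coding addresses and ports and allows those assignments to be made dynamically at runtime.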