Hazelcast cache distribution issue on two WAS nodes

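In my project I am using Hazelcast 3.7.8, and I have a data distribution problem between applications and nodes.

I have 2 nodes, and on each node I have 4 Spring applications deployed on a WAS with a single JVM process.

These applications share a map between them. Each application has its own hazelcast-configuration.xml file, but the files are all identical except for the network port (5701, 5702, 5703, 5704).

Often, but not always, after deploying one of the applications on every node at the same time, the distributed data is no longer the same everywhere. The application that was deployed (on each node) has one data set, and the other applications have another.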

        <cache:annotation-driven cache-manager="cacheManager" />
        <bean id="cacheManager" class="com.hazelcast.spring.cache.HazelcastCacheManager">
            <constructor-arg ref="hazelcastInstance" />
        </bean>  
        <hz:hazelcast id="hazelcastInstance">
            <hz:config>
                <hz:instance-name>myCacheInstance</hz:instance-name>
                <hz:group name="qualification" password="qualification"/>
                <hz:properties>
                    <hz:property name="hazelcast.health.monitoring.level">OFF</hz:property>
                    <hz:property name="hazelcast.health.monitoring.delay.seconds">3600</hz:property>
                </hz:properties>
                <hz:network port="5701" port-auto-increment="true">
                    <hz:join>
                        <hz:multicast enabled="false" />
                        <hz:tcp-ip enabled="true">
                            <hz:member>NODE1</hz:member>
                            <hz:member>NODE2</hz:member>
                        </hz:tcp-ip>
                    </hz:join>
                </hz:network>
                <hz:partition-group enabled="false"/>
                <hz:map name="my-map" 
                    backup-count="1"
                    async-backup-count="1"
                    time-to-live-seconds="7200"
                    max-idle-seconds="0"
                    eviction-policy="LRU"
                    max-size="15"
                    max-size-policy="USED_HEAP_PERCENTAGE"
                    eviction-percentage="25"
                    min-eviction-check-millis="100"
                    merge-policy="com.hazelcast.map.merge.PassThroughMergePolicy">
                </hz:map>
                <hz:services enable-defaults="true"/>   
            </hz:config>
        </hz:hazelcast>  

[LOCAL] [qualification] [3.7.8] You configured your member address as host name. Please be aware of that your dns can be spoofed. Make sure that your dns configurations are correct.
[LOCAL] [qualification] [3.7.8] Resolving domain name 'NODE1' to address(es): [192.237.154.88]
[LOCAL] [qualification] [3.7.8] You configured your member address as host name. Please be aware of that your dns can be spoofed. Make sure that your dns configurations are correct.
[LOCAL] [qualification] [3.7.8] Resolving domain name 'NODE2' to address(es): [192.237.155.244]
[LOCAL] [qualification] [3.7.8] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [NODE2/192.237.155.244, NODE1/192.237.154.88]
[LOCAL] [qualification] [3.7.8] Prefer IPv4 stack is true.
[LOCAL] [qualification] [3.7.8] Picked [NODE2]:5705, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5705], bind any local is true
[NODE2]:5705 [qualification] [3.7.8] Hazelcast 3.7.8 (20170525 - 4e820fa) starting at [NODE2]:5705
[NODE2]:5705 [qualification] [3.7.8] Copyright (c) 2008-2016, Hazelcast, Inc. All Rights Reserved.
[NODE2]:5705 [qualification] [3.7.8] Configured Hazelcast Serialization version : 1
[NODE2]:5705 [qualification] [3.7.8] Backpressure is disabled
[NODE2]:5705 [qualification] [3.7.8] Creating TcpIpJoiner
[NODE2]:5705 [qualification] [3.7.8] Starting 8 partition threads
[NODE2]:5705 [qualification] [3.7.8] Starting 5 generic threads (1 dedicated for priority tasks)
[NODE2]:5705 [qualification] [3.7.8] [NODE2]:5705 is STARTING
[NODE2]:5705 [qualification] [3.7.8] TcpIpConnectionManager configured with Non Blocking IO-threading model: 3 input threads and 3 output threads
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE1/192.237.154.88:5703, timeout: 0, bind-any: true
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE1/192.237.154.88:5704, timeout: 0, bind-any: true
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE2/192.237.155.244:5703, timeout: 0, bind-any: true
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE1/192.237.154.88:5705, timeout: 0, bind-any: true
[192.237.155.244]:5703 [dev] [3.7.8] Accepting socket connection from /192.237.155.244:37105
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE2/192.237.155.244:5704, timeout: 0, bind-any: true
[192.237.155.244]:5703 [dev] [3.7.8] Established socket connection between /192.237.155.244:5703 and /192.237.155.244:37105
[NODE2]:5704 [qualification] [3.7.8] Accepting socket connection from /192.237.155.244:50221
[NODE2]:5704 [qualification] [3.7.8] Established socket connection between /192.237.155.244:5704 and /192.237.155.244:50221
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:37105 and NODE2/192.237.155.244:5703
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:49809 and NODE1/192.237.154.88:5704
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:37358 and NODE1/192.237.154.88:5703
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:50221 and NODE2/192.237.155.244:5704
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:45740 and NODE1/192.237.154.88:5705
[192.237.155.244]:5703 [dev] [3.7.8] Wrong bind request from [NODE2]:5705! This node is not requested endpoint: [NODE2]:5703
[192.237.155.244]:5703 [dev] [3.7.8] Connection[id=2, /192.237.155.244:5703->/192.237.155.244:37105, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [NODE2]:5705! This node is not requested endpoint: [NODE2]:5703
[NODE2]:5705 [qualification] [3.7.8] Connection[id=2, /192.237.155.244:49809->NODE1/192.237.154.88:5704, endpoint=[NODE1]:5704, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
[NODE2]:5705 [qualification] [3.7.8] Connection[id=1, /192.237.155.244:37105->NODE2/192.237.155.244:5703, endpoint=[NODE2]:5703, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE1/192.237.154.88:5704, timeout: 0, bind-any: true
[NODE2]:5705 [qualification] [3.7.8] Connecting to NODE2/192.237.155.244:5703, timeout: 0, bind-any: true
[192.237.155.244]:5703 [dev] [3.7.8] Accepting socket connection from /192.237.155.244:59036
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:59036 and NODE2/192.237.155.244:5703
[NODE2]:5705 [qualification] [3.7.8] Established socket connection between /192.237.155.244:33775 and NODE1/192.237.154.88:5704
[192.237.155.244]:5703 [dev] [3.7.8] Established socket connection between /192.237.155.244:5703 and /192.237.155.244:59036
[192.237.155.244]:5703 [dev] [3.7.8] Wrong bind request from [NODE2]:5705! This node is not requested endpoint: [NODE2]:5703
[192.237.155.244]:5703 [dev] [3.7.8] Connection[id=3, /192.237.155.244:5703->/192.237.155.244:59036, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [NODE2]:5705! This node is not requested endpoint: [NODE2]:5703
[NODE2]:5705 [qualification] [3.7.8] Connection[id=6, /192.237.155.244:59036->NODE2/192.237.155.244:5703, endpoint=[NODE2]:5703, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
[NODE2]:5705 [qualification] [3.7.8] Connection[id=7, /192.237.155.244:33775->NODE1/192.237.154.88:5704, endpoint=[NODE1]:5704, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
[NODE2]:5705 [qualification] [3.7.8] Ignoring master response [NODE1]:5703 from [NODE1]:5703 since this node has an active master [NODE2]:5704
[NODE2]:5705 [qualification] [3.7.8] Ignoring master response [NODE1]:5703 from [NODE1]:5703 since this node has an active master [NODE2]:5704

What is wrong?

Thanks in advance.

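There are four things to look at here.

Each Hazelcast instance selects an inbound port. The configuration shown specifies port="5701" port-auto-increment="true".

This means that when the instance starts, it tries to use port 5701. If that port is in use (for example, by another Hazelcast instance), the auto-increment flag means it tries the next port, 5702, then 5703, and so on until it finds a free one.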
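
As a rough illustration of that search (not Hazelcast's actual code), the selection can be modelled as a scan over consecutive ports. The class name `PortProbe`, the method `firstFreePort`, and the `portCount` limit of 100 are assumptions made for this sketch; a real instance probes by trying to bind a socket rather than consulting a set.

```java
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical model of port auto-increment; names are illustrative only.
final class PortProbe {

    // Returns the first port in [startPort, startPort + portCount)
    // that is not already taken, or empty if every candidate is in use.
    static OptionalInt firstFreePort(int startPort, int portCount, Set<Integer> portsInUse) {
        for (int port = startPort; port < startPort + portCount; port++) {
            if (!portsInUse.contains(port)) {
                return OptionalInt.of(port);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Four instances already hold 5701-5704, as on NODE2 in the log.
        Set<Integer> inUse = Set.of(5701, 5702, 5703, 5704);
        System.out.println(firstFreePort(5701, 100, inUse).getAsInt()); // prints 5705
    }
}
```

Read the other way round, an instance that ends up on 5705 implies that 5701 to 5704 were all unavailable on that host when it started.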

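(1) Given the above, you can, and probably should, use the same configuration for all Hazelcast instances. If the per-application files are set up correctly they should not cause the problem above, but if they contain some unintended difference, that could be the cause. Make them all the same and see what happens.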

You could also change


                            <hz:member>NODE1</hz:member>
                            <hz:member>NODE2</hz:member>

to

                            <hz:member>NODE1:5701</hz:member>
                            <hz:member>NODE1:5702</hz:member>
                            <hz:member>NODE1:5703</hz:member>
                            <hz:member>NODE1:5704</hz:member>
                            <hz:member>NODE2:5701</hz:member>
                            <hz:member>NODE2:5702</hz:member>
                            <hz:member>NODE2:5703</hz:member>
                            <hz:member>NODE2:5704</hz:member>
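
If the number of hosts or instances changes, the explicit list can be generated rather than typed by hand. The following is only a sketch: the class `MemberList` and its method `members` are hypothetical helpers, and it assumes every host runs the same number of instances on consecutive ports starting at 5701.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that enumerates host:port member addresses.
final class MemberList {

    // Builds "HOST:PORT" entries for each host, using instancesPerHost
    // consecutive ports beginning at firstPort.
    static List<String> members(List<String> hosts, int firstPort, int instancesPerHost) {
        List<String> result = new ArrayList<>();
        for (String host : hosts) {
            for (int i = 0; i < instancesPerHost; i++) {
                result.add(host + ":" + (firstPort + i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hosts with four instances each, as described in the question.
        for (String member : members(List.of("NODE1", "NODE2"), 5701, 4)) {
            System.out.println("<hz:member>" + member + "</hz:member>");
        }
    }
}
```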

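(2) The Creating TcpIpJoiner log entry comes from [NODE2]:5705, which indicates that ports 5701, 5702, 5703 and 5704 were already in use. That probably means four Hazelcast instances were already running on that node, so this is the fifth. If you expect only four instances and there are actually five, one of the previously shut-down instances may not have finished stopping.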

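(3) The configuration <hz:partition-group enabled="false"/> means a data backup can be placed on any other Hazelcast instance, which could be an instance in the same WAS process. If that WAS process fails, both the data and its backup may be lost. A HOST_AWARE setting would be safer, but you only have two hosts and have configured a primary copy, a sync backup and an async backup. That is three copies in total that would have to be spread over 2 hosts, with each copy on a host with a different IP address, which cannot be achieved.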
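
The arithmetic behind that last point can be sketched as follows. The class `ReplicaPlacement` and its methods are hypothetical and only count copies against hosts; they do not reproduce how Hazelcast actually assigns partition replicas.

```java
// Hypothetical check: can every copy of a partition sit on a different host?
final class ReplicaPlacement {

    // One primary copy plus the configured sync and async backups.
    static int totalCopies(int backupCount, int asyncBackupCount) {
        return 1 + backupCount + asyncBackupCount;
    }

    // Host-aware placement needs at least as many hosts as copies.
    static boolean fitsOnDistinctHosts(int backupCount, int asyncBackupCount, int hostCount) {
        return totalCopies(backupCount, asyncBackupCount) <= hostCount;
    }

    public static void main(String[] args) {
        // backup-count="1", async-backup-count="1", two hosts (NODE1, NODE2).
        System.out.println(totalCopies(1, 1));            // prints 3
        System.out.println(fitsOnDistinctHosts(1, 1, 2)); // prints false
        // Dropping the async backup would leave two copies for two hosts.
        System.out.println(fitsOnDistinctHosts(1, 0, 2)); // prints true
    }
}
```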

(4) The log line [qualification] [3.7.8] Starting 8 partition threads suggests it is a 4 CPU machine, which is not enough to run everything under full load.

++

Also, 3.7.8 is an old version. If you are going to have to make changes to bring stability anyway, you might as well upgrade.