Timeout configurations in Curator

Question

I am creating a Curator client as follows:

    RetryPolicy retryPolicy = new RetryNTimes(3, 1000);
    CuratorFramework client = CuratorFrameworkFactory.newClient(zkConnectString, 
            15000, // sessionTimeoutMs
            15000, // connectionTimeoutMs
            retryPolicy);

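When running my client program, I simulate a network partition by bringing down the NIC that Curator uses to communicate with Zookeeper. I have a few questions based on the behavior I am seeing:

I see a ConnectionStateManager - State change: SUSPENDED message after 10 seconds. Is the amount of time before Curator enters the SUSPENDED state configurable, based on a percentage of the other timeout values, or always 10 seconds?
I never receive any notification once the configured 15-second session timeout has elapsed since the last successful heartbeat. I do see a ZooKeeper - Session: 0x14adf3f01ef0001 closed message in the log, but it does not seem to bubble up as an event that I can capture or listen for. Am I missing something here?
I finally receive a ConnectionStateManager - State change: LOST message almost two minutes after the connection is lost. Why does it take so long?
If my goal is to use an InterProcessMutex to prevent split brain in an HA scenario, the safest approach seems to be for the lock holder to assume it has lost the lock as soon as it receives the SUSPENDED message, since it is entirely possible that Zookeeper has released the lock, unbeknownst to the holder, on the other side of the network partition. Is this a typical/sane approach?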

Answer 1

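Correct. Assume leadership is lost on both SUSPENDED and LOST. This is how the Apache Curator recipes work. You may wish to use the Apache Curator recipes rather than implementing your own algorithm. https://curator.apache.org/curator-recipes/index.html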
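
A minimal sketch of that rule, written without the Curator dependency so it runs on its own. The `State` enum is a local stand-in that mirrors the names of Curator's `ConnectionState` values, and `LockHolderGuard` is a hypothetical helper, not part of Curator. In a real client, the same logic would presumably sit inside a `ConnectionStateListener` registered through `client.getConnectionStateListenable().addListener(...)`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: tracks whether this process may still act as the lock holder. */
final class LockHolderGuard {

    /** Local stand-in that mirrors the names of Curator's ConnectionState values. */
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

    private final AtomicBoolean mayAct = new AtomicBoolean(false);

    /** Call once the lock has been acquired successfully. */
    void lockAcquired() {
        mayAct.set(true);
    }

    /** Call on every connection state change. */
    void stateChanged(State newState) {
        // Treat both SUSPENDED and LOST as "the lock may already be gone".
        if (newState == State.SUSPENDED || newState == State.LOST) {
            mayAct.set(false);
        }
        // Design choice of this sketch: RECONNECTED does not restore the flag.
        // The holder has to go through lockAcquired() again before acting.
    }

    boolean mayAct() {
        return mayAct.get();
    }

    public static void main(String[] args) {
        LockHolderGuard guard = new LockHolderGuard();
        guard.lockAcquired();
        guard.stateChanged(State.SUSPENDED);
        System.out.println("may act after SUSPENDED: " + guard.mayAct());
    }
}
```

Protected work would check `mayAct()` before each step and stop as soon as it returns false.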

Answer 2

It depends on which version of Curator you are using (note: I'm the main author of Curator)...

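In Curator 2.x, the LOST state means that a retry policy has been exhausted. It does not mean that the session has been lost. In ZooKeeper, the session is only determined to be lost once the connection to the ensemble has been repaired. So you get SUSPENDED when Curator sees the first "Disconnected" message, and then LOST when an operation fails because the retry policy has given up.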

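In Curator 3.x, the meaning of LOST was changed. In 3.x, when "Disconnected" is received, Curator starts an internal timer. When the timer passes the negotiated session timeout, Curator calls getTestable().injectSessionExpiration() and posts a LOST state change.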
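
The 3.x behavior described above can be modeled roughly as follows. This is a sketch of the idea only, not Curator's actual implementation: `SessionLossTimer` and its method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```java
/** Hypothetical model of the 3.x timer described above; not Curator's real code. */
final class SessionLossTimer {

    private final long sessionTimeoutMs;
    private long disconnectedAtMs = -1; // -1 means "currently connected"

    SessionLossTimer(long negotiatedSessionTimeoutMs) {
        this.sessionTimeoutMs = negotiatedSessionTimeoutMs;
    }

    /** Start the timer on the first "Disconnected" event; later ones do not reset it. */
    void onDisconnected(long nowMs) {
        if (disconnectedAtMs < 0) {
            disconnectedAtMs = nowMs;
        }
    }

    /** Cancel the timer when the connection comes back. */
    void onReconnected() {
        disconnectedAtMs = -1;
    }

    /** True once the time spent disconnected has reached the session timeout. */
    boolean shouldPostLost(long nowMs) {
        return disconnectedAtMs >= 0 && nowMs - disconnectedAtMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        SessionLossTimer timer = new SessionLossTimer(15_000);
        timer.onDisconnected(0);
        System.out.println("LOST at 10 s: " + timer.shouldPostLost(10_000));
        System.out.println("LOST at 15 s: " + timer.shouldPostLost(15_000));
    }
}
```

With the 15000 ms session timeout from the question, this model would report LOST about 15 seconds after the first "Disconnected", rather than after the retry policy gives up.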

Answer 3

For the first question: Zookeeper has a variable called MAX_SEND_PING_INTERVAL, which is 10 seconds, so in your situation it is always 10 seconds. The code is in the ClientCnxn class.

//1000(1 second) is to prevent race condition missing to send the second ping
//also make sure not to send too many pings when readTimeout is small 
int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend() - 
        ((clientCnxnSocket.getIdleSend() > 1000) ? 1000 : 0);
//send a ping request either time is due or no packet sent out within MAX_SEND_PING_INTERVAL
if (timeToNextPing <= 0 || clientCnxnSocket.getIdleSend() > MAX_SEND_PING_INTERVAL) {
    sendPing();
    clientCnxnSocket.updateLastSend();
} else {
    if (timeToNextPing < to) {
        to = timeToNextPing;
    }
}
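
To make the arithmetic in that snippet concrete, here is a small standalone sketch. `PingTiming` and its methods are hypothetical names that only reproduce the quoted expressions. The derivation of `readTimeout` as two thirds of the session timeout is an assumption about how ClientCnxn sets it and is not shown in the snippet above.

```java
/** Hypothetical helper that reproduces the quoted ping arithmetic. */
final class PingTiming {

    static final int MAX_SEND_PING_INTERVAL = 10_000; // 10 seconds, as stated in the answer

    /** Same expression as in the quoted ClientCnxn snippet. */
    static int timeToNextPing(int readTimeoutMs, int idleSendMs) {
        return readTimeoutMs / 2 - idleSendMs - (idleSendMs > 1000 ? 1000 : 0);
    }

    /** Same condition as in the quoted ClientCnxn snippet. */
    static boolean shouldPing(int readTimeoutMs, int idleSendMs) {
        return timeToNextPing(readTimeoutMs, idleSendMs) <= 0
                || idleSendMs > MAX_SEND_PING_INTERVAL;
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 15_000;                 // value from the question
        int readTimeoutMs = sessionTimeoutMs * 2 / 3;  // assumption: 2/3 of the session timeout
        for (int idle = 0; idle <= 5_000; idle += 1_000) {
            System.out.println("idleSend=" + idle
                    + " timeToNextPing=" + timeToNextPing(readTimeoutMs, idle)
                    + " shouldPing=" + shouldPing(readTimeoutMs, idle));
        }
    }
}
```

Under that assumption, a 15000 ms session timeout gives a 10000 ms `readTimeout`, and a ping becomes due once roughly 4 seconds have passed without sending anything.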
