Ignite Java Thin Client - Connection fails when one node is down

We have a 3-node Ignite cluster, and all of our services connect to it using the Java thin client.

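When one of the server nodes is down and the services try to connect, some of them connect successfully while others fail with an "Ignite cluster is unavailable" error. We debugged the source code and found that when the ReliableChannel object is constructed, it picks one node at random to connect to, and if that node is unavailable it throws a client connection exception.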

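Ideally, we would expect it to fall back to another node, since other nodes in the cluster are still available. We can see that this fallback logic is implemented in the service method of the ReliableChannel class.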

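Is there any specific reason why the fallback is implemented only in the service method and not during object construction? Is there any option to make the client connect to one of the other nodes?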

Also, can we control the order in which the nodes are tried?

ReliableChannel code snippet:

ReliableChannel(
        Function<ClientChannelConfiguration, Result<ClientChannel>> chFactory,
        ClientConfiguration clientCfg
    ) throws ClientException {
        if (chFactory == null)
            throw new NullPointerException("chFactory");

        if (clientCfg == null)
            throw new NullPointerException("clientCfg");

        this.chFactory = chFactory;
        this.clientCfg = clientCfg;

        List<InetSocketAddress> addrs = parseAddresses(clientCfg.getAddresses());

        primary = addrs.get(new Random().nextInt(addrs.size())); // we already verified there is at least one address

        ch = chFactory.apply(new ClientChannelConfiguration(clientCfg).setAddress(primary)).get();

        for (InetSocketAddress a : addrs)
            if (a != primary)
                this.backups.add(a);
    }


    public <T> T service(
        ClientOperation op,
        Consumer<BinaryOutputStream> payloadWriter,
        Function<BinaryInputStream, T> payloadReader
    ) throws ClientException {
        ClientConnectionException failure = null;

        T res = null;

        int totalSrvs = 1 + backups.size();

        svcLock.lock();
        try {
            for (int i = 0; i < totalSrvs; i++) {
                try {
                    if (failure != null)
                        changeServer();

                    if (ch == null)
                        ch = chFactory.apply(new ClientChannelConfiguration(clientCfg).setAddress(primary)).get();

                    long id = ch.send(op, payloadWriter);

                    res = ch.receive(op, id, payloadReader);

                    failure = null;

                    break;
                }
                catch (ClientConnectionException e) {
                    if (failure == null)
                        failure = e;
                    else
                        failure.addSuppressed(e);
                }
            }
        }
        finally {
            svcLock.unlock();
        }

        if (failure != null)
            throw failure;

        return res;
    }

This issue will be fixed in Apache Ignite 2.8: IGNITE-11599

It may already be fixed in GridGain, which supports such fixes.
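
Until an upgrade is possible, one workaround is to do the fallback on the caller's side: try the addresses one at a time, in an order you choose, and keep the first connection that opens. The sketch below illustrates only that idea. `FailoverConnector` and `connectFirstAvailable` are hypothetical names, not part of Apache Ignite, and the `connect` function is an assumption standing in for whatever call opens a thin client against a single address.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper, not part of Apache Ignite: client-side fallback at connect time. */
final class FailoverConnector {
    private FailoverConnector() {
    }

    /**
     * Tries each address in the order given and returns the first connection that opens.
     * If every address fails, rethrows the first failure with the later ones attached
     * as suppressed exceptions (the same bookkeeping the service method above uses).
     *
     * @param addrs   candidate addresses, in the order they should be tried
     * @param connect assumed factory that opens a connection to exactly one address
     */
    static <C> C connectFirstAvailable(List<String> addrs, Function<String, C> connect) {
        if (addrs == null || addrs.isEmpty())
            throw new IllegalArgumentException("addrs must contain at least one address");

        RuntimeException failure = null;

        for (String addr : addrs) {
            try {
                return connect.apply(addr);
            }
            catch (RuntimeException e) {
                // Deliberately broad for the sketch; real code would catch only
                // the connection-related exception type.
                if (failure == null)
                    failure = e;
                else
                    failure.addSuppressed(e);
            }
        }

        // Reached only when every address failed, so failure is non-null here.
        throw failure;
    }
}
```

With the thin client, the `connect` argument could be a lambda that calls `Ignition.startClient` with a `ClientConfiguration` holding just that one address, wrapping any checked exception in a runtime exception if the Ignite version in use declares one. Because the caller supplies the list, this also gives control over the order in which nodes are tried; shuffle the list first if random spreading across nodes is preferred.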