Ignite Java Thin Client - Connection fails when one node is down
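We have an Ignite cluster with 3 nodes, and all services connect to the cluster using the Java thin client.
When one of the server nodes is down and the services try to connect, some of them connect successfully while others fail with an "ignite cluster unavailable" error. Debugging the source code showed that when the ReliableChannel object is constructed, it picks one node at random and connects to it; if that node is unavailable, a client connection exception is thrown.
Ideally we would expect it to fall back to another node, since other nodes in the cluster are still available. We can see that this fallback logic is implemented in the service method of the ReliableChannel class.
Is there any specific reason why the fallback is implemented only in the service method and not during object construction? Is there any option to make the client connect to the other nodes?
Also, can we control the order in which the nodes are tried?
ReliableChannel code snippet: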
ReliableChannel(
    Function<ClientChannelConfiguration, Result<ClientChannel>> chFactory,
    ClientConfiguration clientCfg
) throws ClientException {
    if (chFactory == null)
        throw new NullPointerException("chFactory");

    if (clientCfg == null)
        throw new NullPointerException("clientCfg");

    this.chFactory = chFactory;
    this.clientCfg = clientCfg;

    List<InetSocketAddress> addrs = parseAddresses(clientCfg.getAddresses());

    primary = addrs.get(new Random().nextInt(addrs.size())); // we already verified there is at least one address

    ch = chFactory.apply(new ClientChannelConfiguration(clientCfg).setAddress(primary)).get();

    for (InetSocketAddress a : addrs)
        if (a != primary)
            this.backups.add(a);
}

public <T> T service(
    ClientOperation op,
    Consumer<BinaryOutputStream> payloadWriter,
    Function<BinaryInputStream, T> payloadReader
) throws ClientException {
    ClientConnectionException failure = null;

    T res = null;

    int totalSrvs = 1 + backups.size();

    svcLock.lock();
    try {
        for (int i = 0; i < totalSrvs; i++) {
            try {
                if (failure != null)
                    changeServer();

                if (ch == null)
                    ch = chFactory.apply(new ClientChannelConfiguration(clientCfg).setAddress(primary)).get();

                long id = ch.send(op, payloadWriter);

                res = ch.receive(op, id, payloadReader);

                failure = null;

                break;
            }
            catch (ClientConnectionException e) {
                if (failure == null)
                    failure = e;
                else
                    failure.addSuppressed(e);
            }
        }
    }
    finally {
        svcLock.unlock();
    }

    if (failure != null)
        throw failure;

    return res;
}
This problem is going to be fixed in Apache Ignite 2.8: IGNITE-11599
It may already be fixed in GridGain, which provides support for this kind of fix.
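Until a version with the fix is available, one possible workaround is to do the fallback on the application side: try the addresses one at a time, in an order you choose, and keep the first connection that succeeds. The sketch below is only an illustration of that idea, modelled on the retry loop in the service method quoted above. The class OrderedFailoverConnector and the method connectFirstAvailable are hypothetical names, not part of Ignite, and the connector function is a stand-in for whatever call actually opens the thin client connection.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not an Ignite class): connects to the first reachable address. */
final class OrderedFailoverConnector {
    private OrderedFailoverConnector() {
        // Static helper only.
    }

    /**
     * Tries each address in list order and returns the first successful connection.
     *
     * @param addrs     candidate addresses; the list order is the connection order
     * @param connector opens a connection to a single address and throws
     *                  a RuntimeException if that address is unreachable (assumption)
     */
    static <C> C connectFirstAvailable(List<String> addrs, Function<String, C> connector) {
        if (addrs == null || addrs.isEmpty())
            throw new IllegalArgumentException("addrs must not be empty");

        RuntimeException failure = null;

        for (String addr : addrs) {
            try {
                return connector.apply(addr);
            }
            catch (RuntimeException e) {
                // Keep the first failure and attach later ones, as service() does.
                if (failure == null)
                    failure = e;
                else
                    failure.addSuppressed(e);
            }
        }

        // Every address failed; failure is non-null because the list was not empty.
        throw failure;
    }
}
```

In an Ignite application the connector could wrap a call such as Ignition.startClient with a ClientConfiguration that lists only the single address being tried. Depending on the Ignite version, ClientException may be a checked exception, in which case the connector would need to rethrow it as an unchecked one. Because the addresses are tried in list order, this also gives the caller control over which node is attempted first.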