Cassandra

Question

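I keep getting the following error in Cassandra:

Cassandra - All host(s) tried for query failed (tried: xxxxxx (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))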

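The problem occurs when I try to load a large data set into a single cluster. So far I have tried every suggestion I could find about this issue:

I have only one Cluster and one Session
I am using prepared statements for the inserts
I have generously increased the timeouts on both sides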

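I am pasting the function I use for loading, written as suggested in this blog post, in case someone can spot something. Changing BATCH_SIZE is the only thing that has improved the situation at all: if I set it to 1_000_000 it fails almost immediately, and if I set it to 100_000 it runs for a while. In the code below, pstatement is a PreparedStatement and futures is a List<ResultSetFuture>.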

public boolean addPair(byte[] key, byte[] value) throws IOException {
    if (futures.size() >= BATCH_SIZE) {
        flush();
    }
    BoundStatement boundStatementInsert = new BoundStatement(pstatement);
    futures.add(session.executeAsync(
            boundStatementInsert
                    .bind(ByteBuffer.wrap(key), ByteBuffer.wrap(value))
                    .setConsistencyLevel(ConsistencyLevel.ALL)));
    return true;
}

private void flush() {
    for (ResultSetFuture rsf : futures) {
        rsf.getUninterruptibly();
    }
    futures.clear();
}

Thanks in advance,

Altober

Answer 1

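A batch is the best way to update/insert into multiple tables at once. Batches should be small, less than 5 KB of data. Batching is for atomicity, not for performance optimization. Please refer to https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e for faster data inserts; it does not use a BatchStatement.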
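One common way to issue individual asynchronous inserts without flooding the driver is to cap the number of requests in flight with a semaphore. The following is only a minimal sketch of that idea, not necessarily what the linked post does. ThrottledWriter and its methods are hypothetical names, and CompletableFuture stands in for whatever future type your driver's executeAsync returns (with ResultSetFuture you would release the permit from a listener instead).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: caps how many asynchronous requests are in flight at once. */
final class ThrottledWriter {
    private final int maxInFlight;
    private final Semaphore permits;
    private final AtomicInteger failures = new AtomicInteger();

    ThrottledWriter(int maxInFlight) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a slot is free, then starts the request. */
    void submit(Supplier<CompletableFuture<?>> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<?> future;
        try {
            future = request.get();
        } catch (RuntimeException e) {
            permits.release(); // the request never started, so give the slot back
            throw e;
        }
        future.whenComplete((result, error) -> {
            if (error != null) {
                failures.incrementAndGet();
            }
            permits.release(); // free the slot whether the request succeeded or failed
        });
    }

    /** Waits until every outstanding request has finished; returns the number of failures. */
    int awaitAll() {
        permits.acquireUninterruptibly(maxInFlight); // only possible once all slots are free
        permits.release(maxInFlight);
        return failures.get();
    }
}
```

In the question's addPair, the call to session.executeAsync(...) would go inside the supplier passed to submit, and flush() would become a call to awaitAll(). The producer then slows down as soon as the limit is reached instead of piling up requests that later time out.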

Answer 2

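The driver will not process more than a maximum number of requests to a given host at the same time. This number depends on your pooling configuration; see the details here (use the combo box in the top-left corner to match your driver version).

If you try to send more requests than that, they are queued. The message Timeout while trying to acquire available connection means that a queued request timed out; in other words, you are sending more requests than the driver can handle.

With the driver defaults, the maximum should be 1024. That is quite conservative, and given the setup you describe I think you can go higher. Try adding more connections and/or raising the number of requests per connection, and adjust BATCH_SIZE accordingly.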
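To make the arithmetic concrete, here is a rough sketch of how the per-host limit relates to BATCH_SIZE. PoolCapacity is a hypothetical helper, and the figures 8 connections and 128 requests per connection are assumptions chosen because they multiply to the 1024 mentioned above; the real defaults depend on your driver and protocol version, so check the pooling documentation. It also assumes all requests go to a single host.

```java
/** Hypothetical back-of-the-envelope estimate for one host's connection pool. */
final class PoolCapacity {
    /** Upper bound on requests that can be in flight to a single host. */
    static int maxInFlightPerHost(int connectionsPerHost, int requestsPerConnection) {
        return Math.multiplyExact(connectionsPerHost, requestsPerConnection);
    }

    /** Requests that would have to wait in the driver's queue if batchSize are sent at once. */
    static int queuedRequests(int batchSize, int capacity) {
        return Math.max(0, batchSize - capacity);
    }
}
```

Under these assumptions, maxInFlightPerHost(8, 128) is 1024, and with a BATCH_SIZE of 100_000 queuedRequests(100_000, 1024) is 98_976, so almost the whole batch sits in the queue waiting for a connection. That is consistent with smaller batch sizes running longer before the timeout appears.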

Cassandra - com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection

java

key-value-store