Cassandra JDBC heap

I have a question about heap size when executing a SELECT query that returns billions of rows.

I'm using JDBC with a prepared statement and a fetch size of 1000 rows.

The following code illustrates my question:

ResultSet rs = ...
for (Row r : rs) {
    // If the result is not fully fetched
    if (rs.getAvailableWithoutFetching() == FETCH_SIZE && !rs.isFullyFetched()) {
        LOGGER.info("Load " + FETCH_SIZE + " more rows");
        rs.fetchMoreResults(); 
    }

    ...
}

Does Java load all of the billions of rows into memory, or does it load them FETCH_SIZE rows at a time?

Assuming you are using the DataStax driver, here is what the documentation for setFetchSize says:

The fetch size controls how much resulting rows will be retrieved simultaneously (the goal being to avoid loading too much results in memory for queries yielding large results). Please note that while value as low as 1 can be used, it is highly discouraged to use such a low value in practice as it will yield very poor performance. If in doubt, leaving the default is probably a good idea.

Only SELECT queries only ever make use of that setting.

Note: Paging is not supported with the native protocol version 1. If you call this method with fetchSize > 0 and fetchSize != Integer.MAX_VALUE and the protocol version is in use (i.e. if you've force version 1 through Cluster.Builder.withProtocolVersion(int) or you use Cassandra 1.2), you will get UnsupportedProtocolVersionException when submitting this statement for execution

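So the driver does not keep previously fetched results in memory, but you must take care not to keep references to those rows in your own code, so that they can be garbage-collected. Also read the documentation for fetchMoreResults; it may not work the way you expect.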
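To make the paging behaviour concrete, here is a minimal, self-contained model. It is not the DataStax driver: `PagingModel` and `PagedRows` are hypothetical names, and the "server" is just a counter. The sketch assumes the iterator fetches one page of `fetchSize` rows per round trip, only when its buffer is empty, and hands each row out exactly once. Under those assumptions, the number of rows held at any moment is bounded by the fetch size, provided the loop body does not store the rows itself (here it only keeps a running sum).

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical model of driver-side paging (NOT the DataStax driver):
 * rows are pulled from a simulated server one page at a time and only
 * the current page is buffered.
 */
public class PagingModel {

    /** Iterates over rows 0..totalRows-1, fetching them fetchSize at a time. */
    static final class PagedRows implements Iterator<Long> {
        private final long totalRows;
        private final int fetchSize;
        private final ArrayDeque<Long> currentPage = new ArrayDeque<>();
        private long nextRowOnServer = 0;
        int pagesFetched = 0; // number of simulated round trips
        int maxBuffered = 0;  // largest number of rows buffered at once

        PagedRows(long totalRows, int fetchSize) {
            if (fetchSize < 1) throw new IllegalArgumentException("fetchSize must be >= 1");
            this.totalRows = totalRows;
            this.fetchSize = fetchSize;
        }

        /** Simulates one round trip returning at most fetchSize rows. */
        private void fetchNextPage() {
            long end = Math.min(totalRows, nextRowOnServer + fetchSize);
            for (long id = nextRowOnServer; id < end; id++) {
                currentPage.add(id);
            }
            nextRowOnServer = end;
            pagesFetched++;
            maxBuffered = Math.max(maxBuffered, currentPage.size());
        }

        @Override
        public boolean hasNext() {
            // Fetch lazily: only when the buffered page has been fully consumed.
            if (currentPage.isEmpty() && nextRowOnServer < totalRows) {
                fetchNextPage();
            }
            return !currentPage.isEmpty();
        }

        @Override
        public Long next() {
            if (!hasNext()) throw new NoSuchElementException();
            return currentPage.poll(); // the row leaves the buffer once handed out
        }
    }

    public static void main(String[] args) {
        PagedRows rows = new PagedRows(10_500, 1000);
        long sum = 0;
        while (rows.hasNext()) {
            sum += rows.next(); // aggregate instead of keeping a reference to each row
        }
        System.out.println("pages=" + rows.pagesFetched
                + " maxBuffered=" + rows.maxBuffered + " sum=" + sum);
    }
}
```

Running `main` prints `pages=11 maxBuffered=1000 sum=55119750`: ten full pages plus one page of 500 rows, never more than 1000 rows buffered. With a fetch size of 1 the same loop needs one round trip per row, which is the performance cost the documentation warns about. If the loop body instead added every row to a `List`, that list, not the paging iterator, would be what grows with the size of the result.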