Cassandra pagination: start from a given / random position?

Is it possible to start paging from a given or random position?

Why do I need this?

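On my production nodes I have several parallel service jobs that iterate over roughly 200,000,000 items and update information for them. New versions of the software are pushed to the servers frequently, and every push restarts the service jobs, so all the work starts from the beginning again and again. I do use locks, of course, but it would be better if I could tell those parallel jobs to start from different pages.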

Paging works by passing a paging state between Apache Cassandra and the client driver, as described in Section 8 of the native protocol specification:

However, if some results are not part of the first response, the Has_more_pages flag will be set and the result will contain a paging_state value. In that case, the paging_state value should be used in a QUERY or EXECUTE message (that has the same query as the original one or the behavior is undefined) to retrieve the next page of results.

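When you query data, you can access this paging state and store it for later use, which matches what you describe: restarting a job from the position it had previously reached.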

With the DataStax Java driver, this can be done as described in the 'Saving and Reusing the paging state' section of the 'Paging' manual page:

The driver exposes a PagingState object that represents where we were in the result set when the last page was fetched:

ResultSet resultSet = session.execute("your query");
// iterate the result set...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();

This object can be serialized to a String or a byte array:

String string = pagingState.toString();
byte[] bytes = pagingState.toBytes();

This serialized form can be saved in some form of persistent storage to be reused later. In our web service example, we would probably save the string version as a query parameter in the URL to the next page (http://myservice.com/results?page=<...>). When that value is retrieved later, we can deserialize it and reinject it in a statement:

PagingState pagingState = PagingState.fromString(string);
Statement st = new SimpleStatement("your query");
st.setPagingState(pagingState);
ResultSet rs = session.execute(st);
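
For the restart scenario in the question, each job could persist its latest paging state string after every page and read it back on startup. The following is only a minimal sketch of that idea using plain files. The class name `PagingCheckpointStore`, the per-job `.state` file layout, and the job IDs are all assumptions made for illustration; none of them come from the driver or from the question, and any other persistent storage would work just as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper (not part of the DataStax driver): keeps one
// checkpoint file per job, holding the string form of its paging state.
final class PagingCheckpointStore {
    private final Path dir;

    PagingCheckpointStore(Path dir) {
        this.dir = dir;
    }

    // Store the value of pagingState.toString() for the given job.
    void save(String jobId, String pagingState) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve(jobId + ".tmp");
        Files.write(tmp, pagingState.getBytes(StandardCharsets.UTF_8));
        // Swap in the new checkpoint only after it has been fully written,
        // so a restart in the middle of a save still finds the old one.
        Files.move(tmp, fileFor(jobId), StandardCopyOption.REPLACE_EXISTING);
    }

    // Return the last saved paging state, or empty if the job has none yet.
    Optional<String> load(String jobId) throws IOException {
        Path file = fileFor(jobId);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        byte[] bytes = Files.readAllBytes(file);
        return Optional.of(new String(bytes, StandardCharsets.UTF_8));
    }

    private Path fileFor(String jobId) {
        return dir.resolve(jobId + ".state");
    }
}
```

A job would call `save(jobId, pagingState.toString())` after processing each page, and on startup pass the result of `load(jobId)`, if present, to `PagingState.fromString` before executing the statement. Two caveats: `getPagingState()` returns null once there are no more pages, so the job should check for that before saving, and a saved state is only meaningful for exactly the same query it was obtained from.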

Other drivers should have a similar paging mechanism.