Cassandra pagination: start from a given / random position?
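Is it possible to start pagination from a given or random position?
Why do I need this?
On my production nodes, I have several parallel service jobs that iterate over roughly 200,000,000 items and update information for them. New versions of the software are pushed to the servers frequently, and every push restarts the service jobs. So all of the jobs start over from the beginning, again and again. I use locks, of course, but it would be better if I could tell those parallel jobs to start from different pages.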
Paging is done between Apache Cassandra and the client driver by exchanging a pagingState, as described in Section 8 of the native protocol specification:
However, if some results are not
part of the first response, the Has_more_pages flag will be set and the result
will contain a paging_state value. In that case, the paging_state value
should be used in a QUERY or EXECUTE message (that has the same query as
the original one or the behavior is undefined) to retrieve the next page of
results.
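To make the contract in that excerpt concrete, here is a toy model of it, not the driver and not Cassandra itself. The class and method names (PagingContractDemo, execute, fetchAll) are made up for illustration, and the paging state is modeled as a plain offset, whereas in Cassandra it is an opaque value that is meant to be taken from a previous response rather than constructed by the client. The sketch assumes pageSize is greater than zero.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the paging contract; all names here are hypothetical.
final class PagingContractDemo {

    /** One response: a page of rows plus the state for the next page (null when there are no more pages). */
    static final class Page {
        final List<Integer> rows;
        final Integer pagingState;

        Page(List<Integer> rows, Integer pagingState) {
            this.rows = rows;
            this.pagingState = pagingState;
        }
    }

    /**
     * Stand-in for the server: pages over the rows 0..totalRows-1.
     * ASSUMPTION: the paging state is just the next offset. The real value is opaque.
     */
    static Page execute(int totalRows, int pageSize, Integer pagingState) {
        int start = (pagingState == null) ? 0 : pagingState;
        int end = Math.min(start + pageSize, totalRows);
        List<Integer> rows = new ArrayList<>();
        for (int i = start; i < end; i++) {
            rows.add(i);
        }
        // Only hand back a paging state when more rows remain.
        return new Page(rows, end < totalRows ? Integer.valueOf(end) : null);
    }

    /** Client loop: resend the same query with the returned state until no state comes back. */
    static List<Integer> fetchAll(int totalRows, int pageSize, Integer resumeFrom) {
        List<Integer> seen = new ArrayList<>();
        Integer state = resumeFrom;
        do {
            Page page = execute(totalRows, pageSize, state);
            seen.addAll(page.rows);
            state = page.pagingState;
        } while (state != null);
        return seen;
    }
}
```

Calling fetchAll with a null state walks the whole result set, while passing a state captured from an earlier response continues from the page after that response. That is the behavior a restarted job can rely on.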
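When you query data, this paging state can be accessed and stored for later use, as in the case you describe of resuming a job from its previous position.
This can be done with the DataStax java-driver, as described in the 'Saving and Reusing the paging state' section of the 'Paging' manual page: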
The driver exposes a PagingState object that represents where we were in the result set when the last page was fetched:
ResultSet resultSet = session.execute("your query");
// iterate the result set...
PagingState pagingState = resultSet.getExecutionInfo().getPagingState();
This object can be serialized to a String or a byte array:
String string = pagingState.toString();
byte[] bytes = pagingState.toBytes();
This serialized form can be saved in some form of persistent storage to be reused later. In our web service example, we would probably save the string version as a query parameter in the URL to the next page (http://myservice.com/results?page=<...>). When that value is retrieved later, we can deserialize it and reinject it in a statement:
PagingState pagingState = PagingState.fromString(string);
Statement st = new SimpleStatement("your query");
st.setPagingState(pagingState);
ResultSet rs = session.execute(st);
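For the restart scenario in the question, the "persistent storage" part could look roughly like the sketch below. It is only an illustration under assumptions: the PagingCheckpoint class, the one-file-per-job layout, and the ".pagingstate" file suffix are all made up here, and the code uses only the JDK so it does not depend on any particular driver version. It stores and returns the string form of the paging state, nothing more.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical helper: one checkpoint file per job, holding the serialized paging state.
final class PagingCheckpoint {
    private final Path file;

    PagingCheckpoint(Path directory, String jobId) {
        // ASSUMPTION: each parallel job has a stable, unique jobId.
        this.file = directory.resolve(jobId + ".pagingstate");
    }

    /** Stores the string form of the paging state, e.g. the result of pagingState.toString(). */
    void save(String serializedPagingState) {
        try {
            Files.createDirectories(file.getParent());
            // Write to a sibling temp file and then rename it, so an interrupted
            // write is less likely to damage the previous checkpoint.
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, serializedPagingState.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the last saved state, or empty if this job has no checkpoint yet. */
    Optional<String> load() {
        try {
            if (!Files.exists(file)) {
                return Optional.empty();
            }
            return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Removes the checkpoint, e.g. once the job has walked the whole result set. */
    void clear() {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the driver calls quoted above, a job would presumably call load() on startup, turn the value back into a PagingState with PagingState.fromString(...), and set it on the statement before executing the same query. It would call save(pagingState.toString()) only after it has finished processing the rows of the current page, since the saved state points at the next page. The saved state is only meaningful for the exact same query, as the specification excerpt notes.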
Other drivers should have a similar paging mechanism.