When doing a SELECT in Cassandra (Spark) by an SAI field, in what order are the rows returned?

Question

If I have a table in Astra created like this:

CREATE TABLE rayven.mytable (
    a text,
    b text,
    c timestamp,
    PRIMARY KEY (a, c)
) WITH CLUSTERING ORDER BY (c DESC);

And then I add an SAI index:

CREATE CUSTOM INDEX b_index ON mytable (b) USING 'StorageAttachedIndex';

When I query with ORDER BY:

select * from mytable where b='x' order by c desc;

I get:

InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY with 2ndary indexes is not supported."

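Since the original table is ordered by "c" descending, can I assume that the results of the above SELECT (without the ORDER BY clause) will come back in that order, or is there no way to know or control the order when selecting through an SAI index?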

Answer 1

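To help illustrate this, I created your table and inserted some data. I then queried the table by a value of b, and for this example included the token function on the partition key.

Note: this was not run in Astra, but on my local 4.0 rc1 instance. The principle remains the same, however.

Basically, all result sets are ordered by the hashed token value of the partition key, and the CLUSTERING ORDER then takes precedence within each partition: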

> SELECT a, token(a), c FROM mytable WHERE b='b';

 a  | system.token(a)      | c
----+----------------------+---------------------------------
 a4 | -9170418876698302957 | 2021-05-03 14:38:42.708000+0000
 a5 |  -925545907721365710 | 2021-05-03 14:39:06.849000+0000
 a3 |   -96725737913093993 | 2021-05-03 14:40:30.942000+0000
 a3 |   -96725737913093993 | 2021-05-03 14:39:18.340000+0000
 a2 |  5060052373555595560 | 2021-05-03 14:40:30.938000+0000
 a2 |  5060052373555595560 | 2021-05-03 14:39:14.914000+0000
 a1 |  5693669818594506317 | 2021-05-03 14:38:54.426000+0000
 a1 |  5693669818594506317 | 2021-05-03 14:38:52.758000+0000

(8 rows)

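As you can see, the result set is not fully ordered by c. It is ordered first by the hashed token value of a, and then by c within each partition (a).

So "no", you cannot count on the data being automatically fully sorted by c.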
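
If a global ordering by c is needed, one option is to sort on the client after the query returns. The following is only a minimal sketch in Python, not something from the original test: the rows are hard-coded copies of the cqlsh output above (in a real application they would come from a driver or a Spark DataFrame), and the helper names `parse_ts` and `is_token_then_c_desc` are made up for illustration. It first checks the ordering rule described above and then re-sorts the rows by c descending.

```python
from datetime import datetime

# Rows copied from the cqlsh output above as (a, token(a), c).
# Hard-coded here for illustration; no database connection is made.
RAW_ROWS = [
    ("a4", -9170418876698302957, "2021-05-03 14:38:42.708"),
    ("a5", -925545907721365710, "2021-05-03 14:39:06.849"),
    ("a3", -96725737913093993, "2021-05-03 14:40:30.942"),
    ("a3", -96725737913093993, "2021-05-03 14:39:18.340"),
    ("a2", 5060052373555595560, "2021-05-03 14:40:30.938"),
    ("a2", 5060052373555595560, "2021-05-03 14:39:14.914"),
    ("a1", 5693669818594506317, "2021-05-03 14:38:54.426"),
    ("a1", 5693669818594506317, "2021-05-03 14:38:52.758"),
]


def parse_ts(text):
    """Parse a timestamp string such as '2021-05-03 14:38:42.708'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


rows = [(a, tok, parse_ts(c)) for a, tok, c in RAW_ROWS]


def is_token_then_c_desc(result):
    """True if rows are ordered by token ascending, then c descending per token."""
    for (_, tok1, c1), (_, tok2, c2) in zip(result, result[1:]):
        if tok1 > tok2:
            return False
        if tok1 == tok2 and c1 < c2:
            return False
    return True


# The order the server returned: token order, then clustering order.
server_order_ok = is_token_then_c_desc(rows)

# The same rows are NOT globally sorted by c descending.
globally_c_desc = all(r1[2] >= r2[2] for r1, r2 in zip(rows, rows[1:]))

# Client-side sort by c descending (only practical if the result fits in memory).
by_c_desc = sorted(rows, key=lambda r: r[2], reverse=True)

print(server_order_ok, globally_c_desc)
print([a for a, _, _ in by_c_desc])
```

Sorting on the client like this is only reasonable when the filtered result set is small enough to hold in memory; for larger results, a table keyed for the query (so that c is the clustering column within the partition being read) is likely a better fit.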


cassandra

datastax

datastax-astra