Why doesn't the key prefix optimization work with a secondary index on a clustering column?

Question

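ScyllaDB implements a so-called "key prefix optimization" for secondary indexes, which eliminates filtering when part of the primary key is specified. For example, the following query can be executed on table A, defined below:

SELECT * FROM A WHERE a = 'a' AND b = 'a' AND d = 'a';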

CREATE TABLE A (
    a text,
    b text,
    c text,
    d text,
    PRIMARY KEY(a,b,c)
);
CREATE INDEX A_index ON A (d);

However, it does not work if d is a clustering column, as in table B below:

CREATE TABLE B (
    a text,
    b text,
    c text,
    d text,
    PRIMARY KEY(a,b,c,d)
);
CREATE INDEX B_index ON B (d);

The same SELECT query, run against table B, fails with the error:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

I am using ScyllaDB 3.0.1.

Answer 1

Thanks for finding an interesting corner case :)

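The problem is that the second query restricts the clustering columns (b, d), which on their own do not form a clustering key prefix. Of course, d is indexed, so what should happen is that a and b are used in the key prefix optimization and d is used as the indexed column.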

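Instead, it is erroneously decided that (b, d) does not form a prefix, so the query is dropped from the optimization candidates without taking into account that d has an index.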

This oversimplification will be fixed. I created a bug tracker issue here: https://github.com/scylladb/scylla/issues/4178
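
To make the difference concrete, here is a toy Python model of the two decisions described above. It is only a sketch and not ScyllaDB's actual code: the function names are hypothetical, and it assumes the partition key (a) is fully restricted in both cases, so only the clustering columns are examined.

```python
# Toy model of the prefix check; NOT ScyllaDB's implementation.
# All function names are hypothetical and exist only for illustration.

def is_prefix(restricted, key_columns):
    """True if `restricted` is exactly the first len(restricted) key columns."""
    n = len(restricted)
    return set(key_columns[:n]) == set(restricted)


def naive_can_optimize(restricted, clustering_key):
    # Looks at every restricted clustering column, ignoring any index.
    clustering = [c for c in restricted if c in clustering_key]
    return is_prefix(clustering, clustering_key)


def index_aware_can_optimize(restricted, clustering_key, indexed):
    # Sets the indexed column aside first, then checks the remaining ones.
    clustering = [c for c in restricted
                  if c in clustering_key and c != indexed]
    return is_prefix(clustering, clustering_key)


query_columns = ["a", "b", "d"]  # WHERE a = ... AND b = ... AND d = ...

# Table A: clustering key (b, c); d is a regular, indexed column.
table_a = naive_can_optimize(query_columns, ["b", "c"])

# Table B: clustering key (b, c, d); d is indexed and part of the key.
table_b_naive = naive_can_optimize(query_columns, ["b", "c", "d"])
table_b_aware = index_aware_can_optimize(query_columns, ["b", "c", "d"], "d")
```

In this model the naive check accepts the query on table A, where (b) is a prefix of (b, c), but rejects it on table B, where (b, d) is not a prefix of (b, c, d). Setting d aside as the indexed column leaves (b), which is a valid prefix again.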
