Filter on the partition and clustering key with an additional criterion

I want to filter on a table using its partition key and clustering key plus an additional condition on a regular column. I get the following error.

InvalidQueryException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

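I would understand the problem if the partition and clustering keys were not used. In my case, is this error relevant, or can I ignore it?

Here is an example of the table and the query.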

CREATE TABLE mytable (
    name text,
    id uuid,
    deleted boolean,
    PRIMARY KEY ((name), id)
);

SELECT id FROM mytable WHERE name='myname' AND id='myid' AND deleted=false;

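In Cassandra, you cannot filter data on a non-primary-key column unless you create an index on it.

Cassandra 3.0 or later allows filtering on non-primary-key columns, but with unpredictable performance.

With Cassandra 3.0 or later, if you provide the full primary key (as in your query), you can run the query with ALLOW FILTERING and ignore the warning.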
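As a minimal sketch of that option, the query from the question can be written with ALLOW FILTERING appended. The UUID below is only a placeholder value (an assumption for illustration), written as an unquoted uuid literal because id is a uuid column:

```cql
-- name and id pin the query to a single row, so the extra
-- filter on deleted is applied to at most one row.
SELECT id FROM mytable
WHERE name = 'myname'
  AND id = 123e4567-e89b-12d3-a456-426614174000  -- placeholder UUID
  AND deleted = false
ALLOW FILTERING;
```

Because the whole primary key is restricted, the filtering cost here should be bounded by that one row, which is why the warning can reasonably be ignored in this case.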

Otherwise, filter on the client side, or remove the deleted field and create another table:

Instead of updating the deleted field to true, move the data to another table, say mytable_deleted.

CREATE TABLE mytable_deleted (
    name text,
    id uuid,
    PRIMARY KEY (name, id)
);

Now mytable holds only the non-deleted data and mytable_deleted holds the deleted data.

Alternatively, create an index on it:

deleted is a low-cardinality column, so keep in mind:
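One way to "move" a row between the two tables is a logged batch, sketched below. This assumes mytable no longer has the deleted column, and the UUID is again a placeholder; a batch like this ensures both statements are eventually applied together, but it does not provide isolation.

```cql
-- Mark a row as deleted by moving it to mytable_deleted.
BEGIN BATCH
  INSERT INTO mytable_deleted (name, id)
    VALUES ('myname', 123e4567-e89b-12d3-a456-426614174000);  -- placeholder UUID
  DELETE FROM mytable
    WHERE name = 'myname' AND id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;

-- The lookup now uses only primary key columns, so no filtering is involved.
SELECT id FROM mytable
WHERE name = 'myname' AND id = 123e4567-e89b-12d3-a456-426614174000;
```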

A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. You can avoid a performance hit when looking for a row in a large partition by narrowing the search.
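A rough sketch of the index approach on the original mytable follows. The index name mytable_by_deleted_idx is a hypothetical name chosen for illustration. Restricting the partition key alongside the indexed column is what "narrowing the search" refers to, since the lookup should then stay within one partition rather than fanning out across the cluster.

```cql
-- Secondary index on the regular column (index name is hypothetical).
CREATE INDEX mytable_by_deleted_idx ON mytable (deleted);

-- Indexed column combined with the partition key.
SELECT id FROM mytable
WHERE name = 'myname' AND deleted = false;
```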

Read more: When not to use an index