Cassandra CQLEngine Allow Filtering

Question

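I am using the Python Cassandra cqlengine extension. I created a many-to-many table, but I get an error when I filter a query on the user_applications model. I have read several resources about this problem, but I still do not fully understand it.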

Source: https://ohioedge.com/2017/07/05/cassandra-primary-key-partitioning-key-clustering-key-a-simple-explanation/

Cassandra Allow filtering

Database model:

class UserApplications(BaseModel):
    __table_name__ = "user_applications"

    user_id = columns.UUID(required=True, primary_key=True, index=True)
    application_id = columns.UUID(required=True, primary_key=True, index=True)
    membership_id = columns.UUID(required=True, primary_key=True, index=True)

Error message:

Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

Python CQLEngine code:

q = UserApplications.filter(membership_id=r.membership_id,
                            user_id=r.user_id,
                            application_id=r.application_id)

CQL statement generated by CQLEngine:

SELECT "id", "status", "created_date", "update_date" FROM db.user_applications WHERE "membership_id" = %(0)s AND "user_id" = %(1)s AND "application_id" = %(2)s LIMIT 10000

Describe table result:

CREATE TABLE db.user_applications (
    id uuid,
    user_id uuid,
    application_id uuid,
    membership_id uuid,
    created_date timestamp,
    status int,
    update_date timestamp,
    PRIMARY KEY (id, user_id, application_id, membership_id)
) WITH CLUSTERING ORDER BY (user_id ASC, application_id ASC, membership_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX user_applications_membership_id_idx ON db.user_applications (membership_id);

Waiting for your help.

Answer 1

You are getting this error because you did not add the ALLOW FILTERING flag to your query. If you append ALLOW FILTERING to the end of the query, it should work.

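Using ALLOW FILTERING in a Cassandra query lets Cassandra filter out some rows after loading them (possibly after it has loaded every row in the table). For your query, for example, the only way Cassandra can execute it is to retrieve all rows from the user_applications table and then filter out those that do not have the requested value in each column you restricted.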

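Using ALLOW FILTERING can give unpredictable performance; the actual cost depends on how the data in the table is distributed. If your table contains, say, 1 million rows and 95% of them have the requested value for the columns you specified, the query will still be relatively efficient and you should use ALLOW FILTERING. On the other hand, if your table contains 1 million rows and only 2 of them contain the requested value, the query is extremely inefficient: Cassandra will load 999,998 rows for nothing. In general, if a query requires ALLOW FILTERING, you should probably reconsider your schema or add secondary indexes on the columns you query often.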
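To make the cost concrete, here is a rough pure-Python sketch of the "load everything, then filter" behaviour described above. It is only a toy model and not how Cassandra is implemented: scan_with_filter, make_rows, and the "m-1" value are hypothetical names invented for this illustration.

```python
def scan_with_filter(rows, **wanted):
    """Read every row, keep only rows whose columns equal the wanted values.

    Returns (matching_rows, rows_read) so the wasted work is visible.
    """
    rows_read = 0
    matches = []
    for row in rows:
        rows_read += 1  # every row is loaded before it can be tested
        if all(row[column] == value for column, value in wanted.items()):
            matches.append(row)
    return matches, rows_read


def make_rows(total, matching, wanted_value):
    """Yield `total` toy rows; the first `matching` carry `wanted_value`."""
    for i in range(total):
        membership_id = wanted_value if i < matching else f"other-{i}"
        yield {"id": i, "membership_id": membership_id}


if __name__ == "__main__":
    # The two scenarios above: 95% of rows match versus only 2 rows match.
    for matching in (950_000, 2):
        rows = make_rows(1_000_000, matching, wanted_value="m-1")
        matches, rows_read = scan_with_filter(rows, membership_id="m-1")
        discarded = rows_read - len(matches)
        print(f"read {rows_read}, returned {len(matches)}, discarded {discarded}")
```

Both runs read the same 1,000,000 rows; only the share of discarded work differs. If you do decide the scan is acceptable, cqlengine exposes the flag by chaining allow_filtering() onto the query set.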

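In your case, I suggest making the columns membership_id, user_id, and application_id a composite partition key. If you do, you will no longer need to filter out any rows after loading, because all rows with the same values in those three columns reside in the same partition (on the same physical node), and you should supply all three values in the query (which you already do in the query in your question). Here is how you could do it: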

CREATE TABLE db.user_applications (
    user_id uuid,
    application_id uuid,
    membership_id uuid,
    created_date timestamp,
    status int,
    update_date timestamp,
    PRIMARY KEY ((user_id, application_id, membership_id))
);
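As a rough illustration of why the composite partition key removes the need for filtering, the sketch below models a table as a dictionary keyed by the three partition key values. ToyTable and its methods are hypothetical names made up for this example; real Cassandra hashes the key to choose a node, which this toy does not attempt to reproduce.

```python
import uuid


class ToyTable:
    """In-memory stand-in for a table whose rows are grouped by partition key."""

    def __init__(self, partition_key_columns):
        self.partition_key_columns = tuple(partition_key_columns)
        self.partitions = {}  # partition key tuple -> list of rows

    def _partition_key(self, values):
        return tuple(values[column] for column in self.partition_key_columns)

    def insert(self, **row):
        self.partitions.setdefault(self._partition_key(row), []).append(row)

    def select(self, **where):
        # Mirror the rule that every partition key column must be restricted;
        # anything else would need a scan, which this toy refuses to do.
        if set(where) != set(self.partition_key_columns):
            raise ValueError("restrict exactly the partition key columns")
        return list(self.partitions.get(self._partition_key(where), []))


if __name__ == "__main__":
    table = ToyTable(["user_id", "application_id", "membership_id"])
    key = {
        "user_id": uuid.uuid4(),
        "application_id": uuid.uuid4(),
        "membership_id": uuid.uuid4(),
    }
    table.insert(status=1, **key)
    print(table.select(**key))  # one direct lookup, no rows discarded
```

On the cqlengine side, the matching model would declare each of the three columns with partition_key=True instead of primary_key=True, which is how cqlengine expresses a composite partition key.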

database

cql

cassandra

cqlengine

cassandra-3.0