Cassandra time series modeling

Question

I have a table like this:

> CREATE TABLE docyard.documents (
>     document_id text,
>     namespace text,
>     version_id text,
>     created_at timestamp,
>     path text,
>     attributes map<text, text>,
>     PRIMARY KEY (document_id, namespace, version_id, created_at)
> ) WITH CLUSTERING ORDER BY (namespace ASC, version_id ASC, created_at ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';

I want to be able to run range queries with the following conditions:

select * from documents where namespace = 'something' and created_at> 'some-value' order by created_at allow filtering;

select * from documents where namespace = 'something' and path = 'something' and created_at > 'some-value' order by created_at allow filtering;

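I can't get these queries to work in any form. I have also tried secondary indexes. Can someone please help?

I keep getting one error or another when trying to make it work.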

Answer 1

I think you need to review how data modeling works in Cassandra.

The first query could look like this:

select * from documents where namespace = 'something' and created_at > 'some_formatted_date'  and document_id='someid' and version_id='some_version' order by namespace, version_id, created_at allow filtering;

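When querying a Cassandra table, you must:

Specify the partition key in the select
Use an ORDER BY that follows the clustering order

Fixing the second query is simple. What are you trying to achieve? Cassandra is optimized for write performance. You will probably want to write this data into multiple tables, one for each set of queries you plan to run.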
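
As a rough sketch of what writing the same data into multiple tables could look like, the batch below inserts one document into two query tables. The table names `documents_by_namespace` and `documents_by_path`, and all literal values, are hypothetical and only for illustration; both tables are assumed to carry the same columns as the original `documents` table.

```cql
-- Hypothetical tables: one per query pattern, same columns as docyard.documents.
-- A logged batch keeps the two copies of the row consistent with each other.
BEGIN BATCH
  INSERT INTO docyard.documents_by_namespace
    (namespace, created_at, document_id, version_id, path, attributes)
  VALUES
    ('something', '2016-01-01 12:00:00+0000', 'doc-1', 'v1', '/some/path', {'author': 'alice'});

  INSERT INTO docyard.documents_by_path
    (namespace, path, created_at, document_id, version_id, attributes)
  VALUES
    ('something', '/some/path', '2016-01-01 12:00:00+0000', 'doc-1', 'v1', {'author': 'alice'});
APPLY BATCH;
```

A logged batch that spans partitions costs more than two plain inserts, so whether to use one depends on how strictly the two tables need to stay in step.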

Answer 2

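First of all, don't use secondary indexes or ALLOW FILTERING. With time series data, both will perform terribly as the data grows over time.

To satisfy your first query, you will need to restructure your PRIMARY KEY and CLUSTERING ORDER like this: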

PRIMARY KEY (namespace, created_at, document_id) ) 
WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

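This will allow for the following:

Partitioning by namespace.
Sorting by created_at in descending order (most recent rows read first).
Uniqueness by document_id.
You won't need ALLOW FILTERING or ORDER BY in your query, because the necessary keys will be provided and the results will already be sorted according to your CLUSTERING ORDER.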
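
To make this concrete, here is a minimal sketch of a full table definition using that key, followed by the first query rewritten against it. The table name `documents_by_namespace` and the timestamp literal are assumptions for illustration; the columns are taken from the original table.

```cql
-- Hypothetical table name; columns copied from docyard.documents.
CREATE TABLE docyard.documents_by_namespace (
    namespace text,
    created_at timestamp,
    document_id text,
    version_id text,
    path text,
    attributes map<text, text>,
    PRIMARY KEY (namespace, created_at, document_id)
) WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

-- Equality on the partition key, range on the first clustering column.
-- Rows come back newest first; no ORDER BY or ALLOW FILTERING is needed.
SELECT * FROM docyard.documents_by_namespace
WHERE namespace = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```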

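For your second query, I would create an additional query table. This is because in Cassandra you need to model your tables to fit your queries. You may end up with multiple query tables for the same data, and that's OK.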

CREATE TABLE docyardbypath.documents (
  document_id text,
  namespace text,
  version_id text,
  created_at timestamp,
  path text,
  attributes map<text, text>,
  PRIMARY KEY ((namespace, path), created_at, document_id)
) WITH CLUSTERING ORDER BY (created_at DESC, document_id ASC);

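This will:

Partition by both namespace and path.
Allow the rows within a unique combination of namespace and path to be sorted according to your CLUSTERING ORDER.
Again, you won't need ALLOW FILTERING or ORDER BY in your query.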
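
For illustration, the second query from the question could then be written against this table roughly as follows. The timestamp literal is a placeholder, and the table is assumed to have been created exactly as shown above.

```cql
-- Both partition key columns (namespace, path) are restricted by equality,
-- and created_at, the first clustering column, takes the range condition.
SELECT * FROM docyardbypath.documents
WHERE namespace = 'something'
  AND path = 'something'
  AND created_at > '2016-01-01 00:00:00+0000';
```

Because the clustering order is created_at DESC, adding a LIMIT returns the most recent rows first.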

