Table definition statement for Cassandra for range queries?

Question

This is the table data:

video_id uuid
user_id timeuuid
added_year int
added_date timestamp
title text
description text

I want to design the table around the following query:

select * from video_by_year where added_year<2013;

create table videos_by_year (

video_id uuid
user_id timeuuid
added_year int
added_date timestamp
title text
description text
PRIMARY KEY ((added_year) added_year)

);

Note: I am using added_year as both the primary key and the clustering key, which I assume is not correct.

Answer 1

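One catch with data modeling in Cassandra is that the first component of the primary key, the partition key, must be restricted with "=". The reason is clear once you realize what Cassandra is doing: it takes that value, hashes it (md5 or murmur3, depending on the partitioner), and uses the hash to determine which servers in the cluster own that partition.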

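For that reason you can't use an inequality on the partition key: the hash does not preserve ordering, so the query would have to scan every row in the cluster.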
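To see why an inequality cannot be routed to a single place, here is a toy sketch in Python. It is not Cassandra's actual partitioner: `NODE_COUNT`, `toy_token`, and `toy_owner` are hypothetical names, md5 is used only as a stand-in hash, and real clusters assign token ranges to nodes rather than taking a modulo.

```python
import hashlib

NODE_COUNT = 3  # hypothetical cluster size, for illustration only


def toy_token(added_year):
    """Hash the partition key value into an integer token (md5 as a stand-in)."""
    key_bytes = added_year.to_bytes(4, "big", signed=True)
    return int.from_bytes(hashlib.md5(key_bytes).digest(), "big")


def toy_owner(added_year):
    """Pick a node for the token; real clusters use token ranges, not modulo."""
    return toy_token(added_year) % NODE_COUNT


# Consecutive years get unrelated tokens, so "added_year < 2013" does not
# correspond to any contiguous token range or to any single node.
for year in range(2008, 2014):
    print(year, toy_owner(year), f"{toy_token(year):032x}")
```

An equality lookup works because hashing the requested value again yields the same token, which points straight at the owning node.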

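If you need the videos added before 2013, consider a scheme that uses some portion of the date as the partition key, and then SELECT from each of the date 'buckets', which you can do asynchronously and in parallel. For example: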

create table videos_by_year (
 video_id uuid,
 user_id timeuuid,
 added_date_bucket text,
 added_date timestamp,
 title text,
 description text,
 PRIMARY KEY ((added_date_bucket), added_date, video_id)
) ;
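Reading "everything before 2013" from this table then means one equality query per bucket. The Python sketch below shows that fan-out under some assumptions: buckets use the 'YYYY' format, the first year with data (2010 here) is made up, and `fetch_bucket` is a placeholder that only records what would be sent. With the DataStax Python driver it would typically wrap a call like `session.execute(QUERY, (bucket,))`.

```python
from concurrent.futures import ThreadPoolExecutor

QUERY = "SELECT * FROM videos_by_year WHERE added_date_bucket = %s"


def year_buckets(first_year, before_year):
    """Return 'YYYY' bucket keys from first_year up to, but excluding, before_year."""
    return [f"{year:04d}" for year in range(first_year, before_year)]


def fetch_bucket(bucket):
    """Placeholder for a real driver call; returns the statement and its parameter."""
    return [(QUERY, bucket)]


def fetch_before(first_year, before_year, fetch=fetch_bucket):
    """Query every bucket in parallel and flatten the results in bucket order."""
    buckets = year_buckets(first_year, before_year)
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_bucket = list(pool.map(fetch, buckets))
    return [row for rows in per_bucket for row in rows]


rows = fetch_before(2010, 2013)
for row in rows:
    print(row)
```

Each query hits exactly one partition, so the work spreads across the cluster instead of forcing a full scan.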

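I used text for added_date_bucket so that you can use 'YYYY', 'YYYY-MM', or something similar. Note that, depending on how fast videos are added to the system, you may even want 'YYYY-MM-DD' or 'YYYY-MM-DD-HH:ii:ss', because you will hit a practical limit of a few million videos per bucket.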
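As a rough illustration of how the bucket value could be derived at write time, the following Python sketch maps a timestamp to a bucket string. The `BUCKET_FORMATS` table and `bucket_for` helper are hypothetical names, and only three of the granularities mentioned above are included.

```python
from datetime import datetime

# strftime patterns for some of the bucket granularities mentioned above.
BUCKET_FORMATS = {
    "year": "%Y",       # 'YYYY'
    "month": "%Y-%m",   # 'YYYY-MM'
    "day": "%Y-%m-%d",  # 'YYYY-MM-DD'
}


def bucket_for(added_date, granularity="year"):
    """Return the added_date_bucket value for a timestamp at the given granularity."""
    return added_date.strftime(BUCKET_FORMATS[granularity])


added = datetime(2012, 7, 4, 15, 30)
print(bucket_for(added, "year"))   # 2012
print(bucket_for(added, "month"))  # 2012-07
print(bucket_for(added, "day"))    # 2012-07-04
```

A finer granularity keeps each partition smaller but increases the number of buckets a "before 2013" read has to visit.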

You could get clever and make video_id a timeuuid, in which case a single column gives you both added_date and video_id.
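This works because a timeuuid is a version 1 UUID, which embeds its creation time. A minimal Python sketch of recovering that time, using only the standard library (`timeuuid_to_datetime` is a hypothetical helper name, and `uuid.uuid1()` stands in for an ID generated at insert time):

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Extract the embedded creation time from a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_100ns = value.time - UUID_EPOCH_OFFSET
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(microseconds=unix_100ns // 10)


video_id = uuid.uuid1()  # stands in for a timeuuid generated when the video is added
added_date = timeuuid_to_datetime(video_id)
print(video_id, added_date.isoformat())
```

With that change, the separate added_date clustering column becomes optional, since ordering by the timeuuid already orders rows by creation time.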

data-modeling

cassandra

cassandra-2.0