Cassandra and <=, >= operators on dates without ALLOW FILTERING

Question

I'm new to Cassandra and I don't understand why I can't filter by dates (I want to return the results between two dates). For example:

CREATE TABLE test.service_bar(
    service_bar_id UUID,
    start_date_time timestamp,
    end_date_time timestamp,
    title varchar,
    message text,
    is_active boolean,
    PRIMARY KEY((start_date_time, end_date_time))
);

Then this works:

  SELECT start_date_time, end_date_time, is_active, message, service_bar_id, title
  FROM test.service_bar
  WHERE start_date_time = '2019-10-30 14:10:29'  AND end_date_time = '2019-10-30 14:10:29'
  LIMIT 500;

But this does not:

  SELECT start_date_time, end_date_time, is_active, message, service_bar_id, title
  FROM test.service_bar
  WHERE start_date_time >= '2019-10-30 14:10:29'  AND end_date_time <= '2019-10-30 14:10:29'
  LIMIT 500;

I don't want to use ALLOW FILTERING.

How can I do such queries in Cassandra?

Answer 1

"I don't understand why I can't filter by dates (I want to return the results between two dates)"

The behavior you're seeing is caused by this:

PRIMARY KEY((start_date_time, end_date_time))

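You have defined start_date_time and end_date_time as a composite partition key. Because Cassandra uses distributed hashing to ensure proper data distribution, partitions are not stored in the order of their values. They are stored by the hashed token value of the partition key. You can see this by using the token function on the partition key: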

aaron@cqlsh:Whosebug> SELECT token(start_date_time,end_date_time),start_date_time,end_date_time,service_bar_id FROM service_bar ;

 system.token(start_date_time, end_date_time) | start_date_time                 | end_date_time                   | service_bar_id
----------------------------------------------+---------------------------------+---------------------------------+--------------------------------------
                            26346508703811310 | 2019-10-30 19:10:29.000000+0000 | 2019-10-30 19:10:29.000000+0000 | 49a70440-8689-4248-b389-13b8d0373e58
                          1488616260313758762 | 2019-11-01 19:10:29.000000+0000 | 2019-11-01 19:10:29.000000+0000 | b0bab610-a285-41e7-ba5c-d56f8fb12f52
                          2185622653117187064 | 2019-10-30 21:10:29.000000+0000 | 2019-10-30 21:10:29.000000+0000 | 3686c6a6-fd8d-4247-b501-964363a48f63
                          7727638696734890177 | 2019-10-30 20:10:29.000000+0000 | 2019-10-30 20:10:29.000000+0000 | 97fc799e-fb54-4b7f-956e-f06bcb9e9d9d

(4 rows)

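That is the default order in which your rows are stored. Each node is responsible for specific token ranges, which keeps data as evenly distributed as possible across a multi-node cluster (the usual production use case). Because of this, CQL places some restrictions on how partition keys can be queried. These restrictions exist to keep you from writing bad queries; for example, range queries on a partition key are not allowed.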
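To make the ordering problem concrete, here is a small Python sketch that reuses the four token values and start_date_time values from the cqlsh output above. The helper name is_contiguous_on_ring is made up for this illustration, and the check is only a toy model of a token ring, not how Cassandra is implemented.

```python
from datetime import datetime

# (token, start_date_time) pairs copied from the cqlsh output above.
rows = [
    (26346508703811310, datetime(2019, 10, 30, 19, 10, 29)),
    (1488616260313758762, datetime(2019, 11, 1, 19, 10, 29)),
    (2185622653117187064, datetime(2019, 10, 30, 21, 10, 29)),
    (7727638696734890177, datetime(2019, 10, 30, 20, 10, 29)),
]

# Order in which the partitions sit on the token ring.
by_token = [start for _, start in sorted(rows)]
# Order that a date-range scan would need.
by_date = sorted(start for _, start in rows)


def is_contiguous_on_ring(lo, hi):
    """True if every row with lo <= start <= hi is adjacent in token order."""
    hits = [i for i, start in enumerate(by_token) if lo <= start <= hi]
    if not hits:
        return True
    return hits == list(range(hits[0], hits[-1] + 1))


print(by_token == by_date)  # False: token order is not date order
print(is_contiguous_on_ring(datetime(2019, 10, 30), datetime(2019, 10, 31)))  # False
```

The rows for October 30 are interrupted on the ring by the November 1 row, so a date range does not map to one contiguous token range.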

"How can I do such queries in Cassandra?"

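This should also tell you that you should build your tables and queries so that each query can be served by a request to a single node. Given that, your use case will really only work if you change your partition key.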

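One way development teams implement solutions like yours is with a modeling technique called "time bucketing," or sometimes just "bucketing." For this case, let's assume you will never write more than a few thousand entries per month. Maybe that isn't true, but I'll use it for this example. I can then partition by month and use the _time columns as clustering keys.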

CREATE TABLE Whosebug.service_bar_by_month (
    month_bucket int,
    start_date_time timestamp,
    end_date_time timestamp,
    is_active boolean,
    message text,
    service_bar_id uuid,
    title text,
    PRIMARY KEY (month_bucket, start_date_time, end_date_time)
) WITH CLUSTERING ORDER BY (start_date_time DESC, end_date_time DESC);

This stores all rows with the same month_bucket value together, and within each partition the rows are sorted by start_date_time and end_date_time in descending order. Now this works:

aaron@cqlsh:Whosebug> SELECT start_date_time, end_date_time, is_active, message, service_bar_id, title
                 ... FROM service_bar_by_month
                 ... WHERE month_bucket = 201910 AND start_date_time >= '2019-10-30 14:10:29'  AND start_date_time <= '2019-10-31 23:59:59';

 start_date_time                 | end_date_time                   | is_active | message           | service_bar_id                       | title
---------------------------------+---------------------------------+-----------+-------------------+--------------------------------------+--------
 2019-10-30 21:10:29.000000+0000 | 2019-10-30 21:10:29.000000+0000 |      True | This is an alert3 | eae5d3be-b2b2-40a1-aa28-0412fe9c18e6 | alert3
 2019-10-30 20:10:29.000000+0000 | 2019-10-30 20:10:29.000000+0000 |      True | This is an alert2 | af4ec72f-7758-42ef-b731-8d08f8a00006 | alert2
 2019-10-30 19:10:29.000000+0000 | 2019-10-30 19:10:29.000000+0000 |      True | This is an alert1 | 8b13db5c-9e39-4ee5-90a9-64758c5ab5be | alert1

(3 rows)
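The application has to compute month_bucket itself, both when writing a row and when querying. The following is a minimal Python sketch, assuming the YYYYMM integer format seen in the query above (201910) and timestamps already expressed in UTC. The function names month_bucket and buckets_between are hypothetical helpers, not part of any Cassandra driver.

```python
from datetime import datetime
from typing import List


def month_bucket(ts: datetime) -> int:
    """Return the YYYYMM bucket for a timestamp, e.g. 201910.

    Assumes ts is already in UTC; no timezone conversion is done here.
    """
    return ts.year * 100 + ts.month


def buckets_between(start: datetime, end: datetime) -> List[int]:
    """List every YYYYMM bucket touched by the range [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return buckets


print(month_bucket(datetime(2019, 10, 30, 14, 10, 29)))  # 201910
print(buckets_between(datetime(2019, 10, 30), datetime(2019, 11, 2)))  # [201910, 201911]
```

A date range that crosses a month boundary touches more than one partition, so the caller would run the range query once per bucket returned by buckets_between and merge the results.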

But note that you can only do a range query on a single clustering key, as with start_date_time above. This will not work:

AND start_date_time >= '2019-10-30 14:10:29'  AND end_date_time <= '2019-10-31 23:59:59';

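It does not work because Cassandra is designed to read and write data sequentially from and to disk. Allowing range queries on multiple columns in a single query would require Cassandra to do random reads, which it is not good at. You can force it to do that with the ALLOW FILTERING directive, but that is not recommended. That said, ALLOW FILTERING within a small partition may perform fine.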
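One option the answer does not spell out, offered here only as an assumption about what may suit small partitions: run the start_date_time range query and apply the end_date_time condition in the application. The sketch below uses hypothetical rows shaped like the result of that query (dictionaries keyed by column name), and filter_by_end is a made-up helper name.

```python
from datetime import datetime


def filter_by_end(rows, latest_end):
    """Keep rows whose end_date_time is at or before latest_end."""
    return [row for row in rows if row["end_date_time"] <= latest_end]


# Hypothetical rows, as if returned by the start_date_time range query.
rows = [
    {"title": "alert1",
     "start_date_time": datetime(2019, 10, 30, 19, 10, 29),
     "end_date_time": datetime(2019, 10, 30, 19, 10, 29)},
    {"title": "alert2",
     "start_date_time": datetime(2019, 10, 30, 20, 10, 29),
     "end_date_time": datetime(2019, 11, 2, 0, 0, 0)},
]

kept = filter_by_end(rows, datetime(2019, 10, 31, 23, 59, 59))
print([row["title"] for row in kept])  # ['alert1']
```

This moves the second range condition out of CQL, at the cost of transferring rows that are later discarded, so it is only reasonable when each partition stays small.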

cql

cassandra