Cassandra - How to group by latest timestamp
I have seen a few related topics here, but it is still not clear to me how to group by the latest row value using Cassandra 4.0.1.
Suppose my table looks like this:
CREATE TABLE simple_search (
engine text,
term text,
time bigint,
rank bigint,
url text,
domain text,
pagenum bigint,
descr text,
display_url text,
title text,
type text,
PRIMARY KEY ((domain), term , time , engine, url , pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC , url DESC);
My data looks like this:
SELECT time, rank, term from search_by_domain_termsV2 where domain ='zerotoappstore.com'
time , rank, term
1633297772, 105, avfoundation swift
1633315263, 112, best ide
1633332881, 119, best ide
1633365856, 50, developing an app cost
1633375273, 36, developing an app cost
After grouping, I want:
time , rank, term
1633297772, 105, avfoundation swift
1633332881, 119, best ide
1633375273, 36, developing an app cost
If I run
SELECT max(time) , rank, term from search_by_domain_termsV2 where domain ='zerotoappstore.com' GROUP BY term;
it gives me the correct max time value, but not the correct rank and term:
1633297772 105 avfoundation swift
1633332881 112 best ide
1633375273 50 developing an app cost
Is it possible to group by term and take the max value of time?
@VitalyT,
First, if pagenum is not included in the CLUSTERING ORDER BY clause of the CREATE TABLE statement, we get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
So it has to look like this:
CREATE TABLE IF NOT EXISTS simple_search(
...
PRIMARY KEY (domain, term, time, engine, url, pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC, url DESC, pagenum [ASC|DESC]);
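For reference, a complete version of the statement might look like the sketch below. The column list is taken from the question. The DESC direction for url and pagenum is only an assumption; either direction should work as long as every clustering column appears in the clause.

```cql
-- Sketch only: DESC on url and pagenum is an assumed choice.
CREATE TABLE IF NOT EXISTS simple_search (
    domain      text,
    term        text,
    time        bigint,
    engine      text,
    url         text,
    pagenum     bigint,
    rank        bigint,
    descr       text,
    display_url text,
    title       text,
    type        text,
    -- domain is the partition key; the remaining columns are clustering columns
    PRIMARY KEY ((domain), term, time, engine, url, pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC, url DESC, pagenum DESC);
```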
Next, here is a sample of 5 rows. Note that I have assumed some values for the engine, url, and pagenum columns, since the original question did not provide them:
SELECT * FROM simple_search ;
domain | term | time | engine | url | pagenum | descr | display_url | rank | title | type
--------------------+------------------------+------------+---------+------+---------+-------+-------------+------+-------+------
zerotoappstore.com | developing an app cost | 1633375273 | engine5 | url5 | 5 | null | null | 36 | null | null
zerotoappstore.com | developing an app cost | 1633365856 | engine4 | url4 | 4 | null | null | 50 | null | null
zerotoappstore.com | best ide | 1633332881 | engine3 | url3 | 3 | null | null | 119 | null | null
zerotoappstore.com | best ide | 1633315263 | engine2 | url2 | 2 | null | null | 112 | null | null
zerotoappstore.com | avfoundation swift | 1633297772 | engine1 | url1 | 1 | null | null | 105 | null | null
(5 rows)
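To reproduce this sample, the rows could be inserted roughly as follows. The engine, url, and pagenum values are the assumed ones shown in the output above; the remaining columns are left unset, which is why they show up as null.

```cql
-- Sample rows; engine/url/pagenum are placeholder values, not real data.
INSERT INTO simple_search (domain, term, time, engine, url, pagenum, rank)
VALUES ('zerotoappstore.com', 'avfoundation swift', 1633297772, 'engine1', 'url1', 1, 105);
INSERT INTO simple_search (domain, term, time, engine, url, pagenum, rank)
VALUES ('zerotoappstore.com', 'best ide', 1633315263, 'engine2', 'url2', 2, 112);
INSERT INTO simple_search (domain, term, time, engine, url, pagenum, rank)
VALUES ('zerotoappstore.com', 'best ide', 1633332881, 'engine3', 'url3', 3, 119);
INSERT INTO simple_search (domain, term, time, engine, url, pagenum, rank)
VALUES ('zerotoappstore.com', 'developing an app cost', 1633365856, 'engine4', 'url4', 4, 50);
INSERT INTO simple_search (domain, term, time, engine, url, pagenum, rank)
VALUES ('zerotoappstore.com', 'developing an app cost', 1633375273, 'engine5', 'url5', 5, 36);
```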
If we retrieve only the MAX(time) column (without any GROUP BY), we get the following result:
SELECT MAX(time),rank,term FROM simple_search WHERE domain = 'zerotoappstore.com';
system.max(time) | rank | term
------------------+------+------------------------
1633375273 | 36 | developing an app cost
(1 rows)
Now, let's see what happens when we add a GROUP BY term clause to the exact same SELECT statement:
SELECT MAX(time), rank, term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;
system.max(time) | rank | term
------------------+------+------------------------
1633375273 | 36 | developing an app cost
1633332881 | 119 | best ide
1633297772 | 105 | avfoundation swift
(3 rows)
What if we drop the MAX aggregate function on the time column, given that the time data is already stored in descending order? We get the following:
SELECT time,rank,term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;
time | rank | term
------------+------+------------------------
1633375273 | 36 | developing an app cost
1633332881 | 119 | best ide
1633297772 | 105 | avfoundation swift
(3 rows)
Is this the result you are looking for? Also see the corresponding documentation for the specific conditions that apply.
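As for why this works: per the CQL documentation, when a column is selected without an aggregate function in a GROUP BY query, the first value encountered in each group is returned. Because time is clustered in descending order within each term, that first row is the latest one. If time were clustered in ascending order, the same query would return the oldest row of each group. That may explain the output in the question, although this is only a guess since the schema of search_by_domain_termsV2 is not shown. The same ordering can be used to fetch the latest row for a single term, as in this minimal sketch (the term value is just one taken from the sample data):

```cql
-- Rows within a term are stored newest-first (time DESC),
-- so LIMIT 1 picks the latest row for that term.
-- With the sample data above, this should return time 1633332881 and rank 119.
SELECT time, rank, term
FROM simple_search
WHERE domain = 'zerotoappstore.com'
  AND term = 'best ide'
LIMIT 1;
```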