Cassandra CQL alternative to OR in WHERE clause
Here is the code I use to create the table:
CREATE TABLE test.packages (
packageuuid timeuuid,
ruserid text,
suserid text,
timestamp int,
PRIMARY KEY (ruserid, suserid, packageuuid, timestamp)
);
Then I create a materialized view:
CREATE MATERIALIZED VIEW test.packages_by_userid
AS SELECT * FROM test.packages
WHERE ruserid IS NOT NULL
AND suserid IS NOT NULL
AND timestamp IS NOT NULL
AND packageuuid IS NOT NULL
PRIMARY KEY (ruserid, suserid, timestamp, packageuuid)
WITH CLUSTERING ORDER BY (packageuuid DESC);
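I want to be able to search for the packages sent between two IDs, in either direction, so I need something like this (pseudocode, since CQL has no OR):
SELECT * FROM test.packages_by_userid WHERE ((ruserid = '1' AND suserid = '2') OR (ruserid = '2' AND suserid = '1')) AND timestamp > 1496601553;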
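How can I do something like this with CQL?
I searched around, but I couldn't figure it out.
I'm willing to change the table structure if that makes this possible.
If it works without a materialized view, that's fine too.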
Use an IN clause:
SELECT * FROM test.packages_by_userid WHERE ruserid IN ( '1', '2') AND suserid IN ( '1','2') AND timestamp > 1496601553;
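One caveat worth checking: two IN lists match every combination of their values, so this query also returns packages a user sent to themselves (ruserid = suserid). A small plain-Python sketch (no driver involved; the IDs are the ones from the question) enumerating the combinations:

```python
from itertools import product

# Values taken from the two IN lists in the query above.
ruserids = ["1", "2"]
suserids = ["1", "2"]

# Every (ruserid, suserid) combination the two IN lists can match.
matched_pairs = list(product(ruserids, suserids))

# The pairs the question actually asks for: 1 -> 2 and 2 -> 1.
wanted = {("1", "2"), ("2", "1")}

# Extra pairs the IN version also returns (self-sent packages).
extra = [pair for pair in matched_pairs if pair not in wanted]

print(matched_pairs)  # [('1', '1'), ('1', '2'), ('2', '1'), ('2', '2')]
print(extra)          # [('1', '1'), ('2', '2')]
```

If self-sent packages can exist in your data, drop rows whose ruserid equals suserid on the client.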
Note: keep IN clauses small. A large IN clause spanning many partitions causes GC pauses and heap pressure, which degrades overall performance.
In practical terms this means you’re waiting on this single coordinator node to give you a response, it’s keeping all those queries and their responses in the heap, and if one of those queries fails, or the coordinator fails, you have to retry the whole thing.
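If you do stick with IN, one way to keep each clause small is to split the key list into fixed-size chunks and issue one statement per chunk. A minimal sketch; the chunk size of 2 and the key list are arbitrary assumptions, since the note above gives no concrete limit:

```python
from typing import Iterator, List, Sequence


def chunk_in_list(values: Sequence[str], max_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values`, each holding at most `max_size` items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(values), max_size):
        yield list(values[start:start + max_size])


def in_clause(chunk: Sequence[str]) -> str:
    """Render one bind marker per value, e.g. 'ruserid IN (?, ?)'."""
    return "ruserid IN (" + ", ".join("?" for _ in chunk) + ")"


# Hypothetical list of partition keys; 2 is an arbitrary chunk size.
keys = ["1", "2", "3", "4", "5"]
for chunk in chunk_in_list(keys, 2):
    print(in_clause(chunk), chunk)
```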
If the IN clause would span many partitions, run a separate query for each partition (ruserid) instead, using executeAsync:
SELECT * FROM test.packages_by_userid WHERE ruserid = '1' AND suserid IN ( '1','2') AND timestamp > 1496601553;
SELECT * FROM test.packages_by_userid WHERE ruserid = '2' AND suserid IN ( '1','2') AND timestamp > 1496601553;
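The driver's executeAsync (execute_async in the Python driver) returns one future per statement, so the per-partition queries run concurrently and each can fail or be retried on its own. The sketch below only imitates that pattern with concurrent.futures; fetch_partition and FAKE_VIEW are hypothetical in-memory stand-ins for a real session call, so it runs without a cluster:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (ruserid, suserid, timestamp)

# Hypothetical stand-in for the view's data, keyed by partition (ruserid).
FAKE_VIEW: Dict[str, List[Row]] = {
    "1": [("1", "2", 1496601600), ("1", "3", 1496601700)],
    "2": [("2", "1", 1496601650), ("2", "2", 1496601500)],
}


def fetch_partition(ruserid: str, suserids: List[str], after: int) -> List[Row]:
    """Hypothetical stand-in for one single-partition query:
    ruserid = ? AND suserid IN ? AND timestamp > ?"""
    return [
        row for row in FAKE_VIEW.get(ruserid, [])
        if row[1] in suserids and row[2] > after
    ]


def fan_out(ruserids: List[str], suserids: List[str], after: int) -> List[Row]:
    """Submit one query per partition concurrently, then collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(ruserids))) as pool:
        futures = [pool.submit(fetch_partition, r, suserids, after) for r in ruserids]
        # Each future is independent: an error surfaces for its own partition
        # instead of failing one large multi-partition IN query.
        return [row for future in futures for row in future.result()]


rows = fan_out(["1", "2"], ["1", "2"], 1496601553)
print(rows)  # [('1', '2', 1496601600), ('2', '1', 1496601650)]
```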
Since you always search by both sender and receiver, I would model it with the following table layout:
CREATE TABLE test.packages (
ruserid text,
suserid text,
timestamp int,
packageuuid timeuuid,
-- packageuuid keeps two packages sent in the same second from overwriting each other
PRIMARY KEY ((ruserid, suserid), timestamp, packageuuid)
);
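With this layout, 1 -> 2 and 2 -> 1 live in two different partitions, and a query must supply both parts of the partition key before it can apply a range on the clustering column. A minimal in-memory model of that, assuming hypothetical data and modelling rows as (timestamp, packageuuid) tuples sorted by timestamp:

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical data. Partition key: (ruserid, suserid). Each partition's rows
# are kept sorted by the clustering column, as (timestamp, packageuuid) tuples.
Partitions = Dict[Tuple[str, str], List[Tuple[int, str]]]

partitions: Partitions = {
    ("1", "2"): [(1496601500, "a"), (1496601600, "b"), (1496601700, "c")],
    ("2", "1"): [(1496601650, "d")],
}


def query_partition(parts: Partitions, ruserid: str, suserid: str,
                    after: int) -> List[Tuple[int, str]]:
    """Model of: WHERE ruserid = ? AND suserid = ? AND timestamp > ?

    Both key parts are required to locate the partition; the timestamp
    restriction is then a contiguous slice of the sorted rows.
    """
    rows = parts.get((ruserid, suserid), [])
    timestamps = [ts for ts, _ in rows]
    start = bisect_right(timestamps, after)  # first row with timestamp > after
    return rows[start:]


print(query_partition(partitions, "1", "2", 1496601553))
print(query_partition(partitions, "2", "1", 1496601553))
```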
This way, for each sender/receiver pair you need to run two queries, one per partition:
SELECT * FROM test.packages WHERE ruserid = '1' AND suserid = '2' AND timestamp > 1496601553;
SELECT * FROM test.packages WHERE ruserid = '2' AND suserid = '1' AND timestamp > 1496601553;
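Each query returns its rows already ordered by timestamp (the clustering column), so the client can combine them into one timestamp-ordered list, which is the result the OR was meant to express. A minimal sketch with hypothetical result rows, using heapq.merge so neither list has to be re-sorted:

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, str, str, str]  # (timestamp, ruserid, suserid, packageuuid)

# Hypothetical results of the two queries, each already sorted by timestamp.
sent_1_to_2: List[Row] = [(1496601600, "1", "2", "b"), (1496601700, "1", "2", "c")]
sent_2_to_1: List[Row] = [(1496601650, "2", "1", "d")]


def merge_by_timestamp(*results: List[Row]) -> List[Row]:
    """Merge per-partition results (each sorted by timestamp) into one sorted list."""
    return list(heapq.merge(*results, key=lambda row: row[0]))


conversation = merge_by_timestamp(sent_1_to_2, sent_2_to_1)
print([row[3] for row in conversation])  # ['b', 'd', 'c']
```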
IMHO this is the best solution, because in Cassandra you start from your queries and build the table model around them, never the other way around.