Cassandra table based query and primary key uniqueness

Question

I read here that for a table like this:

CREATE TABLE user (
    username text,
    password text,
    email text,
    company text,
    PRIMARY KEY (username)
);

we can create a table like this:

CREATE TABLE user_by_company (
    company text,
    username text,
    email text,
    PRIMARY KEY (company)
);

to support querying by company. But what about primary key uniqueness in the second table?

Answer 1

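I think there is a typo in the blog post (the link you mentioned). You are right: with that table structure, user_by_company has a uniqueness problem. Because company is the entire primary key, each company can hold only one row, and inserting a second user for the same company overwrites the first.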
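A short CQL sketch of that problem, using the question's user_by_company table with PRIMARY KEY (company). The 'Acme' company and the alice/bob rows are made-up sample values. Since a Cassandra INSERT is an upsert, the second write with the same primary key silently replaces the first instead of failing:

```sql
-- Schema from the question: company alone is the primary key.
CREATE TABLE user_by_company (
    company text,
    username text,
    email text,
    PRIMARY KEY (company)
);

-- Hypothetical sample data: two different users at the same company.
INSERT INTO user_by_company (company, username, email)
  VALUES ('Acme', 'alice', 'alice@example.com');
INSERT INTO user_by_company (company, username, email)
  VALUES ('Acme', 'bob', 'bob@example.com');

-- Only one row exists for 'Acme': the second INSERT overwrote the first,
-- so this returns bob's row and alice's row is gone.
SELECT * FROM user_by_company WHERE company = 'Acme';
```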

In support of the typo theory:

In this case, creating a secondary index in the company field in the user table could be a solution because it has much lower cardinality than the user's email but let’s solve it with performance in mind. Secondary indexes are always slower than dedicated table approach.

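This is the line in the blog about querying users by company. If you define company as the primary key, or as part of the primary key, you don't need to create a secondary index at all.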
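For comparison, the secondary-index alternative that the quoted paragraph mentions would look roughly like this on the original user table. The index name user_company_idx is made up for illustration:

```sql
-- Secondary index on a non-key column of the original user table.
CREATE INDEX user_company_idx ON user (company);

-- Query by company through the index. No second table is needed, but
-- secondary indexes are stored locally on each node, so a query that does
-- not restrict the partition key (username) may have to contact many nodes.
SELECT username, email FROM user WHERE company = 'Acme';
```

That fan-out is why the blog prefers a dedicated query table for performance.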

Answer 2

Modify the table's PRIMARY KEY definition and add username as a clustering key:

CREATE TABLE user_by_company (
    company text,
    username text,
    email text,
    PRIMARY KEY (company,username)
);

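This will enforce uniqueness, as well as return all usernames for a particular company. Additionally, your result set will be sorted in ascending order by username.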
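A sketch of the revised table in use, with made-up sample rows. Each (company, username) pair is now its own row inside the company's partition, and rows come back in clustering order. Note that rewriting an existing (company, username) pair is still a silent upsert; if you want Cassandra to reject the duplicate instead, one option is a lightweight transaction with IF NOT EXISTS, which costs an extra consensus round trip per write:

```sql
-- Hypothetical sample data, inserted out of alphabetical order.
INSERT INTO user_by_company (company, username, email)
  VALUES ('Acme', 'bob', 'bob@example.com');
INSERT INTO user_by_company (company, username, email)
  VALUES ('Acme', 'alice', 'alice@example.com');

-- Both rows are kept; within the 'Acme' partition they are returned
-- sorted by the clustering column: alice, then bob.
SELECT username, email FROM user_by_company WHERE company = 'Acme';

-- Conditional insert: the row already exists, so the result's
-- [applied] column is false and the existing row is left unchanged.
INSERT INTO user_by_company (company, username, email)
  VALUES ('Acme', 'alice', 'other@example.com')
  IF NOT EXISTS;
```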

Data will be partitioned across nodes by company name. What if there are a lot of users from one company and fewer from another? Data will be partitioned in an unbalanced way.

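That's a balance you have to figure out for yourself. A PRIMARY KEY definition in Cassandra is a trade-off between data distribution and query flexibility. Unless the cardinality of company is very low (single digits, say), you shouldn't need to worry about creating hot spots in your cluster.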
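If you want a rough check on how unbalanced things are, two plain CQL queries can help: token() shows the hash each partition key maps to (partitions, not individual rows, are what get spread across nodes), and a per-company count gives a sense of partition size. 'Acme' is again a hypothetical value, and counting rows can be slow on a large partition:

```sql
-- Every row of a company shares one token, i.e. one partition
-- and therefore one set of replica nodes.
SELECT company, token(company) FROM user_by_company;

-- Approximate size of a single company's partition, in rows.
SELECT COUNT(*) FROM user_by_company WHERE company = 'Acme';
```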

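Also, if one particular company becomes too large, you can use a modeling technique known as "bucketing." If I were to "bucket" your user_by_company table, I would first add a company_bucket column and use it as an additional (composite) partition key: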

CREATE TABLE user_by_company (
    company text,
    company_bucket text,
    username text,
    email text,
    PRIMARY KEY ((company,company_bucket),username)
);

What you put in that bucket is up to you. Maybe that particular company has East and West locations, so something like this might work:

INSERT INTO user_by_company (company,company_bucket,username,email)
  VALUES ('Acme','West','Jayne','jcobb@serenity.com');

The drawback is that you then have to provide company_bucket whenever you query the table. But if a company does become too large, it is a solution that can help.
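A sketch of what querying the bucketed table looks like, assuming the hypothetical 'East' and 'West' buckets from the example above are the only ones in use for 'Acme':

```sql
-- The full partition key (company AND company_bucket) must be given.
SELECT username, email FROM user_by_company
  WHERE company = 'Acme' AND company_bucket = 'West';

-- WHERE company = 'Acme' alone is rejected by default because the
-- partition key is incomplete. To read every Acme user, name all buckets:
SELECT username, email FROM user_by_company
  WHERE company = 'Acme' AND company_bucket IN ('East', 'West');
```

With IN, the rows come back partition by partition, so usernames are sorted within each bucket but not across buckets; if you need a global order, merge the results in the application.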
