In Cassandra, why create two tables to search users by username and email instead of adding an index?
I was reading the article Basic Rules of Cassandra Data Modeling. It says that if you want to query users by both email and username, you should create two tables:
CREATE TABLE users_by_username (
username text PRIMARY KEY,
email text,
age int
);
CREATE TABLE users_by_email (
email text PRIMARY KEY,
username text,
age int
);
Why would you do this? Doesn't duplicating something this small make the data harder to manage? Why not create a single table and add an index?
-- A table holding the user info
CREATE TABLE users (
username text,
email text,
age int,
PRIMARY KEY((username),email)
);
-- An index that gives good performance on email searching
CREATE INDEX user_email ON users (email);
You should create two tables because of the high-cardinality problem with indexes:
If you create an index on a high-cardinality column, which has many distinct values, a query between the fields will incur many seeks for very few results. In a table with a billion emails, looking up a user by email (a value that is typically unique for each user) is likely to be very inefficient.
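When you query by email through the index, Cassandra runs the query on every node: each node looks up its local index and sends back a response. The merged result is a single user. You are querying every node to get one row, which is very inefficient.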
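One way to see this fan-out for yourself is to trace the indexed query in cqlsh. This is only a sketch: it assumes the `users` table and `user_email` index from the question, and the email value is made up for illustration.

```cql
-- cqlsh command (not CQL itself): print a trace for each following request
TRACING ON;

-- Served by the secondary index on email; the trace lists the nodes involved
SELECT username, email, age
FROM users
WHERE email = 'alice@example.com';

TRACING OFF;
```

On a multi-node cluster, the trace should show activity from many nodes even though at most one row comes back.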
If you instead create a separate table of users keyed by email and query that, Cassandra only has to use the partition key, email, to go to the single node that owns the row.
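As a rough sketch of what this looks like in practice, assuming the `users_by_username` and `users_by_email` tables from the question (the user values are invented for illustration): each write goes to both tables, here grouped in a logged batch so that both inserts are eventually applied together, and each read names the partition key of the table it targets.

```cql
-- Write the same user to both tables
BEGIN BATCH
  INSERT INTO users_by_username (username, email, age)
  VALUES ('alice', 'alice@example.com', 30);
  INSERT INTO users_by_email (email, username, age)
  VALUES ('alice@example.com', 'alice', 30);
APPLY BATCH;

-- Each lookup is by partition key, so it is routed to the owning node
SELECT username, email, age FROM users_by_username WHERE username = 'alice';
SELECT username, email, age FROM users_by_email WHERE email = 'alice@example.com';
```

The cost is that the application must remember to update both tables on every change, which is the management overhead the question is worried about.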
Alternatively, if you are on Cassandra 3.0 or later, you can use materialized views to maintain the denormalization automatically.
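A minimal sketch of that option, assuming `users_by_username` is kept as the base table. The view name `users_by_email_mv` is hypothetical, chosen so it does not clash with the hand-maintained `users_by_email` table.

```cql
-- Cassandra keeps the view in sync with writes to users_by_username
CREATE MATERIALIZED VIEW users_by_email_mv AS
  SELECT email, username, age
  FROM users_by_username
  WHERE email IS NOT NULL AND username IS NOT NULL
  PRIMARY KEY (email, username);

-- Query the view like a table, by its partition key
SELECT username, email, age FROM users_by_email_mv WHERE email = 'alice@example.com';
```

The view's primary key has to include every primary key column of the base table, which is why `username` appears as a clustering column. Later Cassandra releases treat materialized views as experimental, so it is worth checking the documentation for your version before relying on them.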
Source: http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html