How do primary keys and clustering columns operate in Cassandra?

I am confused about how primary keys in Cassandra enable fast data access. For example, say I create a Students table with the following columns:

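I choose Student Id as the primary key. My understanding is that the students will be distributed around the cluster based on some hash of this value. Suppose I also choose Country as a clustering column. Then, within each partition of students (split by their Id), they will be ordered by Country (presumably alphabetically).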
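The model described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Cassandra is implemented: a hypothetical 5-node cluster where each node owns one equal slice of the token range, and an MD5-based `token()` helper standing in for the partitioner (Cassandra's default is the Murmur3Partitioner, usually with virtual nodes). Student_Id is the partition key that picks the node; Country is the clustering column that orders rows inside a partition.

```python
import hashlib
import uuid
from collections import defaultdict

NUM_NODES = 5         # hypothetical 5-node cluster, one token range per node
RING_SIZE = 2 ** 64   # tokens live in [0, 2**64)

def token(partition_key: str) -> int:
    """Stand-in partitioner: first 64 bits of MD5 (Cassandra's default is Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(partition_key: str) -> int:
    """Each node owns an equal, contiguous slice of the token range."""
    return token(partition_key) * NUM_NODES // RING_SIZE

# node -> partition key (Student_Id) -> rows kept sorted by the clustering column
storage = defaultdict(lambda: defaultdict(list))

def insert(student_id: str, country: str, name: str) -> None:
    rows = storage[node_for(student_id)][student_id]
    rows.append((country, name))
    rows.sort()  # clustering order: ascending by Country inside the partition

for country, name in [("Peru", "Ana"), ("Kenya", "Ben"), ("Peru", "Cy"), ("Chile", "Di")]:
    insert(str(uuid.uuid4()), country, name)

# Clustering order only shows up when several rows share one partition key:
insert("demo-id", "Peru", "Ed")
insert("demo-id", "Chile", "Fay")

for node in sorted(storage):
    for student_id, rows in storage[node].items():
        print(f"node {node}: partition {student_id[:8]} -> {rows}")
```

Note what the sketch makes visible: if Student_Id is unique, every partition holds exactly one row, so ordering by Country within a partition has nothing to sort. Clustering order matters only when several rows share a partition key.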

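So if I then want to retrieve all students for a specific country, will I have to visit multiple nodes in the cluster? While the students have been ordered by Country within each node, there is nothing to say that all the students for a specific country have been stored on the same node, is there? Is this type of query even supported?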

If I had only added 5 students to a 5-node cluster, would it be possible for all the students to be stored on separate nodes if the Student Id was a UUID?

So if I then want to retrieve all students for a specific country, will I have to visit multiple nodes in the cluster?

Yes.

While the students have been ordered by Country within each node, there is nothing to say that all the students for a specific country have been stored on the same node?

Correct.

Is this type of query even supported?

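Yes, but it is considered an anti-pattern in Cassandra. What happens is that the coordinator (the node that receives the request from the client) has to query all the other nodes, because it must scan every row of that column family.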
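The difference between the two kinds of query can be sketched as a toy coordinator. Everything here is hypothetical: an in-memory 5-node cluster, an MD5 stand-in for the real partitioner, and no replication or consistency levels. A lookup by partition key hashes straight to one node; a query that restricts only Country has no token to route on, so the coordinator must fan out to every node and filter. (In Cassandra 3.x CQL, such a query is refused unless you add ALLOW FILTERING.)

```python
import hashlib

NUM_NODES = 5  # hypothetical cluster size, replication ignored

def node_for(partition_key: str) -> int:
    """Stand-in partitioner (MD5, not Cassandra's Murmur3): key -> owning node."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") * NUM_NODES // 2 ** 64

# nodes[i] maps student_id -> (country, name); Student_Id is the partition key
nodes = [dict() for _ in range(NUM_NODES)]

def insert(student_id: str, country: str, name: str) -> None:
    nodes[node_for(student_id)][student_id] = (country, name)

def get_by_id(student_id: str):
    """Partition-key lookup: the token says exactly which node to ask."""
    owner = node_for(student_id)
    return nodes[owner].get(student_id), 1  # (row, nodes contacted)

def get_by_country(country: str):
    """Clustering-column-only query: nothing to route on, so ask every node."""
    hits = []
    for node in nodes:  # the coordinator fans out to all of them
        hits.extend(name for c, name in node.values() if c == country)
    return sorted(hits), len(nodes)  # (names, nodes contacted)

for i, (country, name) in enumerate([("Peru", "Ana"), ("Kenya", "Ben"),
                                     ("Peru", "Cy"), ("Chile", "Di"), ("Peru", "Ed")]):
    insert(f"student-{i}", country, name)

print(get_by_id("student-2"))   # (('Peru', 'Cy'), 1)
print(get_by_country("Peru"))   # (['Ana', 'Cy', 'Ed'], 5)
```

The cost of the second query grows with the size of the cluster, not with the number of matching students, which is why it is treated as an anti-pattern.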

If I had only added 5 students to a 5-node cluster, would it be possible that all the students would be stored on separate nodes if the Student Id was a UUID?

Yes.
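How likely is it? Under simplifying assumptions (5 nodes owning equal token ranges, no virtual nodes, replication factor 1, and UUIDs hashing uniformly), each student lands on any node with probability 1/5, so the chance that all 5 end up on different nodes is 5!/5^5 = 120/3125 ≈ 3.8%. The sketch below computes that exactly and checks it with a Monte Carlo run, again using a hypothetical MD5 stand-in for the partitioner.

```python
import hashlib
import math
import uuid
from fractions import Fraction

NUM_NODES = 5      # hypothetical: equal token ranges, no vnodes, RF = 1
NUM_STUDENTS = 5
TRIALS = 20_000

# Exact answer: 5 * 4 * 3 * 2 * 1 favourable node assignments
# out of 5 ** 5 equally likely ones.
exact = Fraction(math.perm(NUM_NODES, NUM_STUDENTS), NUM_NODES ** NUM_STUDENTS)

def node_for(key: str) -> int:
    """Stand-in partitioner (MD5, not Murmur3): key -> index of the owning node."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") * NUM_NODES // 2 ** 64

def all_on_separate_nodes() -> bool:
    """Place NUM_STUDENTS random UUID keys; True if no two share a node."""
    owners = {node_for(str(uuid.uuid4())) for _ in range(NUM_STUDENTS)}
    return len(owners) == NUM_STUDENTS

estimate = sum(all_on_separate_nodes() for _ in range(TRIALS)) / TRIALS

print(exact, float(exact))  # 24/625 0.0384
print(estimate)             # close to 0.0384; varies from run to run
```

So the answer is yes, it is possible, but with only 5 keys it is the unusual case; uneven placement is normal for tiny data sets and evens out as the number of partitions grows.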

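The way to solve this is to have one column family per query (one for selecting by student ID and another for selecting by country, each with a different primary key) and to duplicate the rows: when you create a student, you must insert it into both column families.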
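A minimal sketch of that query-per-table approach, with plain Python dicts standing in for the two column families. The table names `students_by_id` and `students_by_country` and the helper functions are hypothetical. The second table corresponds roughly to a CQL table with PRIMARY KEY (country, student_id): Country becomes the partition key, so all students of one country can be read from a single partition, ordered by the clustering column student_id.

```python
import bisect
from collections import defaultdict

# Hypothetical query tables (column families), one per query:
#   students_by_id:      partition key = student_id
#   students_by_country: partition key = country, clustering column = student_id
students_by_id = {}
students_by_country = defaultdict(list)  # country -> [(student_id, name)], sorted

def add_student(student_id: str, name: str, country: str) -> None:
    """The application writes the same student to both tables."""
    students_by_id[student_id] = {"name": name, "country": country}
    # insort keeps the partition ordered by the clustering column (student_id)
    bisect.insort(students_by_country[country], (student_id, name))

def student(student_id: str):
    return students_by_id.get(student_id)  # single-partition read

def students_in(country: str):
    # .get avoids creating an empty partition for unknown countries
    return [name for _, name in students_by_country.get(country, [])]

add_student("s2", "Cy", "Peru")
add_student("s1", "Ana", "Peru")
add_student("s3", "Ben", "Kenya")

print(student("s1"))        # {'name': 'Ana', 'country': 'Peru'}
print(students_in("Peru"))  # ['Ana', 'Cy']  (ordered by student_id)
```

The trade-off is extra writes and storage in exchange for reads that each touch one partition. With Country as the partition key, a very large country becomes one large partition on one set of replicas, which is worth keeping in mind when choosing the key.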


Tags: cassandra, cassandra-2.0, cassandra-3.0