How to understand the concept of wide rows and related concepts in Cassandra?

Question

I find the concept of wide rows and the related concepts in this passage from Cassandra: The Definitive Guide hard to understand:

Cassandra uses a special primary key called a composite key (or compound key) to represent wide rows, also called partitions. The composite key consists of a partition key, plus an optional set of clustering columns. The partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns. The clustering columns are used to control how data is sorted for storage within a partition. Cassandra also supports an additional construct called a static column, which is for storing data that is not part of the primary key but is shared by every row in a partition.

Figure 4-5 shows how each partition is uniquely identified by a partition key, and how the clustering keys are used to uniquely identify the rows within a partition.

Are a wide row and a partition synonyms?

In "the partition key is used to determine the nodes on which rows are stored and can itself consist of multiple columns" and "each partition is uniquely identified by a partition key":

Since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?
How does the partition key "determine the nodes on which rows are stored"?
How can a partition key be used for "each partition is uniquely identified by a partition key"?

In "the clustering columns are used to control how data is sorted for storage within a partition":

What is a clustering column? For example, what are the clustering columns in the figure?
How do the clustering columns "control how data is sorted for storage within a partition"?

In "the clustering keys are used to uniquely identify the rows within a partition":

A partition is a synonym of a wide row, so what does "the rows within a partition" mean?
How are the clustering keys "used to uniquely identify the rows within a partition"?

Thanks.

Answer 1

Are a wide row and a partition synonyms?

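At the storage level, a partition and a (wide) row can be considered synonyms. A wide row is the case where the chosen partition key leads to a large number of cells for that key. Consider a scenario where data about all the people in a country is stored and the partition key is the city: there will be one row per city, and all the people living in that city will be cells in that row. For metro cities, this results in wide rows. Another example is storing sensor data received every few seconds with sensorId as the partition key, which results in a huge number of cells after a few years.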
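To put a number on the sensor example, here is a rough count of how many entries a single sensor's partition accumulates over time. The 5-second interval and the function name `readings_per_partition` are assumptions for illustration only (the answer just says "every few seconds"), and data expiry (TTL) is ignored.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years

def readings_per_partition(interval_seconds: int, years: int) -> int:
    """Number of readings stored under one sensorId partition key, assuming
    one reading every `interval_seconds` and nothing ever expires."""
    return years * SECONDS_PER_YEAR // interval_seconds

# Hypothetical interval of 5 seconds, over 1 and 3 years
for years in (1, 3):
    print(years, readings_per_partition(5, years))
```

At one reading every 5 seconds, one sensor's partition grows by roughly 6.3 million entries per year, which is what turns it into a wide row.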

Since a partition key is for a wide row, why are there multiple "rows" (does "rows" here mean "wide rows")?

Same as above.

How does the partition key "determine the nodes on which rows are stored"?

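The partition key is hashed (with Murmur3 by default, via Murmur3Partitioner), and each node in Cassandra is responsible for a range of hash values (tokens). For example, if the partition key value hashes to 20 and Node1 is responsible for the range 1 to 100, that partition resides on Node1 (and on its replicas, depending on the replication factor).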
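The range lookup described above can be sketched in Python as a toy model. Several things here are assumptions, not Cassandra's actual values: the three node names and their token ranges (1-100, 101-200, 201-300) are hypothetical, MD5 from `hashlib` stands in for Murmur3 (Cassandra's Murmur3Partitioner uses a 64-bit token space), and replicas are ignored, so only the node owning the token's range is returned.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 3-node ring: each node owns the token range (previous end, end].
RING = [(100, "Node1"), (200, "Node2"), (300, "Node3")]
RING_SIZE = 300
RANGE_ENDS = [end for end, _ in RING]

def token_for(partition_key: str) -> int:
    """Map a partition key to a token in 1..RING_SIZE.

    Stand-in hash: MD5 is used only because it ships with the standard library;
    it is NOT the Murmur3 hash Cassandra uses by default.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE + 1

def node_for_token(token: int) -> str:
    """Return the node whose range contains the token (replicas ignored)."""
    idx = bisect_left(RANGE_ENDS, token)  # first range end >= token
    return RING[idx % len(RING)][1]       # past the last end, wrap to the first node

def node_for_key(partition_key: str) -> str:
    return node_for_token(token_for(partition_key))

print(node_for_token(20))  # Node1, as in the example above
print(node_for_key("test"))
```

Because the hash is deterministic, every write with the same partition key maps to the same token and therefore the same node, which is why all rows of one partition are stored together.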

How can a partition key be used for "each partition is uniquely identified by a partition key"?

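As mentioned above, the partition key determines which node the data resides on. The data can be pictured as a giant map whose keys must be unique: each partition key value is one key in that map, so it identifies exactly one partition.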
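The "giant map" view can be sketched with a plain Python dict keyed by partition key. This is a hypothetical in-memory model, not Cassandra's on-disk format; the column names follow the answer's `test` table (partition key `a`, clustering column `b`, regular column `c`).

```python
# Hypothetical model: partition key -> {clustering key -> row values}.
# Dict keys are unique, so one partition key value names exactly one partition.
table: dict[str, dict[int, dict[str, str]]] = {}

def insert(a: str, b: int, c: str) -> None:
    """Write a row; every row with the same partition key `a` joins the same partition."""
    table.setdefault(a, {})[b] = {"c": c}

insert("test", 2, "test2")
insert("test", 1, "test1")
insert("test-new", 1, "test1")

print(len(table))          # 2 partitions: 'test' and 'test-new'
print(len(table["test"]))  # 2 rows inside partition 'test'
```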

What is a clustering column? For example, what are the clustering columns in the figure?

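Consider a table created as follows:

CREATE TABLE test (a text, b int, c text, PRIMARY KEY (a, b));

Here a is the partition key and b is the clustering column. In the attached figure, the clustering key is the clustering column, and the whole enclosing box is a cell.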
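To make the split between partition key and clustering columns concrete, the sketch below takes the column list of a PRIMARY KEY clause and separates the two parts, following CQL's rule: the first element is the partition key (a parenthesised group when the partition key has several columns), and the remaining columns are clustering columns. The function name `split_primary_key` is hypothetical, and it handles plain column names only; it is not a CQL parser.

```python
def split_primary_key(columns: str) -> tuple[list[str], list[str]]:
    """Split a PRIMARY KEY column list such as "(a, b)" or "((a1, a2), b, c)"
    into (partition key columns, clustering columns)."""
    inner = columns.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("expected a parenthesised column list")
    inner = inner[1:-1].strip()
    if inner.startswith("("):
        # Multi-column (composite) partition key: the first parenthesised group.
        close = inner.index(")")
        partition = [name.strip() for name in inner[1:close].split(",")]
        rest = inner[close + 1:]
    else:
        # Single-column partition key: everything before the first comma.
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Whatever follows the partition key is the list of clustering columns.
    clustering = [name.strip() for name in rest.split(",") if name.strip()]
    return partition, clustering

print(split_primary_key("(a, b)"))            # (['a'], ['b'])
print(split_primary_key("((a1, a2), b, c)"))  # (['a1', 'a2'], ['b', 'c'])
```

For the `test` table, `a` alone decides the partition (and hence the node), while `b` only matters inside that partition.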

How do the clustering columns "control how data is sorted for storage within a partition"?

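Cassandra sorts the data within each partition by column b of the table above, in ascending order. This can be changed to descending order (for example with WITH CLUSTERING ORDER BY (b DESC) when creating the table).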

INSERT INTO test (a, b, c) VALUES ('test', 2, 'test2');
INSERT INTO test (a, b, c) VALUES ('test', 1, 'test1');
INSERT INTO test (a, b, c) VALUES ('test-new', 1, 'test1');

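If you run the above queries in this order, Cassandra stores the data in the following order (the actual representation contains much more than shown below; just check the order of column b):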

test -> [b:1,c=test1] [b:2,c=test2]
test-new -> [b:1,c=test1]
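The ordering above can be reproduced with a small simulation: rows are grouped by partition key `a`, and each partition is kept sorted by clustering column `b` regardless of insertion order. This is a hypothetical in-memory model; the `descending` flag only mimics `WITH CLUSTERING ORDER BY (b DESC)`, and the order of partitions relative to each other is not modelled (Cassandra orders partitions by token, not by insertion).

```python
def store(inserts, descending=False):
    """Group (a, b, c) rows by partition key a and sort each partition by b."""
    partitions: dict[str, dict[int, str]] = {}
    for a, b, c in inserts:
        # Writing the same (a, b) again overwrites c, like a CQL upsert.
        partitions.setdefault(a, {})[b] = c
    return {
        a: sorted(rows.items(), reverse=descending)
        for a, rows in partitions.items()
    }

# Same order as the three INSERT statements above
inserts = [("test", 2, "test2"), ("test", 1, "test1"), ("test-new", 1, "test1")]
for key, rows in store(inserts).items():
    print(key, "->", rows)
# test -> [(1, 'test1'), (2, 'test2')]
# test-new -> [(1, 'test1')]
```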

A partition is a synonym of a wide row, so what does "the rows within a partition" mean?

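Clustering columns are used to identify the cells within a partition (cell is a better term than row here). For example, SELECT * FROM test WHERE a='test' AND b=1; picks the cell with b:1 in the partition whose partition key is test.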
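The lookup in that SELECT can be modelled as a two-level map access: the partition key picks the partition, and the clustering key picks one entry inside it. The dict layout and the name `TEST_TABLE` below are a hypothetical model of the `test` table after the three inserts, not Cassandra's storage format.

```python
# Hypothetical model of table `test`: partition key a -> {clustering key b -> c}
TEST_TABLE = {
    "test": {1: "test1", 2: "test2"},
    "test-new": {1: "test1"},
}

def select(a: str, b: int):
    """Model of SELECT * FROM test WHERE a = ? AND b = ?; returns None if absent."""
    partition = TEST_TABLE.get(a, {})  # the partition key locates the partition
    if b not in partition:             # the clustering key locates the row inside it
        return None
    return {"a": a, "b": b, "c": partition[b]}

print(select("test", 1))  # {'a': 'test', 'b': 1, 'c': 'test1'}
```

Because `b` is unique within a partition, the pair (`a`, `b`) identifies exactly one row; the same `b` value can appear in another partition (`test-new` also has `b = 1`) without conflict.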

How are the clustering keys "used to uniquely identify the rows within a partition"?

The answer above should explain this as well.

