How does Cassandra handle duplicate data when reading from SSTables?

The DataStax documentation says:

During a write, Cassandra adds each new row to the database without checking on whether a duplicate record exists. This policy makes it possible that many versions of the same row may exist in the database.

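As I understand it, this means that more than one not-yet-compacted SSTable may contain different versions of the same row. How does Cassandra handle this duplicate data when it reads from those SSTables?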

@quangh: As the documentation states:

This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas. The version with the most recent timestamp is the only one returned to the client ("last-write-wins").

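Every write operation has an associated timestamp. In the situation you describe, different nodes can hold different versions of the same row, but during a read Cassandra picks the version with the most recent timestamp. I hope this answers your question.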
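To make the "last-write-wins" rule concrete, here is a minimal sketch in Python. It is not Cassandra's actual read path: `RowVersion`, `read_row`, and the sample data are hypothetical names made up for illustration, each SSTable is modeled as a plain list, and the sketch ignores tombstones and the fact that the real engine reconciles at a finer granularity than whole rows.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RowVersion:
    """One stored version of a row (hypothetical, simplified model)."""
    key: str
    value: str
    write_timestamp: int  # larger means newer


def read_row(key: str, sstables: Iterable[Iterable[RowVersion]]) -> Optional[RowVersion]:
    """Collect every version of `key` from all SSTables and keep the newest."""
    candidates = [v for table in sstables for v in table if v.key == key]
    if not candidates:
        return None
    # Last-write-wins: the version with the highest timestamp is returned.
    return max(candidates, key=lambda v: v.write_timestamp)


# Two SSTables that have not been compacted yet both hold "user:1".
sstable_1 = [RowVersion("user:1", "alice@old.example", 1000)]
sstable_2 = [
    RowVersion("user:1", "alice@new.example", 2000),
    RowVersion("user:2", "bob@example.com", 1500),
]

newest = read_row("user:1", [sstable_1, sstable_2])
print(newest.value)
```

The same comparison applies whether the versions come from several SSTables on one node or from several replicas: older versions stay on disk until compaction, but only the newest one is returned to the client.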


cassandra

datastax