Normalization in Cassandra for the specified use case?

Question

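I have a device table (say, the 'device' table) that has static fields plus the current statistics. I also have another table (say, the 'devicestat' table) that holds the statistics collected for that device every minute, ordered by timestamp, as shown below.

Example: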

CREATE TABLE device(
   "partitionId" text,
   "deviceId" text,
   "name" text,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "currentUsedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "currentUsedCpu" int,
   "ipAddress" text,
    primary key ("partitionId","deviceId"));


CREATE TABLE devicestat(
   "deviceId" text,
   "timestamp" timestamp,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "usedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "usedCpu" int,
    primary key ("deviceId","timestamp"));

where:

currentUsedMemoryInMB & currentUsedCpu => hold the most recent statistics

usedMemoryInMB & usedCpu => hold both the most recent and the older statistics, keyed by timestamp.

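Can anyone suggest the right approach for the following design?

Whenever I need the static data together with the latest statistics, I read from the device table, and whenever I need the history of a device's statistics, I read from the devicestat table.

This works fine for me, but the only issue is that I have to write the statistics to both tables: in the devicestat table each write is a new entry keyed by timestamp, whereas in the device table we only update the statistics. What are your thoughts on this? Should the statistics be maintained only in a single stats table, or is it fine to also update the latest statistics in the device table?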

Answer 1

In Cassandra, the common approach is to have one table (column family) per query, and denormalization is considered good practice. So keeping two column families in this case is fine.
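As a rough sketch of what writing to both tables could look like, the two statements can be grouped in a logged batch so that either both are eventually applied or neither is. The values and the ids 'partition-1' and 'device-1' are made up for illustration, and a batch like this does not provide isolation between the two partitions:

```cql
-- Hypothetical sample values; only the table and column names come from the question.
BEGIN BATCH
  -- Append a new time-series row to the history table.
  INSERT INTO devicestat ("deviceId", "timestamp", "totalMemoryInMB", "totalCpu",
                          "usedMemoryInMB", "totalStorageInMB", "usedCpu")
  VALUES ('device-1', '2017-01-01 10:00:00+0000', 8192, 4, 2048, 512000, 35);

  -- Overwrite the "current" statistics on the device row.
  UPDATE device
  SET "currentUsedMemoryInMB" = 2048, "currentUsedCpu" = 35
  WHERE "partitionId" = 'partition-1' AND "deviceId" = 'device-1';
APPLY BATCH;
```

A logged batch that spans partitions costs more than two independent writes, so if an occasionally stale "current" value is acceptable, issuing the two statements separately may be enough.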

Another way to get the latest statistics from the devicestat table is to keep the data sorted by timestamp in descending order:

CREATE TABLE devicestat(
   "deviceId" text,
   "timestamp" timestamp,
   "totalMemoryInMB" bigint,
   "totalCpu" int,
   "usedMemoryInMB" bigint,
   "totalStorageInMB" bigint,
   "usedCpu" int,
    primary key ("deviceId","timestamp"))
WITH CLUSTERING ORDER BY ("timestamp" DESC);

So when you know the deviceId, you can query with LIMIT 1:

SELECT * FROM devicestat WHERE "deviceId" = 'someId' LIMIT 1;

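But if you want to list the latest statistics of the devices by partitionId, then your approach of updating the device table with the latest statistics is the right one.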
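For illustration, listing the latest statistics for every device in one partition would then be a single-partition read against the device table. The id 'partition-1' is a made-up value, and the column names need double quotes because they were created as case-sensitive identifiers:

```cql
-- Hypothetical partition id; reads the current stats of all devices in that partition.
SELECT "deviceId", "name", "currentUsedMemoryInMB", "currentUsedCpu"
FROM device
WHERE "partitionId" = 'partition-1';
```

Getting the same result from devicestat alone would take one LIMIT 1 query per deviceId, which is why the denormalized columns on device help here.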

cassandra

cassandra-3.0