DynamoDB Design PartitionKey, RangeKey and GSI

Question

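I'm designing a new table on DynamoDB. I've read some of the documentation, but I can't figure out which design pattern I should follow to avoid problems in the future.

Current approach

Table - Events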

 - eventId (HashKey)
 - userId
 - createdAt
 - some other attributes...

Table - Users

 - userId (HashKey)
 - name
 - birth
 - address

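The Events table will have a large number of entries, on the order of millions. The Users table currently has about 20 entries.

I need to run the following queries: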

 - GET paginated events from specific userId ordered by createdAt
 - GET paginated events from specific userId between some range of dates and ordered by createdAt 
 - GET specific event entry by eventId
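As a rough sketch, the three access patterns above could be expressed as DynamoDB request parameters like this. It assumes the table is named `Events` and keyed by `eventId`, that an index keyed by `userId` (hash) and `createdAt` (range) exists under the hypothetical name `userId-createdAt-index`, and that `createdAt` is stored as an ISO-8601 string so that it sorts chronologically. None of these names come from the original post.

```python
from typing import Any, Dict, Optional

EVENTS_TABLE = "Events"                   # assumed table name
USER_INDEX = "userId-createdAt-index"     # hypothetical GSI name


def events_by_user(user_id: str,
                   start: Optional[str] = None,
                   end: Optional[str] = None,
                   page_size: int = 25,
                   ascending: bool = True,
                   start_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build Query parameters for one page of a user's events."""
    condition = "userId = :uid"
    values: Dict[str, Any] = {":uid": {"S": user_id}}
    if start is not None and end is not None:
        # Restrict the sort key to a date range (bounds are inclusive).
        condition += " AND createdAt BETWEEN :start AND :end"
        values[":start"] = {"S": start}
        values[":end"] = {"S": end}
    params: Dict[str, Any] = {
        "TableName": EVENTS_TABLE,
        "IndexName": USER_INDEX,
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
        "ScanIndexForward": ascending,    # order by createdAt
        "Limit": page_size,               # page size
    }
    if start_key is not None:
        # Continue from the LastEvaluatedKey returned with the previous page.
        params["ExclusiveStartKey"] = start_key
    return params


def event_by_id(event_id: str) -> Dict[str, Any]:
    """Build GetItem parameters for a single event."""
    return {"TableName": EVENTS_TABLE, "Key": {"eventId": {"S": event_id}}}
```

With boto3, these dictionaries would be passed as `client.query(**events_by_user("u1"))` and `client.get_item(**event_by_id("e1"))`; the `LastEvaluatedKey` in each query response is what feeds `start_key` for the next page.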

So I was thinking of creating a GSI (Global Secondary Index) on the Events table with the following setup:

 - userId (HashKey)
 - createdAt (RangeKey)
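For illustration, a table definition along these lines might look like the following. This is only a sketch: the index name `userId-createdAt-index`, the string attribute types, the `ALL` projection, and on-demand billing are assumptions, not details from the post.

```python
from typing import Any, Dict


def events_table_definition() -> Dict[str, Any]:
    """Build CreateTable parameters: eventId as hash key plus a userId/createdAt GSI."""
    return {
        "TableName": "Events",  # assumed table name
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "eventId", "AttributeType": "S"},
            {"AttributeName": "userId", "AttributeType": "S"},
            {"AttributeName": "createdAt", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "eventId", "KeyType": "HASH"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "userId-createdAt-index",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "userId", "KeyType": "HASH"},
                    {"AttributeName": "createdAt", "KeyType": "RANGE"},
                ],
                # Copy all attributes into the index (an assumption;
                # a narrower projection would reduce storage).
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand billing, so no ProvisionedThroughput block is needed.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 this could be applied as `boto3.client("dynamodb").create_table(**events_table_definition())`.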

But my question is: does my initial design make sense? Somehow I feel I could instead design the Events table with the following setup:

 - userId (HashKey)
 - eventId (SortKey)

But I think that with this approach I would run into the hot partition trap.

Any comments and suggestions would be appreciated.

Thanks.

Answer 1

I think your approach is fine. Keep the best practices in mind (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html), especially:

Generally speaking, you should design your application for uniform activity across all logical partition keys in the table and its secondary indexes. You can determine the access patterns that your application requires, and estimate the total RCUs and WCUs that each table and secondary index requires.

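In other words, data mutations should be spread as evenly as possible across all partitions. In your case there will be a lot of events and a limited number of users, which means each user will have a large number of events.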

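If you partition the table by eventId, you end up with millions of distinct partition key values, many of them belonging to the same userId. Reads and writes for individual events are then spread evenly across all of those keys, while the queries by user are served by the GSI on userId.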

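However, if you choose userId as the partition key, many more requests will land on the same partition than in the other case. I would therefore go with the former (eventId as the partition key).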
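To make the difference concrete, here is a small simulation of how writes would spread under each key choice. It is a toy model only: the MD5-modulo hashing stands in for DynamoDB's internal hash function, and the partition count, event count, and the skew (one user producing half of all events) are made-up numbers.

```python
import hashlib
from collections import Counter
from typing import Iterable, List

PARTITIONS = 8      # assumed number of physical partitions
N_EVENTS = 20000    # scaled-down stand-in for "millions" of events
N_USERS = 20        # roughly the user count from the question


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a partition (toy stand-in for DynamoDB's hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_per_partition(keys: Iterable[str], partitions: int) -> List[int]:
    """Count how many writes land on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


# Every even-numbered event belongs to user-0 (an invented skew);
# the remaining events are spread over the other users.
events = [
    (f"event-{i}", f"user-{0 if i % 2 == 0 else i % N_USERS}")
    for i in range(N_EVENTS)
]

by_event = load_per_partition((e for e, _ in events), PARTITIONS)
by_user = load_per_partition((u for _, u in events), PARTITIONS)

print("eventId as partition key:", by_event)
print("userId as partition key: ", by_user)
```

Keyed by eventId, each partition receives a similar share of the writes. Keyed by userId, whichever partition holds the busiest user takes at least that user's entire load.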

That's my 2 cents.
