Model daily game ranking in DynamoDB
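I have a question. I am new to DynamoDB, but I have worked on large-scale aggregation in SQL databases for a long time.
Suppose you have a table called GamePoints (PlayerId, GameId, Points) and want to create a ranking table, Rankings (PlayerId, Points), sorted by points.
This table needs to be updated hourly, but there is no need to keep previous versions of its content, only the current rankings.
Queries always return the whole rankings table (with pagination).
The GamePoints table will grow very, very large over time.
Questions:
Is this a best-practice schema for DynamoDB?
How would you do this kind of aggregation?
Thanks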
You can enable a DynamoDB Stream on the GamePoints table. You can read stream records from the stream to maintain materialized views, including aggregations, like the Rankings table. Set StreamViewType=NEW_IMAGE on your GamePoints table, and set up a Lambda function that consumes stream records from your stream and updates each player's points using an atomic counter (UpdateItem, HK=player_id, UpdateExpression="ADD Points :stream_record_points", ExpressionAttributeValues={":stream_record_points": [put the value from the stream record here]}). Because the hash key of the Rankings table is still the player ID, you can do a full table scan of the Rankings table every hour to get the top n players, or all players, and sort them.
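Here is a minimal sketch of what such a Lambda function could look like in Python. It assumes the attribute names from the question (PlayerId, Points), a target table named Rankings, and the low-level boto3 DynamoDB client. The helper name build_update and the injectable client argument are made up for illustration. It only handles INSERT events, since a NEW_IMAGE carries the full new value rather than a delta, and it does nothing about retried batches, so treat it as a starting point rather than a finished aggregator.

```python
from decimal import Decimal


def build_update(record):
    """Turn one GamePoints stream record into UpdateItem keyword arguments.

    Returns None for anything other than an INSERT, because NEW_IMAGE
    holds the full new item and not the change in points.
    """
    if record.get("eventName") != "INSERT":
        return None
    image = record["dynamodb"]["NewImage"]
    points = Decimal(image["Points"]["N"])  # fails early on a malformed number
    return {
        "TableName": "Rankings",  # assumed table name, taken from the question
        "Key": {"PlayerId": image["PlayerId"]},
        # "#pts" aliases the attribute name; ":p" is the value placeholder.
        "UpdateExpression": "ADD #pts :p",
        "ExpressionAttributeNames": {"#pts": "Points"},
        "ExpressionAttributeValues": {":p": {"N": str(points)}},
    }


def handler(event, context, client=None):
    """Lambda entry point; `client` is injectable so the logic can be tested."""
    if client is None:
        import boto3  # imported lazily; only needed when running in AWS
        client = boto3.client("dynamodb")
    applied = 0
    for record in event.get("Records", []):
        params = build_update(record)
        if params is not None:
            client.update_item(**params)
            applied += 1
    return applied
```

ADD creates the Points attribute if the player has no item yet, so the first game a player finishes needs no special case.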
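However, given the size of the fields (player_id and points probably take no more than 100 bytes), an in-memory cache updated by the Lambda function would work equally well for tracking a real-time list of players sorted in descending order by total points. Finally, if your application needs stateful processing of stream records, you can use the Kinesis Client Library combined with the DynamoDB Streams Kinesis Adapter on your application server to achieve the same effect as subscribing a Lambda function to the stream of the GamePoints table.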
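To illustrate the in-memory idea, here is a rough sketch of the data structure such a cache could hold. The class and method names are hypothetical, and it says nothing about where the cache lives or how it survives restarts. It only shows totals being accumulated and the top n being read back in descending order.

```python
import heapq
from collections import defaultdict


class InMemoryRanking:
    """Running point totals per player, with a descending top-n view."""

    def __init__(self):
        self._totals = defaultdict(int)

    def add_points(self, player_id, points):
        # Called once per stream record.
        self._totals[player_id] += points

    def top(self, n):
        # Highest total first; ties are ordered by player id so output is stable.
        return heapq.nsmallest(
            n, self._totals.items(), key=lambda kv: (-kv[1], kv[0])
        )
```

For example, after add_points("p1", 10), add_points("p2", 30) and add_points("p1", 25), top(2) returns p1 with 35 points followed by p2 with 30.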
PutItem can be helpful for implementing the persistence logic for your use case:
PutItem Creates a new item, or replaces an old item with a new item.
If an item that has the same primary key as the new item already
exists in the specified table, the new item completely replaces the
existing item. You can perform a conditional put operation (add a new
item if one with the specified primary key doesn't exist), or replace
an existing item if it has certain attribute values. Source:
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html
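As a small illustration of the conditional put mentioned in that quote, the sketch below builds the arguments for a put_item call with the low-level boto3 client that only succeeds when the player has no item yet. The table name Rankings, a string-typed PlayerId, and the helper name are assumptions made for this example.

```python
def build_conditional_put(player_id, points):
    """Keyword arguments for put_item that refuse to overwrite an existing player."""
    return {
        "TableName": "Rankings",  # assumed table name
        "Item": {
            "PlayerId": {"S": player_id},  # assumes a string hash key
            "Points": {"N": str(points)},
        },
        # The put is rejected if an item with this PlayerId already exists.
        "ConditionExpression": "attribute_not_exists(PlayerId)",
    }
```

You would pass the result as client.put_item(**build_conditional_put("p1", 0)). When the condition fails, DynamoDB rejects the request with a ConditionalCheckFailedException, which the caller can catch and ignore.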
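In terms of querying the data, if you know you are going to read the entire Ranking table, I would suggest doing so through several read operations with the minimum acceptable page size, so that you make the best use of your provisioned throughput. See the following guidelines for details: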
Instead of using a large Scan operation, you can use the following
techniques to minimize the impact of a scan on a table's provisioned
throughput.
Reduce Page Size
Because a Scan operation reads an entire page (by default, 1 MB), you
can reduce the impact of the scan operation by setting a smaller page
size. The Scan operation provides a Limit parameter that you can use
to set the page size for your request. Each Scan or Query request that
has a smaller page size uses fewer read operations and creates a
"pause" between each request. For example, if each item is 4 KB and
you set the page size to 40 items, then a Query request would consume
only 40 strongly consistent read operations or 20 eventually
consistent read operations. A larger number of smaller Scan or Query
operations would allow your other critical requests to succeed without
throttling.
Isolate Scan Operations
DynamoDB is designed for easy scalability. As a result, an application
can create tables for distinct purposes, possibly even duplicating
content across several tables. You want to perform scans on a table
that is not taking "mission-critical" traffic. Some applications
handle this load by rotating traffic hourly between two tables – one
for critical traffic, and one for bookkeeping. Other applications can
do this by performing every write on two tables: a "mission-critical"
table, and a "shadow" table.
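The "Reduce Page Size" advice can be sketched roughly like this, assuming the low-level boto3 client. The function name and the pause_seconds parameter are made up for the example. Limit caps the number of items per request, and LastEvaluatedKey is passed back as ExclusiveStartKey to continue where the previous page stopped.

```python
import time


def scan_in_pages(client, table_name, page_size, pause_seconds=0.0):
    """Yield every item of a table, reading at most `page_size` items per request."""
    kwargs = {"TableName": table_name, "Limit": page_size}
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key
        if pause_seconds:
            # Optional pause so other requests get a share of the throughput.
            time.sleep(pause_seconds)
```

A call such as scan_in_pages(boto3.client("dynamodb"), "Rankings", 40, pause_seconds=0.2) would read the table 40 items at a time. The right page size and pause depend on your provisioned capacity, so those numbers are placeholders.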
You can also segment the tables by GameId (e.g. Ranking_GameId) to distribute the data more evenly and to give you more granularity in terms of provisioned throughput.
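A simple approach is to use DynamoDB's hash key and sort key. For example, the hash key is GameId and the sort key is Score. You then query the table in descending order with a limit to get the top players in real time in O(1) (for a fixed limit).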
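A rough sketch of that query, again assuming the low-level boto3 client: the table name Rankings, a string-typed GameId, and the helper name are assumptions made for illustration. Score would need to be a Number attribute for the ordering to be numeric. One caveat: with GameId plus Score as the whole primary key, two players with the same score in the same game would share a key, and the sketch does not address that.

```python
def build_top_players_query(game_id, limit):
    """Keyword arguments for a Query returning the `limit` highest scores of a game."""
    return {
        "TableName": "Rankings",  # assumed table name
        "KeyConditionExpression": "GameId = :g",
        "ExpressionAttributeValues": {":g": {"S": game_id}},  # assumes a string GameId
        "ScanIndexForward": False,  # descending order of the sort key (Score)
        "Limit": limit,
    }
```

You would pass the result as client.query(**build_top_players_query("g1", 1000)) and read the players from the "Items" list in the response.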
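To get the rank of a given player, you can use the same technique as above: you fetch the top 1000 scores in O(1), then use binary search to find the player's rank among them in O(log n) on your application server.
If the player is not within the top 1000, you can report the rank as 1000+. You can obviously change 1000 to a larger number (for example 100,000).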
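The rank lookup could look something like the sketch below. It assumes the scores are already in memory as a plain list sorted from highest to lowest, as the descending query returns them. The function names are hypothetical. Players with equal scores share a rank here, which is one possible convention.

```python
def rank_among_top(scores_desc, score):
    """Return the 1-based rank of `score`, or None if it is below every listed score.

    `scores_desc` must be sorted from highest to lowest.
    """
    lo, hi = 0, len(scores_desc)
    while lo < hi:  # binary search for the first entry that is <= score
        mid = (lo + hi) // 2
        if scores_desc[mid] > score:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(scores_desc):
        return None  # every listed score is strictly higher
    return lo + 1


def format_rank(scores_desc, score):
    """Human-readable rank, falling back to e.g. '1000+' outside the fetched list."""
    rank = rank_among_top(scores_desc, score)
    return f"{len(scores_desc)}+" if rank is None else str(rank)
```

With the four scores [900, 750, 750, 400], a score of 750 gets rank 2 and a score of 100 is reported as "4+".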
Hope this helps.
Henry