DynamoDB Streams with Lambda,如何按顺序(按逻辑组)处理记录?

DynamoDB Streams with Lambda, how to process the records in order (by logical groups)?

我想使用 DynamoDB Streams + AWS Lambda 来处理聊天消息。关于同一对话 user_idX:user_idY(一个房间)的消息必须按顺序处理。全局排序并不重要。

假设我以正确的顺序输入 DynamoDB(room:msg1,room:msg2,等等),如何保证 Stream 将按顺序输入 AWS Lambda,保证顺序跨单个流处理相关消息(房间)?

例如,考虑到我有 2 个分片,如何确保逻辑组转到同一个分片?

我必须完成这个:

Shard 1: 12:12:msg3 12:12:msg2 12:12:msg1 ==> consumer
Shard 2: 13:24:msg2 51:91:msg3 13:24:msg1 51:92:msg2 51:92:msg1 ==> consumer

而不是这个(消息遵循我保存在数据库中的顺序,但它们被放置在不同的碎片中,因此错误地并行处理同一个房间的不同序列):

Shard 1: 13:24:msg2 51:92:msg2 12:12:msg2 51:92:msg2 12:12:msg1 ==> consumer
Shard 2: 51:91:msg3 12:12:msg3 13:24:msg1 51:92:msg1 ==> consumer

这位官方 post 提到了这一点,但我在文档中找不到如何实现它的任何地方:

The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.

问题

1) 如何在 DynamoDB Streams 中设置分区键?

2) 如何创建保证分区键一致传递的流分片?

3) 这真的可能吗?由于官方文章提到:a given key will be present in most one of a set of sibling shards that are active at a given point in a time 所以似乎 msg1 可能会进入分片1 然后 msg2 到分片 2,如我上面的示例?

已编辑:this 问题中,我发现了这个:

The amount of shards that your stream has, is based on the amount of partitions the table has. So if you have a DDB table with 4 partitions, then your stream will have 4 shards. Each shard corresponds to a specific partition, so given that all items with the same partition key should be present in the same partition, it also means that those items will be present in the same shard.

这是否意味着我可以自动实现我所需要的? "All items with the same partition will be present in the same shard"。 Lambda 是否尊重这一点?

编辑 2: 来自 FAQ

The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

我不关心全局排序,只关心示例中的逻辑顺序。仍然不清楚碎片是否与常见问题解答中的这个答案逻辑组合。

In-order 对同一键的更新处理将自动进行。如 this presentation 中所述,每个活动分片一个 Lambda 函数是 运行。因为特定 partition/sort 键的所有更新都出现在一个分片沿袭中,所以它们按顺序处理。