High availability for Kinesis Data Stream consumer

I want to build the following data delivery architecture.

producer --> Kinesis Data stream --> consumer

The consumer server can go down, so I think there should be at least 2 consumers. Is that right?

When one data stream has two consumers, is there a way for each consumer to handle half of the data? As far as I know, there is no way. If each consumer consumes the same data, that is a waste of time and cost, because I only created the 2 consumers for high availability (for failover).

In a network architecture, an ELB or an L4 load balancer can send half of the traffic to each server.

I would like to know a similar approach for Kinesis Data Streams.

When there are two consumers for one data stream, is there any way to handle half the data per consumer? As far as I know, there is no way.

You are wrong.

You should go through the Kinesis Developer Guide, or more specifically https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html.

A Kinesis stream is composed of 1 or more shards. Each shard can be processed independently.
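To make that concrete, here is a minimal producer sketch using the AWS SDK for Java v2 (the stream name my-data-stream and the device-N partition keys are placeholders, not anything from your setup). Kinesis hashes each record's partition key to choose its shard, so the records are already spread across the shards on the write side, without an ELB-style load balancer in front of the stream:

```java
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;
import software.amazon.awssdk.services.kinesis.model.PutRecordResponse;

public class ProducerSketch {
    public static void main(String[] args) {
        String streamName = "my-data-stream";   // hypothetical stream name

        try (KinesisClient kinesis = KinesisClient.create()) {
            for (int i = 0; i < 10; i++) {
                // Kinesis hashes the partition key to pick the target shard,
                // so different keys spread the records across the shards.
                PutRecordRequest request = PutRecordRequest.builder()
                        .streamName(streamName)
                        .partitionKey("device-" + i)
                        .data(SdkBytes.fromUtf8String("payload-" + i))
                        .build();
                PutRecordResponse response = kinesis.putRecord(request);
                System.out.println("record " + i + " went to " + response.shardId());
            }
        }
    }
}
```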

Quoting the example from the link above:

The following example illustrates how the KCL helps you handle scaling and resharding:

For example, say your application is running on one EC2 instance and is processing one Kinesis data stream that has four shards. This one instance has one KCL worker and four record processors (one record processor for every shard). These four record processors run in parallel within the same process.

Next, if you scale the application to use another instance, you have two instances processing one stream that has four shards. When the KCL worker starts up on the second instance, it load-balances with the first instance, so that each instance now processes two shards.

If you then decide to split the four shards into five shards, the KCL again coordinates the processing across instances: one instance processes three shards, and the other processes two shards. A similar coordination occurs when you merge shards.

You just need to make sure that both Kinesis consumer applications (running on different machines) are configured with the same application name. The KCL tracks the application name and the shard checkpoints in a DynamoDB table. This DynamoDB table is also used to define the ownership of shards among the consumer application instances.
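As a rough sketch of what that looks like with the KCL 2.x Java library (software.amazon.kinesis:amazon-kinesis-client), assuming placeholder names my-data-stream, my-consumer-app, and a trivial LoggingProcessor: the only detail that matters for shard balancing and failover is that every instance passes the same application name and its own unique worker identifier.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.coordinator.Scheduler;
import software.amazon.kinesis.lifecycle.events.*;
import software.amazon.kinesis.processor.ShardRecordProcessor;
import software.amazon.kinesis.processor.ShardRecordProcessorFactory;

public class ConsumerSketch {

    // Minimal record processor: the KCL creates one instance per shard this worker owns.
    static class LoggingProcessor implements ShardRecordProcessor {
        public void initialize(InitializationInput input) {
            System.out.println("took over shard " + input.shardId());
        }
        public void processRecords(ProcessRecordsInput input) {
            input.records().forEach(r ->
                    System.out.println(StandardCharsets.UTF_8.decode(r.data()).toString()));
        }
        public void leaseLost(LeaseLostInput input) { }           // another worker took this shard
        public void shardEnded(ShardEndedInput input) {
            try { input.checkpointer().checkpoint(); } catch (Exception e) { }
        }
        public void shutdownRequested(ShutdownRequestedInput input) { }
    }

    public static void main(String[] args) {
        Region region = Region.US_EAST_1;               // assumption: use your own region
        String streamName = "my-data-stream";           // hypothetical stream name
        String applicationName = "my-consumer-app";     // MUST be identical on every instance

        KinesisAsyncClient kinesis = KinesisAsyncClient.builder().region(region).build();
        DynamoDbAsyncClient dynamo = DynamoDbAsyncClient.builder().region(region).build();
        CloudWatchAsyncClient cloudWatch = CloudWatchAsyncClient.builder().region(region).build();

        ShardRecordProcessorFactory factory = LoggingProcessor::new;

        // The application name becomes the name of the DynamoDB lease table.
        // Because both instances share it, the KCL can split shard ownership
        // between them and reassign shards when one instance disappears.
        ConfigsBuilder configs = new ConfigsBuilder(
                streamName, applicationName,
                kinesis, dynamo, cloudWatch,
                UUID.randomUUID().toString(),           // unique worker id per instance
                factory);

        Scheduler scheduler = new Scheduler(
                configs.checkpointConfig(), configs.coordinatorConfig(),
                configs.leaseManagementConfig(), configs.lifecycleConfig(),
                configs.metricsConfig(), configs.processorConfig(),
                configs.retrievalConfig());

        new Thread(scheduler).start();                  // Scheduler is a Runnable
    }
}
```

Running this same program on two machines produces the shard split shown below: both workers register in the shared lease table and divide the four shards between them.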

So, if you have a Kinesis stream with 4 shards and two consumer applications running on different machines, the shard balancing will be done as follows.

----Shard1---> application-instance-1
----Shard2---> application-instance-1
----Shard3---> application-instance-2
----Shard4---> application-instance-2

Suppose application-instance-1 fails. Then application-instance-2 will start processing all the shards.

----Shard1---> application-instance-2
----Shard2---> application-instance-2
----Shard3---> application-instance-2
----Shard4---> application-instance-2