How does rate limiting work when the same instance group is behind two different load balancers?

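I was reading about rate limiting and autoscaling in GCP and ran into this question:

Scenario:

I created an instance group ig with autoscaling OFF.
I created a load balancer lb1 with the following details:
- lb1 contains a backend service bs1 that points to the instance group ig, with the maximum RPS for the whole group set to 1000.
- Frontend port: 8080
- Path rule: /alpha/*
- lb1 is an external load balancer
I created another load balancer lb2 with the following details:
- lb2 contains a backend service bs2 that points to the instance group ig, with the maximum RPS for the whole group set to 2000.
- Frontend port: 9090
- Path rule: /beta/*
- lb2 is a regional load balancer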

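My questions:

Who will monitor the requests served by both load balancers?
Which limit will be honoured, 1000 or 2000?
Will the overall requests (i.e. via lb1 and lb2) be rate limited, or will individual limits be applied to each request flow?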

TL;DR - The RPS is set in the backend service, so each load balancer has its own RPS limit, independent of the other.

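Who will monitor the requests served by both load balancers?

Google Compute Engine (GCE) will monitor the requests served by the load balancers and direct traffic accordingly, so that it stays within the RPS limit of each backend in the backend service.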

Which limit will be honoured, 1000 or 2000?

1000 for the first load balancer and 2000 for the second. Remember that you are using 2 separate backend services, bs1 and bs2, for lb1 and lb2 respectively.

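Will the overall requests (i.e. via lb1 and lb2) be rate limited, or will individual limits be applied to each request flow?

Requests going through lb1 to bs1 will conform to the 1000 RPS maximum set in bs1, and requests going through lb2 to bs2 will conform to the 2000 RPS maximum set in bs2. Because the two limits are applied independently, the shared instance group must be able to handle at least 3000 RPS in total (if the limits were set per instance instead, each VM would need to handle 3000 RPS).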

Longer version

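Instance groups have no way of specifying RPS; only backend services do. Instance groups merely group a list of instances. So although you can use the same instance group in multiple backend services, you need to account for the RPS values set in each of those backend services if your goal is to share instances among them. GCE will not work this out for you automatically.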

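A backend service ideally represents one microservice, served by a group of backend VMs (from the instance group). You should work out in advance the maximum RPS that a single backend instance (i.e. the service running inside the VM) can handle, and use that to set this limit. If you intend to share VMs across backend services, make sure the combined RPS limit in the worst case is something the service inside the VM can handle.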
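
The worst-case check described above can be sketched in a few lines of Python. This is only an illustration: the BackendLimit class, the helper names, and the per-instance capacities (500 and 800 RPS, 4 instances) are made up for the example; only the 1000 and 2000 RPS limits come from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendLimit:
    """RPS limit that one backend service sets for the shared instance group."""
    backend_service: str
    max_rate: int  # whole-group RPS limit (the maxRate field)


def worst_case_group_rps(limits):
    """Sum the independent limits, since nothing caps the combined traffic."""
    return sum(limit.max_rate for limit in limits)


def can_absorb(limits, instance_count, rps_capacity_per_instance):
    """True if the group can serve every backend service at its limit at once."""
    return worst_case_group_rps(limits) <= instance_count * rps_capacity_per_instance


limits = [BackendLimit("bs1", 1000), BackendLimit("bs2", 2000)]
print(worst_case_group_rps(limits))  # 3000
print(can_absorb(limits, 4, 500))    # False: 4 * 500 = 2000 < 3000
print(can_absorb(limits, 4, 800))    # True: 4 * 800 = 3200 >= 3000
```

If the check fails, either lower the limits in the backend services or add capacity to the group.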

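Google Compute Engine (GCE) will monitor the metrics for each backend service (requests per second, in your case) and use them for load balancing. Each load balancer is logically separate, so there is no aggregation across load balancers, even when they use the same instance group.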
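
As a toy model of that bookkeeping (not how GCE implements it), the sketch below counts one second of traffic per backend service and compares each count only with that service's own limit. The function name and the traffic numbers are hypothetical.

```python
from collections import Counter

LIMITS = {"bs1": 1000, "bs2": 2000}  # RPS limit per backend service


def per_service_load(requests_in_one_second):
    """Count requests per backend service and compare each total only
    against that service's own limit, never against a shared one."""
    counts = Counter(requests_in_one_second)
    return {
        service: {
            "rps": counts[service],
            "limit": limit,
            "over_limit": counts[service] > limit,
        }
        for service, limit in LIMITS.items()
    }


# One second of traffic: 900 requests via lb1/bs1 and 2100 via lb2/bs2.
window = ["bs1"] * 900 + ["bs2"] * 2100
report = per_service_load(window)
print(report["bs1"]["over_limit"])  # False
print(report["bs2"]["over_limit"])  # True
print(sum(entry["rps"] for entry in report.values()))  # 3000, all hitting the same VMs
```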

Load distribution algorithm

HTTP(S) load balancing provides two methods of determining instance load. Within the backend service object, the balancingMode property selects between the requests per second (RPS) and CPU utilization modes. Both modes allow a maximum value to be specified; the HTTP load balancer will try to ensure that load remains under the limit, but short bursts above the limit can occur during failover or load spike events.

Incoming requests are sent to the region closest to the user, provided that region has available capacity. If more than one zone is configured with backends in a region, the traffic is distributed across the instance groups in each zone according to each group's capacity. Within the zone, the requests are spread evenly over the instances using a round-robin algorithm. Round-robin distribution can be overridden by configuring session affinity.
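
To make the two stages in that description concrete, here is a rough Python sketch: a split across instance groups in proportion to capacity, followed by round-robin within a group. It is a simplification for illustration, not the load balancer's actual algorithm; the zone names, VM names, and capacities are hypothetical, and session affinity is ignored.

```python
from itertools import cycle


def split_by_capacity(total_requests, group_capacity):
    """Split requests across instance groups in proportion to capacity."""
    total_capacity = sum(group_capacity.values())
    shares = {
        name: total_requests * capacity // total_capacity
        for name, capacity in group_capacity.items()
    }
    # Hand any rounding remainder to the largest group.
    remainder = total_requests - sum(shares.values())
    shares[max(group_capacity, key=group_capacity.get)] += remainder
    return shares


def round_robin(request_count, instances):
    """Assign requests to instances in strict rotation."""
    assigned = {instance: 0 for instance in instances}
    for _, instance in zip(range(request_count), cycle(instances)):
        assigned[instance] += 1
    return assigned


zones = {"zone-a": 1000, "zone-b": 3000}  # hypothetical per-group capacity (RPS)
shares = split_by_capacity(2000, zones)
print(shares)  # {'zone-a': 500, 'zone-b': 1500}
print(round_robin(shares["zone-a"], ["vm-1", "vm-2", "vm-3"]))
# {'vm-1': 167, 'vm-2': 167, 'vm-3': 166}
```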

maxRate and maxRatePerInstance

A backend service has 2 configuration fields related to RPS: maxRate and maxRatePerInstance. maxRate sets the RPS for the whole group, whereas maxRatePerInstance sets the RPS per instance. It looks like both can be used in combination if required.

backends[].maxRate

integer

The max requests per second (RPS) of the group. Can be used with either RATE or UTILIZATION balancing modes, but required if RATE mode. For RATE mode, either maxRate or maxRatePerInstance must be set.

This cannot be used for internal load balancing.

backends[].maxRatePerInstance

float

The max requests per second (RPS) that a single backend instance can handle. This is used to calculate the capacity of the group. Can be used in either balancing mode. For RATE mode, either maxRate or maxRatePerInstance must be set.

This cannot be used for internal load balancing.
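
As a sketch of how the two fields translate into a group capacity, the Python helper below takes a dict shaped like one backends[] entry. The field names (balancingMode, maxRate, maxRatePerInstance, group) come from the excerpt above; the function itself is hypothetical. Taking the smaller value when both fields are set is an assumption of this sketch, since the documentation quoted here does not say how the two combine.

```python
def group_capacity_rps(backend, instance_count):
    """Return the group's RPS capacity implied by one backends[] entry."""
    candidates = []
    if backend.get("maxRate") is not None:
        # maxRate applies to the group as a whole.
        candidates.append(float(backend["maxRate"]))
    if backend.get("maxRatePerInstance") is not None:
        # maxRatePerInstance scales with the number of instances.
        candidates.append(backend["maxRatePerInstance"] * instance_count)
    if backend.get("balancingMode") == "RATE" and not candidates:
        raise ValueError("RATE mode needs maxRate or maxRatePerInstance")
    # Assumption: if both are set, the stricter one wins.
    return min(candidates) if candidates else None


whole_group = {"group": "ig", "balancingMode": "RATE", "maxRate": 1000}
per_instance = {"group": "ig", "balancingMode": "RATE", "maxRatePerInstance": 250.0}

print(group_capacity_rps(whole_group, instance_count=4))   # 1000.0
print(group_capacity_rps(per_instance, instance_count=4))  # 1000.0
print(group_capacity_rps(per_instance, instance_count=8))  # 2000.0
```

Note the difference with autoscaling in mind: a per-instance limit grows with the group, while a whole-group limit stays fixed no matter how many instances are added.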

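Receiving requests at a rate higher than the specified RPS

If you happen to receive requests at a rate higher than the RPS and you have autoscaling disabled, I could not find any documentation on the Google Cloud website describing the exact expected behavior. The closest I could find is this one, where it specifies that the load balancer will try to keep each instance at or below the specified RPS. So it could mean that requests get dropped once the RPS is exceeded, and clients might see one of the 5XX error codes (possibly 502), based on this: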

failed_to_pick_backend

The load balancer failed to pick a healthy backend to handle the request.

502

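You can probably figure this out by setting a fairly low RPS, such as 10 or 20, and seeing what happens. Look at the timestamps at which requests arrive at your backends to determine the behavior. Also, the limiting might not kick in exactly at the 11th or 21st request, so try sending far more requests per second than that to verify whether requests are being dropped.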
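
One way to run that experiment is a small Python script using only the standard library. This is a minimal sketch: it sends a burst of requests each second and tallies the status codes, so any 502s show up in the counts. The function names are hypothetical, and the URL in the usage comment is a placeholder that you would replace with your load balancer's frontend address.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for one GET, or 'error' on network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses are raised as exceptions
    except (urllib.error.URLError, OSError):
        return "error"


def probe(url, rps, seconds, fetch=fetch_status):
    """Send roughly `rps` requests per second for `seconds` and tally statuses."""
    tally = Counter()
    with ThreadPoolExecutor(max_workers=max(1, min(rps, 64))) as pool:
        for _ in range(seconds):
            started = time.monotonic()
            tally.update(pool.map(fetch, [url] * rps))
            # Sleep out the rest of the second so bursts start one second apart.
            time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))
    return tally


# Example (placeholder address, with the RPS limit set to something low like 10):
# print(probe("http://<load-balancer-ip>:8080/alpha/", rps=100, seconds=5))
```

Compare the tally with the request timestamps logged on the backend to see whether the excess requests were dropped or simply delivered anyway.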

With Autoscaling

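If you enable autoscaling, this will automatically trigger the autoscaler and make it scale the number of instances in the instance group based on the target utilization level you set in the autoscaler.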
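
As a rough mental model of that scaling decision (a simplification, not the autoscaler's actual algorithm), the group needs enough instances that each one stays at or below the target fraction of its RPS capacity. The function name and the numbers below are hypothetical.

```python
import math


def recommended_instances(observed_rps, max_rate_per_instance, target_utilization):
    """Instances needed so each one stays at the target share of its RPS capacity."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    per_instance_target = max_rate_per_instance * target_utilization
    return max(1, math.ceil(observed_rps / per_instance_target))


print(recommended_instances(3000, 500, 0.8))  # 8, since 3000 / (500 * 0.8) = 7.5
```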

NOTE: Updated the answer since you actually specified that you are using 2 separate backend services.


load-balancing

rate-limiting

autoscaling

google-cloud-platform
