Is significant latency introduced by API Gateway?
I'm trying to figure out where the latency in my calls is coming from. Please let me know if I can present this information in a clearer format!
Some background: I have two systems, System A and System B. I manually hit an endpoint on System A (via Postman), which in turn calls an endpoint on System B.
System A is hosted on an EC2 instance.
- When System B is hosted on a Lambda function behind API Gateway, the call latency is 125 ms.
- When System B is hosted on an EC2 instance, the call latency is 8 ms.
- When System B is hosted on an EC2 instance behind API Gateway, the call latency is 100 ms.
My hypothesis, therefore, is that API Gateway is also the cause of the increased latency when it's paired with the Lambda function. Can anyone confirm whether this is the case, and if so, what is API Gateway doing that adds so much latency? Is there any way around it? Thanks!
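One way to narrow down where the time is going is a per-phase timing breakdown from the client side. Below is a sketch using curl's `--write-out` timers; the URL is a placeholder, so substitute whichever endpoint you're measuring:

```shell
# Break a single HTTPS request into phases: DNS lookup, TCP connect,
# TLS handshake, time to first byte, and total. The URL is a placeholder.
curl -o /dev/null -s https://example.execute-api.us-east-1.amazonaws.com/prod/ping \
  -w 'dns:        %{time_namelookup}s\ntcp:        %{time_connect}s\ntls:        %{time_appconnect}s\nfirst byte: %{time_starttransfer}s\ntotal:      %{time_total}s\n'
```

If `time_appconnect` dominates on the first call and shrinks to zero on repeated calls over a kept-alive connection, the overhead is connection setup rather than per-request processing.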
In the direct case (#2), are you using SSL? 8 ms is very fast for SSL, though I suppose it's possible within an AZ. If you're not using SSL there, then using APIGW introduces a secure TLS connection between the client and CloudFront, which of course carries a latency penalty. But that's usually worth it for a secure connection, since the latency only occurs on initial establishment.
Once connections are fully established, or when the API has a modest sustained volume, I'd expect APIGW's average latency to drop significantly. You will still see ~100 ms of latency when establishing a new connection, though.
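The connection-reuse effect described above is easy to demonstrate locally. The following is a minimal, self-contained sketch using only the Python standard library; a throwaway local server stands in for the API, so the absolute numbers are far smaller than in the APIGW case, but the cold-vs-warm pattern is the same:

```python
import http.client
import http.server
import threading
import time

# Tiny local HTTP/1.1 server so the demo is self-contained; it stands
# in for the remote API.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so the socket can be reused

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def timed_get(conn):
    start = time.perf_counter()
    conn.request("GET", "/")
    body = conn.getresponse().read()
    return time.perf_counter() - start, body

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
cold, body1 = timed_get(conn)   # pays for TCP connection establishment
warm, body2 = timed_get(conn)   # reuses the already-open socket
print(f"cold: {cold * 1000:.3f} ms, warm: {warm * 1000:.3f} ms")
conn.close()
server.shutdown()
```

Over TLS the gap is much wider, since a cold request also pays for the handshake round trips.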
Unfortunately, the use case you describe (EC2 -> APIGW -> EC2) isn't a great one right now. Since APIGW sits behind CloudFront, it's optimized for clients all over the world, but you'll see additional latency when the client is on EC2.
Edit:
The reason you only see a small penalty when adding Lambda is that APIGW already maintains many established connections to Lambda, since it's a single endpoint with a small number of IPs. The actual overhead within APIGW (unrelated to connections) should be similar to the Lambda overhead.
Amazon Support confirmed this:
With API Gateway it requires going from the client to API Gateway, which means leaving the VPC and going out to the internet, then back to your VPC to go to your other EC2 Instance, then back to API Gateway, which means leaving your VPC again and then back to your first EC2 instance.
So this additional latency is expected. The only way to lower the latency is to add in API Caching which is only going to be useful is if the content you are requesting is going to be static and not updating constantly. You will still see the longer latency when the item is removed from cache and needs to be fetched from the System, but it will lower most calls.
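For reference, the stage-level caching the support reply mentions can be turned on with the AWS CLI. A sketch, where the REST API ID and stage name are placeholders for your own values:

```shell
# Enable a 0.5 GB cache cluster on the stage (placeholder IDs).
aws apigateway update-stage \
  --rest-api-id abc123 \
  --stage-name prod \
  --patch-operations \
    op=replace,path=/cacheClusterEnabled,value=true \
    op=replace,path=/cacheClusterSize,value=0.5
```

Note that caching incurs its own hourly cost and, as the reply says, only helps for responses that don't change constantly.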
So I suppose the latency is normal, which is unfortunate, but hopefully it's not something we'll have to keep dealing with going forward.
This may not be exactly what the original question asked, but I'll add a note about CloudFront.
In my experience, CloudFront and API Gateway each add at least 100 ms to every HTTPS request on average, and often more.
This is due to the fact that in order to secure your API call, API Gateway enforces SSL in all of its components. This means that if you are using SSL on your backend, that your first API call will have to negotiate 3 SSL handshakes:
- Client to CloudFront
- CloudFront to API Gateway
- API Gateway to your backend
It is not uncommon for these handshakes to take over 100 milliseconds, meaning that a single request to an inactive API could see over 300 milliseconds of additional overhead. Both CloudFront and API Gateway attempt to reuse connections, so over a large number of requests you’d expect to see that the overhead for each call would approach only the cost of the initial SSL handshake. Unfortunately, if you’re testing from a web browser and making a single call against an API not yet in production, you will likely not see this.
The same discussion eventually clarified what a "large number of requests" needs to be before you actually see connection reuse:
Additionally, when I meant large, I should have been slightly more precise in scale. 1000 requests from a single source may not see significant reuse, but APIs that are seeing that many per second from multiple sources would definitely expect to see the results I mentioned.
...
Unfortunately, while cannot give you an exact number, you will not see any significant connection reuse until you approach closer to 100 requests per second.
Keep in mind that this post is from mid-to-late 2016, and there should have been some improvements since then. But in my own experience the overhead is still present, and load testing a simple API at 2000 rps was still giving me over 200 ms of additional latency in 2018.
Source: https://forums.aws.amazon.com/thread.jspa?messageID=737224