AWS:ALB 立即响应或在 130 秒后响应

AWS: ALB responding either immediately or after 130 seconds

我正在使用 ALB 并尝试利用基于主机的路由概念在它后面放置多个主机(与目标组进行 1-1 映射)。

所以我有 5 个网址,每个网址都被转发到不同的 TG(因此在我的例子中就是这样),例如

https://path1.mydomain.com
https://path2.mydomain.com

(...等等)...

我注意到我得到了二进制行为,即 ALB 将几乎立即响应(即 < 1.sec)或在大约 130 秒内响应。

$ for X in `seq 60`; do curl -Ik -w "HTTPCode=%{http_code} TotalTime=%{time_total}\n" https://path1.mydomain.com -so /dev/null; done
HTTPCode=200 TotalTime=130.157
HTTPCode=200 TotalTime=131.053
HTTPCode=200 TotalTime=131.050
HTTPCode=200 TotalTime=0.485
HTTPCode=200 TotalTime=130.533
HTTPCode=200 TotalTime=0.467
HTTPCode=200 TotalTime=130.586
HTTPCode=200 TotalTime=0.477
HTTPCode=200 TotalTime=130.567

这适用于所有路径。

知道这种行为可能来自哪里吗?

这是响应头(我总是得到200,不管延迟如何)

$ curl -kI  https://path1.mydomain.com
HTTP/1.1 200 OK
Date: Wed, 05 Dec 2018 17:03:14 GMT
Content-Type: text/html
Content-Length: 1617
Connection: keep-alive
Server: nginx
Last-Modified: Thu, 19 Jul 2018 09:52:09 GMT
ETag: "5b505f49-651"
Cache-Control: no-cache
Accept-Ranges: bytes

edit_1:虽然我为 ALB 注册了 2 subnets/AZs,但我所有的实例都在同一个 AZ/subnet 中,因为重要的是什么;

edit_2:当直接访问 public 个实例的 IP 之一时:

$ for X in `seq 60`; do curl -Ik -w "HTTPCode=%{http_code} TotalTime=%{time_total}\n" http://18.9.48.141 -so /dev/null; done
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.007
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.007
HTTPCode=200 TotalTime=0.007
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.010
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.005
HTTPCode=200 TotalTime=0.008

edit_3不是 dns 解析问题,因为 dig 的 100 次迭代命令 returns 立即没有错误。

edit_4:这是curl命令的strace挂起的地方:

connect(4, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("51.53.132.130")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 199)  = 0 (Timeout)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0)    = 0 (Timeout)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)

edit_5:对 public IP 的一些 tcptraceroute 迭代 curl 命令挂起

$ for i in `seq 10`; do sudo tcptraceroute 51.53.132.130; done
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.109 ms  2.097 ms  2.230 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  1.964 ms  1.954 ms  1.942 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.148 ms  2.220 ms  2.208 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.227 ms  2.214 ms  2.200 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.181 ms  2.170 ms  2.159 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.157 ms  2.221 ms  2.207 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.228 ms  2.216 ms  2.203 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  1.810 ms  1.962 ms  1.961 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  1.695 ms  1.757 ms  1.852 ms
traceroute to 51.53.132.130 (51.53.132.130), 30 hops max, 60 byte packets
 1  * * *
 2  51.53.132.130 (51.53.132.130) <syn,ack>  2.202 ms  2.187 ms  2.173 ms

很明显,巨大的延迟是由应用程序负载均衡器引入的。

问题如下:

我已经为我的 ALB 分配了 2 个 AZ,而与其中之一对应的子网没有 0.0.0.0/0 --> IG 路由。