来自 AWS 应用程序负载均衡器的随机 502 错误

Random 502 errors from AWS Application load balancer

我们从 ALB 收到随机 502 错误,我们的后端根本没有受到攻击,因为没有请求日志。 ALB 中没有任何内容仅记录 502 错误,但没有任何内容可用于调试。

h2 2020-03-26T14:30:52.495547Z app/path/tomytarget 10.111.11.111:50103 100.00.00.00:8080:8080 0.001 18.799 -1 502 - 1213 208 "POST https://mydomain:443/user/auth HTTP/2.0" "Name/3 CFNetwork/1121.2.2 Darwin/19.2.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-southeast-1:0000000000:targetgroup/path/tomytarget "Root=someId" "mydomain.com" "arn:aws:acm:ap-southeast-1:0000000000:certificate/certificatedId" 0 2020-03-26T14:30:33.694000Z "forward" "-" "-" "100.00.00.00:8080" "-"

我们在 nodejs 和 express 中使用适当的路由启用健康检查后开始注意到它

app.get("/health-check", (req, res) => {
  res.status(200).end();
});

这是我们的 ALB 配置,我们正在使用与另一个 VPC 对等的 VPC

  ElasticLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      IpAddressType: ipv4
      Scheme: internet-facing
      SecurityGroups:
        - !Ref ELBSecurityGroup
      Subnets:
        - !Ref PublicSubnetA
        - !Ref PublicSubnetB
      Type: application

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: 8080
      Protocol: HTTP
      Targets:
        - Id: <some ip in the other VPC>
          AvailabilityZone: all
          Port: 8080
      TargetType: ip
      VpcId: !Ref VPC
      HealthCheckEnabled: true
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: /health-check
      HealthCheckPort: 8080
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 3
      UnhealthyThresholdCount: 5

  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
      LoadBalancerArn: !Ref ElasticLoadBalancer
      Certificates:
        - CertificateArn: !Ref CertificateArn
      Port: 443
      Protocol: HTTPS

正如我所说,我们使用 VPC 对等和 HTTPS 以及来自 AWS 证书管理器的证书

编辑:

如果您使用的是 nodejs,一种解决方案可能如下

// AWS ALB keepAlive is set to 60 seconds, we need to increase the default KeepAlive timeout
// of our node server
server.keepAliveTimeout = 65000; // Ensure all inactive connections are terminated by the ALB, by setting this a few seconds higher than the ALB idle timeout
server.headersTimeout = 66000; // Ensure the headersTimeout is set higher than the keepAliveTimeout due to this nodejs regression bug: https://github.com/nodejs/node/issues/27363

我已经在nodejs中这样解决了

// AWS ALB keepAlive is set to 60 seconds, we need to increase the default KeepAlive timeout
// of our node server
server.keepAliveTimeout = 65000; // Ensure all inactive connections are terminated by the ALB, by setting this a few seconds higher than the ALB idle timeout
server.headersTimeout = 66000; // Ensure the headersTimeout is set higher than the keepAliveTimeout due to this nodejs regression bug: https://github.com/nodejs/node/issues/27363