nginx reverse HTTP proxy - what behavior when all upstreams are unavailable?

Question

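While trying to measure and increase our nginx throughput, I noticed that there may be a problem with our configuration, but I'm not sure how to test it.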

We use a simple upstream configuration, something like this:

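upstream myapp1 {
    # max_fails=1 fail_timeout=3s: one failed attempt within 3s marks the
    # server as unavailable for the next 3s
    server srv1.example.com max_fails=1 fail_timeout=3s;
    server srv2.example.com max_fails=1 fail_timeout=3s;
    server srv3.example.com max_fails=1 fail_timeout=3s;
}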

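When our backends are overloaded, the first upstream may become unavailable, and the increased load can quickly cause the other backends to fail as well, leaving no backend available for the duration set by fail_timeout.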

How does nginx behave in this situation? How does it handle incoming client connections? What errors should I expect to see in the nginx logs?

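From OS/netstat monitoring, nginx seems to hold on to these incoming connections until one or more backends return to an available state, at which point... I'm not sure. Are all the waiting connections dumped onto the first available backend, possibly overloading another service and repeating the failure cycle?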

What is the correct behavior in this situation? Can (should?) nginx be configured to simply drop or 503 any incoming connection when no backend is available?

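Update: After further research, it seems nginx decides whether a backend is available based on various settings. Setting those aside, is there a way to observe nginx's decisions? Perhaps log entries? Anything that can confirm what is happening behind the scenes?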

Answer 1

There is no "correct" behavior in this case; it depends more on how you want to handle/manage load and on your setup.

Remember that error_page handles errors that are generated by nginx itself; therefore, if you would like to take an action based on the status codes your upstreams return, you will need proxy_intercept_errors, for example:

location / {
    proxy_pass http://myapp1;
    proxy_http_version 1.1; 
    proxy_redirect off;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP $remote_addr;

    proxy_intercept_errors on;
    error_page 500 502 503 504  =200 /50x.html;
}

In this case, the line:

error_page 500 502 503 504  =200 /50x.html;

returns status code 200 and shows the contents of the 50x.html page whenever your upstream returns 500, 502, 503, or 504.
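If you would rather have clients receive an honest 503 than a 200 when no backend can serve the request (as asked in the question), a minimal variant might hand the errors to a named location instead. This is a sketch under assumptions: the location name `@backend_down` and the `Retry-After` value are hypothetical. As I understand it, when nginx finds no live upstream it generates a 502 itself, which error_page catches even without proxy_intercept_errors; the intercept is still needed for errors returned by the backends.

```nginx
location / {
    proxy_pass http://myapp1;
    # Needed so that error_page also applies to status codes sent by the backends.
    proxy_intercept_errors on;

    # "=" without a code: the final status is whatever @backend_down returns.
    # "@backend_down" is a hypothetical name chosen for this sketch.
    error_page 500 502 503 504 = @backend_down;
}

location @backend_down {
    # "always" is required for add_header to apply to a 503 response.
    # The 5-second value is an assumption; tune it to your recovery time.
    add_header Retry-After 5 always;
    return 503 "Service temporarily unavailable\n";
}
```

Returning 503 rather than 200 lets clients, crawlers, and upstream load balancers see that the failure is temporary instead of caching the error page as a valid response.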

Answer 2

It sounds like your problem may be more architectural than a simple nginx front-end issue.

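Of course, it's important to monitor the performance of your front-end server and how it handles the backends; however, the best approach is to build your infrastructure so that overload through the front end is avoided in the first place.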

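The normal reasons for a failed-upstream scenario are system reboots or physical infrastructure failures, not a slashdot traffic spike that takes down one of your upstreams and then triggers a domino effect on the rest of them.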

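(TBH, if nominal peak load can take down one of your upstreams, it's not clear what makes you think the others would stay online, regardless of which combination of them nginx sends the remaining clients to, given that they have roughly equal capacity.)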

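So, when designing the architecture, you need to make sure you have enough upstream servers that any single one going down does not overload the rest. This means each of them must have reasonable reserve capacity and, where applicable, must be able to handle errors gracefully on its own.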
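The sizing rule above can be made concrete with a back-of-the-envelope check. Assume, as a simplification, N identical upstreams sharing load evenly, each running at utilization u (a fraction of its capacity), and ignore the extra load that retries add. When one server fails, the remaining N-1 servers absorb its share:

$$
u' = u \cdot \frac{N}{N-1} \le 1 \quad\Longleftrightarrow\quad u \le \frac{N-1}{N}
$$

For the three servers in the question, that means each should stay below 2/3 (about 67%) of its capacity at peak. At 70% each, losing one server pushes the other two to 0.70 × 3/2 = 1.05, i.e. 105% of capacity, which is exactly the kind of cascade the question describes.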

Also, it's always a good idea to implement fail-safes at the front end as well. In a fully microservice-like architecture, nginx provides http://nginx.org/r/limit_conn and http://nginx.org/r/limit_req, which are there to ensure that an overload condition can be detected at the root. You can combine these with http://nginx.org/r/error_page to catch the errors (possibly using http://nginx.org/r/recursive_error_pages and/or http://nginx.org/r/proxy_intercept_errors, as applicable) and, depending on circumstances, serve either cached versions of your pages (see http://nginx.org/r/proxy_cache) or appropriate error messages. There's really no limit to the amount of logic you can put into nginx even with the standard syntax and standard directives; it's possible, for example, to detect and handle the slashdot effect directly from within nginx.
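A minimal sketch of how these pieces might fit together, placed in the http context. The zone names, sizes, rates, connection limit, and cache path are all assumptions for illustration; the limits would need to be derived from the measured capacity of your upstreams.

```nginx
# Hypothetical zone names and sizes; tune rate/burst/conn to your capacity.
limit_req_zone  $binary_remote_addr zone=per_ip:10m     rate=10r/s;
limit_conn_zone $server_name        zone=per_server:10m;

# Hypothetical cache location and zone name.
proxy_cache_path /var/cache/nginx/myapp1 keys_zone=myapp1_cache:10m;

server {
    listen 80;

    location / {
        # Shed excess load at the front end instead of passing it upstream.
        limit_req  zone=per_ip burst=20 nodelay;
        limit_conn per_server 500;
        # 503 is already the default for both; stated explicitly here.
        limit_req_status  503;
        limit_conn_status 503;

        proxy_pass  http://myapp1;
        proxy_cache myapp1_cache;
        proxy_cache_valid 200 1m;
        # Serve a stale cached copy when the upstreams fail or time out.
        proxy_cache_use_stale error timeout http_500 http_502 http_503 http_504;
    }
}
```

Rejecting requests early with limit_req/limit_conn keeps the backends below their breaking point, while proxy_cache_use_stale lets already-cached pages keep working even if every upstream is down.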

As for nginx, it is tried-and-true in the most demanding and mission-critical applications, and http://nginx.org/r/upstream is very clear on how server selection happens:

By default, requests are distributed between the servers using a weighted round-robin balancing method. … If an error occurs during communication with a server, the request will be passed to the next server, and so on until all of the functioning servers will be tried. If a successful response could not be obtained from any of the servers, the client will receive the result of the communication with the last server.
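The behavior quoted above is controlled by a handful of upstream and proxy directives. A sketch of how they might be combined for the question's setup follows; the weight, the backup host srv4.example.com, and the timeout and retry values are hypothetical choices for illustration, not recommendations.

```nginx
upstream myapp1 {
    # Weighted round-robin: srv1 gets roughly twice the share of the others.
    server srv1.example.com weight=2 max_fails=1 fail_timeout=3s;
    server srv2.example.com          max_fails=1 fail_timeout=3s;
    server srv3.example.com          max_fails=1 fail_timeout=3s;
    # Hypothetical host; only used when all primary servers are unavailable.
    server srv4.example.com backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://myapp1;
        # Which failures cause the request to be passed to the next server.
        proxy_next_upstream error timeout http_502 http_503 http_504;
        # Limit how many servers a single request may be tried on, and for how long.
        proxy_next_upstream_tries   2;
        proxy_next_upstream_timeout 10s;
        proxy_connect_timeout       2s;
    }
}
```

Capping proxy_next_upstream_tries is one way to keep a single failing request from being replayed against every remaining backend, which is part of the cascade described in the question.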

I would be surprised if these conditions weren't logged in http://nginx.org/r/error_log, especially depending on the logging level you specify. If you have a very big installation, you might also want to look into commercial monitoring solutions, like NGINX Amplify.
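To observe nginx's upstream decisions (as asked in the question's update), one option is to log the upstream variables for every request and lower the error_log threshold. A sketch, placed in the http context; the format name `upstream_log` and the file paths are assumptions. If several servers were tried for one request, $upstream_addr and $upstream_status contain comma-separated values, one per attempt, so retries show up directly in the access log. As I read the nginx documentation, when no server can be selected $upstream_addr holds the group name (here myapp1) and $upstream_status holds 502, and the error log should contain a "no live upstreams while connecting to upstream" entry.

```nginx
# Hypothetical format name; one line per request with upstream details.
log_format upstream_log '$remote_addr [$time_local] "$request" $status '
                        'upstream=$upstream_addr '
                        'upstream_status=$upstream_status '
                        'upstream_time=$upstream_response_time';

access_log /var/log/nginx/upstream.log upstream_log;

# "warn" also records "upstream server temporarily disabled" messages,
# not only errors such as "no live upstreams".
error_log /var/log/nginx/error.log warn;
```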
