How can I see unavailable servers in Nginx logs?

Question

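Where in the Nginx logs does it show that a server is unavailable because it failed x times in y seconds?

I have a set of servers in an nginx upstream block, and each server has fail_timeout and max_fails values set as follows: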

upstream loadbalancer {
    server ip1:80 max_fails=3 fail_timeout=60s;
    server ip2:80 max_fails=3 fail_timeout=60s;
}

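If I deliberately bring down one of the servers (say ip1:80), NGINX returns a 503, which I have marked as an invalid header. So I make sure NGINX hits that server 3 times within 60 seconds.

I expected something in the logs to indicate that the server had been marked as unavailable, i.e. that fail_timeout had kicked in. But I can't find anything.

Here is my logging configuration: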

access_log /var/log/nginx/access.log  main; 
error_log /var/log/nginx/error.log warn;

Answer 1

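I'm not sure whether you can get logs for unavailable servers. But you can run the lsof command with the PID of your httpd root process to get an additional list of the log files it has open.

1) First run this command to get the PID of your HTTPD root process: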

  > ps axu |grep httpd

2) Then copy the PID of the root process. Let's say the PID is 1234.

3) Next, use PID 1234 and run the final command to get the log files of the httpd root process:

  > lsof -p 1234 |grep log

This helped me a lot in finding missing logs. Now you can check whether the log files contain anything about unavailable servers. Good luck.

Answer 2

You should see useful information about the cause in the error log. Here are some examples from Nginx 1.8:

 [error] 9369#0: *837 connect() failed (111: Connection refused) while connecting to upstream

 [error] 9369#0: *851 connect() failed (113: No route to host) while connecting to upstream

 [error] 9369#0: *844 no live upstreams while connecting to upstream

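As you can see, these entries are logged at the error level. Your error_log is set to warn, which also includes error, so your configuration is not the problem.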
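To pull entries like the ones above out of a busy error log, a small filter can help. This is only a sketch: the function name `upstream_errors` is made up for illustration, and the default path is the error_log location from the question.

```shell
# Hypothetical helper: print error-level entries that were logged while
# Nginx was connecting to an upstream server.
# Argument 1 (optional): path to the error log; defaults to the path
# used in the question's error_log directive.
upstream_errors() {
    log_file="${1:-/var/log/nginx/error.log}"
    # Keep only [error] lines, then only those about upstream connections.
    grep -F '[error]' "$log_file" | grep -F 'while connecting to upstream'
}
```

Calling `upstream_errors` with no argument reads the default log; pass another path if your error_log points elsewhere.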

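You mentioned setting a 503 header to mark the host as unavailable. This will not be detected in a default Nginx setup. To use specific response codes to determine upstream host status, look at the proxy_next_upstream option.

Setting it to the following adds 503 response codes to the list of results that are treated as upstream failures: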

proxy_next_upstream error timeout http_503;

From the documentation:
The directive also defines what is considered an unsuccessful attempt of communication with a server. The cases of error, timeout and invalid_header are always considered unsuccessful attempts, even if they are not specified in the directive. The cases of http_500, http_502, http_503 and http_504 are considered unsuccessful attempts only if they are specified in the directive. The cases of http_403 and http_404 are never considered unsuccessful attempts

Answer 3

There is now a log message when a server exceeds max_fails. It was added in 1.9.1. The log level is warn, and the message says "upstream server temporarily disabled".
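A quick way to check for that message is sketched below. It assumes Nginx 1.9.1 or later, an error_log level of warn or more verbose (as in the question), and the message text exactly as quoted above; the function name `disabled_upstreams` is hypothetical.

```shell
# Hypothetical helper: print the entries that record a server being
# taken out of rotation after exceeding max_fails.
# Argument 1 (optional): path to the error log; defaults to the path
# used in the question's error_log directive.
disabled_upstreams() {
    log_file="${1:-/var/log/nginx/error.log}"
    # -F matches the message as a fixed string rather than a regex.
    grep -F 'upstream server temporarily disabled' "$log_file"
}
```

If `disabled_upstreams` prints nothing after you have forced the failures, check the Nginx version and the error_log level first.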

logging

nginx

health-monitoring