upstream timed out (110: Connection timed out) for static content?

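I have a setup with two web servers, each running nginx as a load balancer while also serving as a backend itself. The distribution is Debian Wheezy. Both servers have the same hardware (quad core, 32 GB RAM) and the same configuration.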

TCP

#/etc/sysctl.conf
vm.swappiness=0
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.ipv4.ip_local_port_range=2000 65535
net.ipv4.tcp_max_syn_backlog=65535
net.core.somaxconn=65535
net.ipv4.tcp_max_tw_buckets=2000000
net.core.netdev_max_backlog=65535
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.core.rmem_default=8388608
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 16384 16777216
net.ipv4.tcp_congestion_control=cubic
net.ipv4.tcp_tw_reuse=1
fs.file-max=3000000

Nginx

#/etc/nginx/nginx.conf
user www-data www-data;
worker_processes 8;
worker_rlimit_nofile 300000;
pid /run/nginx.pid;

events {
        worker_connections 8192;
        use epoll;
        #multi_accept on;
}
http {
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 10;
        types_hash_max_size 2048;
        server_tokens off;

        open_file_cache max=200000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 5;
        open_file_cache_errors on;

        gzip on;
        gzip_vary on;
        gzip_proxied any;
        gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
        gzip_min_length 10240;
        gzip_disable "MSIE [1-6]\.";
}

server {
    listen <PUBLIC-IPv4>:8080 default_server;
    listen <PUBLIC-IPv6>:8080 default_server;
    listen 127.0.0.1:8080 default_server;
    listen [::1]:8080 default_server;
    server_name backend01.example.com;
    access_log /var/log/nginx/access upstream;
    error_log /var/log/nginx/error;

    root /var/www/project/web;
    index app.php;
    error_page 500 501 502 503 504 505 /50x.html;
    client_max_body_size 8m;

    location ~ /\. { return 403; }
    try_files $uri $uri/ /app.php?$query_string;
    location ~ ^/(config|app_dev|app)\.php(/|$) {
        include fastcgi_params;
        # fastcgi_split_path_info ^(.+\.php)(/.*)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass_header Authorization;
        fastcgi_buffers 16 16k;
        fastcgi_buffer_size 32k;
        fastcgi_param HTTPS on;
    }
}

upstream www {
    ip_hash;
    server [::1]:8080;
    server backend02:8080;
}

server {
    listen <LOADBALANCER-IPv4>:443 ssl spdy;
    server_name www.example.com;
    access_log /var/log/nginx/access main;
    error_log /var/log/nginx/error;

    ssl                  on;
    ssl_certificate      /etc/ssl/example.com.crt;
    ssl_certificate_key  /etc/ssl/example.com.key;
    ssl_protocols        TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers          ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
    ssl_session_cache    shared:SSL:20m;
    ssl_session_timeout  10m;

    root /var/www/project/web;
    error_page 500 501 502 503 504 505 /50x.html;
    client_max_body_size 8m;

    location /00_templates { return 403; }
    location / {
        proxy_read_timeout 300;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_pass http://www;
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_buffers 16 16k;
        proxy_buffer_size 32k;
    }
}

When simulating connections from 3 clients, each running

ab -c 200 -n 40000 -q https://www.example.com/static/file.html

why do I get

upstream timed out (110: Connection timed out) while connecting to upstream

in the nginx log? An upstream timeout at 600 concurrent connections for a static file!? While the ab test is running, I can see this on the first backend node:

# netstat -tan | grep ':8080 ' | awk '{print $6}' | sort | uniq -c
      2 LISTEN
     55 SYN_SENT
  37346 TIME_WAIT

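OK, I don't like reading manuals, but to answer my own question, this one:

nginx close upstream connection after request

solved it. So what was the problem? I had configured the upstream to use keepalive, but the Nginx documentation recommends also setting the following options in the proxy location: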

    proxy_http_version 1.1;
    proxy_set_header Connection "";

That was it. The thousands of TIME_WAIT connections on the backend disappeared; there are now only about 150 instead of 30-40k.
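
For reference, here is roughly how the upstream keepalive and the two proxy options fit together. This is only a sketch based on the config in the question: the `keepalive 32` pool size is an assumed value, not the one actually used, and everything unrelated to the fix is left out.

```nginx
upstream www {
    ip_hash;
    server [::1]:8080;
    server backend02:8080;
    # Assumed value: maximum number of idle keepalive connections to the
    # backends kept open per worker process. Must come after ip_hash.
    keepalive 32;
}

server {
    # listen, ssl_* and the remaining directives stay as in the question

    location / {
        proxy_pass http://www;
        # Keepalive to the upstream needs HTTP/1.1; nginx of this era
        # talks HTTP/1.0 to proxied servers unless told otherwise.
        proxy_http_version 1.1;
        # Clear the "Connection: close" header that nginx would otherwise
        # send to the backend on every proxied request.
        proxy_set_header Connection "";
    }
}
```

Without the last two directives, each proxied request opens and closes its own connection to port 8080, which would explain the TIME_WAIT pile-up seen in the netstat output.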

In my case, php-fpm needed to be restarted.