Nginx cache size not growing above 344GB

I built an Nginx cache server on Ubuntu 18 using the Docker image nginx:1.19.10-alpine.

The disk usage details on Ubuntu 18 are given below for reference.

ubuntu@host_name:~$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
udev                        126G     0  126G   0% /dev
tmpfs                        26G  1.4M   26G   1% /run
/dev/mapper/vg_system-root  193G  8.9G  176G   5% /
tmpfs                       126G     0  126G   0% /dev/shm
tmpfs                       5.0M     0  5.0M   0% /run/lock
tmpfs                       126G     0  126G   0% /sys/fs/cgroup
/dev/mapper/vg_data-srv      24T  369G   24T   2% /srv
/dev/sda1                   453M  364M   62M  86% /boot
overlay                     193G  8.9G  176G   5% /var/lib/docker/overlay2/64_characters_random/merged
tmpfs                        26G     0   26G   0% /run/user/1646269961

Docker container details

ubuntu@host_name:~$ sudo docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED      STATUS       PORTS     NAMES
contnr_idxyz   nginx:1.19.10-alpine   "/docker-entrypoint.…"   5 days ago   Up 9 hours             contnr_name

Nginx configuration for reference

user@host-name:/srv/mytool/nginx/config$ cat proxy.conf
access_log          off;
root                /var/log/nginx;
open_log_file_cache max=100;

log_format mytoollogformat
    '$time_iso8601 $remote_addr $status $request_time $body_bytes_sent '
    '$upstream_cache_status "$request" "$http_user_agent"';

proxy_http_version 1.1;
client_max_body_size 10g;

# R/W timeout for origin server
proxy_read_timeout 15m;
proxy_send_timeout 15m;

# R/W timeout for clients
client_body_timeout   15m;
client_header_timeout 15m;
send_timeout          15m;

# TODO: ssl_stapling and ssl_ciphers
ssl_prefer_server_ciphers on;
ssl_session_cache         shared:SSL:10m;

proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-mytool-Cache-Host $scheme://$host;

proxy_redirect         off;

proxy_cache_path       /var/cache/nginx levels=1:2 keys_zone=mytool:10m max_size=22000g inactive=180d;
proxy_cache_key        $host$uri$is_args$args$slice_range;
proxy_set_header       Range $slice_range;
proxy_cache_valid      200 206 2y;
proxy_cache_revalidate on;

add_header X-Cache-Status $upstream_cache_status;

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_429;

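The server block has been removed because it is outside the context of this problem, and also for security reasons.

Let me explain my problem in detail. We have a main server (the proxy_pass'ed server) that holds hundreds of TB of static asset files. When we set up the cache server, it populated the cache and served files from the cache well. But over time we noticed that the cache size does not grow beyond 344GB.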

user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
344G    ./cache
52K     ./config
1004K   ./log
user@host_name:/srv/mytool/nginx$

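I wrote a script to download about 500GB of files. But it never increased the cache size beyond 344GB.

Experiments done so far:
- Added max_size=100000g (along with the old value min_free=1000g)
- Changed it to max_size=22000g (a value smaller than the size of /srv, which is 24TB)
- Removed min_free=1000g (assuming min_free was somehow purging the cache)
- Changed proxy_cache_valid 200 206 1h; to proxy_cache_valid 200 206 2y;

For all of the above experiments, after changing the config I restarted the docker container and ran the script that downloads 500GB of files through the cache server. Although the cache size reached 380 to 400 GB, within an hour it suddenly dropped back to 344GB.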

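I don't understand why the cache never fills up completely, even though I have allocated 24TB to /srv.

Is there a problem with Nginx? I mean, could there be some limit in the free version of Nginx, so that I should use Nginx Plus? Or is the configuration wrong?

Any guesses would help me. Thanks in advance.

Update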

What are the soft and hard limits for max open files on the cache server?

$ cat /proc/sys/fs/file-max
26375980
$ ulimit -Hn
1048576
$ ulimit -Sn
1024

have you set limits in your nginx conf using worker_rlimit_nofile?

Currently there is no worker_rlimit_nofile setting.

/ # cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

Do you have anything in the error logs?

Filtered/distinct log entries are given below.

$ cat /srv/mytool/nginx/log/error.log
2022/01/09 05:56:35 [warn] 22#22: *10 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/1/00/0000000001 while reading upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /dists/58.2.A.0.409/semc/binary-arm/Packages HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/Packages", host: "dev.mytool-region.mycompany.com"
2022/01/09 06:09:21 [warn] 22#22: *35 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000006, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "POST /masked_path/all HTTP/1.1", host: "dev.mytool-region.mycompany.com"
2022/01/09 08:19:01 [error] 22#22: *120 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/09 18:19:12 [warn] 22#22: *1566 upstream server temporarily disabled while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[masked_IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/10 01:23:20 [error] 22#22: *2920 etag mismatch in slice response while reading response header from upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", subrequest: "/masked_path/xyz.zip", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:43:47 [error] 36#36: *441913 upstream timed out (110: Operation timed out) while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"
2022/01/21 07:46:17 [warn] 37#37: *442070 upstream server temporarily disabled while SSL handshaking to upstream, client: masked_IPv6_address, server: dev.mytool-region.mycompany.com, request: "GET /masked_path/xyz.zip HTTP/1.1", upstream: "https://[IPv6_address]:443/masked_path/xyz.zip", host: "dev.mytool-region.mycompany.com"

In total, about 25k rows like the ones below appear within 10 days of logs:
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 55a25a5037f198bbec6cd49100bb1b76, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry e996d5e104f405444a579cd491faf3a8, count:1
2022/01/11 05:36:58 [alert] 70#70: ignore long locked inactive cache entry 394517a8ed8e43949003b3f7538dc471, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 4f92d3a72f64b7bafdbb3f0b66d8e638, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry be41b259a3e8f9698e0976639883a423, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 1da19b571ea4bce1428251689f0a7c69, count:1
2022/01/11 05:37:08 [alert] 70#70: ignore long locked inactive cache entry 2a4cac0c28ea430e7eef3f808cf1e06f, count:1
2022/01/11 05:37:18 [alert] 70#70: ignore long locked inactive cache entry 53a826f6931cf0f16020bcae100af347, count:1

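Update 2: Tried the same with the nginx:perl docker container. That did not work either: the cache size was observed to grow beyond 392GB, but within a few hours it suddenly dropped to 344GB. The command used to start the container is given below.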

sudo docker run \
--detach \
--restart unless-stopped \
--volume /srv/mytool/nginx/config:/etc/nginx/conf.d:ro \
--volume /srv/mytool/nginx/cache:/var/cache/nginx \
--volume /srv/mytool/nginx/log:/var/log/nginx \
nginx:perl

Update again

Avoided the docker container nginx:1.19.10-alpine and set up a plain Nginx installation as shown below.

sudo apt install nginx
systemctl status nginx

$ sudo ufw app list
Available applications:
  Nginx Full
  Nginx HTTP
  Nginx HTTPS
  OpenSSH
$ sudo ufw allow 'Nginx Full'
Rules updated
Rules updated (v6)

Modified default.conf in /etc/nginx/sites-available:

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=custom_cache:10m inactive=180d;

upstream origin_server {
    server dev.mytool-region.mycompany.com;
}
server {
        listen 80 default_server;
        listen [::]:80 default_server;

        server_name _;

        location / {
            include proxy_params;
            proxy_pass http://origin_server;
        }

        location ~ ^/(path1|path2|path3|path4)/ {
            slice       5m;
            proxy_cache custom_cache;

            proxy_pass http://origin_server;
            proxy_cache_valid 200 206 2y;
            add_header X-Proxy-Cache $upstream_cache_status;
        }
}

Downloaded about 500GB. It worked fine and the cache filled as expected.

ubuntu@host_name:/var/cache$ sudo du -sh ./*
128K    ./apparmor
82M     ./apt
4.8M    ./debconf
20K     ./ldconfig
1.2M    ./man
0       ./motd-news
518G    ./nginx
4.0K    ./pollinate
20K     ./snapd
ubuntu@host_name:/var/cache$

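But I still don't know the exact cause, or what is wrong with my configuration. Trials are still in progress.

One more try

Used the old configuration (docker nginx:1.19.10-alpine) and moved proxy_cache_valid 200 206 2y; into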

location ~ ^/(path1|path2|path3|path4)/ {

But that did not work either.

You could try configuring the temporary cache directory

proxy_max_temp_file_size

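I strongly suspect that your backend services respond 'YES' when nginx asks them 'If-Modified-Since', and that your setting

proxy_cache_revalidate on;

removes stale cache items as it is designed to. That would explain why your cache can grow to 500G but later shrinks to 344G.

According to the documentation, the max_size parameter is optional.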

not specifying a value allows the cache to grow to use all available disk space.

Removing max_size might let the cache use all available space. Try editing the proxy_cache_path directive to remove it; in your current conf, change it to:

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=mytool:10m inactive=180d;

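The problem with nginx cache slices is that when you configure a 5 MB slice, nginx ends up creating one cache file per slice in the cache directory. The number of files that can be cached is proportional to the keys_zone size, here keys_zone=mytool:10m.
Since I had 10m (10 megabytes) for cache keys, it allowed at most 71203 files. The documentation says: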

In addition, all active keys and information about data are stored in a shared memory zone, whose name and size are configured by the keys_zone parameter. One megabyte zone can store about 8 thousand keys.

As part of commercial subscription, the shared memory zone also stores extended cache information, thus, it is required to specify a larger zone size for the same number of keys. For example, one megabyte zone can store about 4 thousand keys.

So changing keys_zone to a larger value, keys_zone=mytool:1000m, fixed the problem.
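
To make the sizing concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the approximate figure of 8 thousand keys per megabyte from the documentation quoted above, a 5 MiB slice, and that every cache file is one full slice. The function names are made up for illustration and are not part of nginx.

```python
import math

# Approximate figure from the nginx documentation quoted above:
# a one-megabyte zone stores about 8 thousand keys.
KEYS_PER_MB = 8000

MIB = 1024 ** 2
GIB = 1024 ** 3


def keys_zone_mb_needed(cache_bytes: int, slice_bytes: int,
                        keys_per_mb: int = KEYS_PER_MB) -> int:
    """Rough keys_zone size (in MB) needed to index cache_bytes of slices.

    Assumes every cache file is one full slice, so this is a lower bound:
    files smaller than a slice still need one key each.
    """
    slices = math.ceil(cache_bytes / slice_bytes)
    return math.ceil(slices / keys_per_mb)


def approx_cache_ceiling_gib(zone_mb: int, slice_bytes: int,
                             keys_per_mb: int = KEYS_PER_MB) -> float:
    """Rough upper bound (in GiB) on cached data for a given keys_zone size."""
    return zone_mb * keys_per_mb * slice_bytes / GIB


if __name__ == "__main__":
    # keys_zone=mytool:10m with 5 MiB slices
    print(approx_cache_ceiling_gib(10, 5 * MIB))       # 390.625
    # max_size=22000g with 5 MiB slices
    print(keys_zone_mb_needed(22000 * GIB, 5 * MIB))   # 564
```

Under these assumptions a 10m zone tops out at roughly 390 GiB of 5 MiB slices, which is in the same range as the 344 to 400 GB observed, and a 22000g cache would need at least about 564 MB of keys_zone, so 1000m leaves some headroom. The real number of keys per megabyte is only approximate (71203 files were observed with 10m), so treat the results as estimates.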

You can see that with keys_zone=mytool:10m the number of cache files did not grow beyond 71203:
user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
326G    ./cache
52K     ./config
406M    ./log
user@host_name:/srv/mytool/nginx$ sudo find cache/ -type f | wc -l
71203

But with keys_zone=mytool:1000m the number of cache files was no longer capped, and the cache size started to grow:
user@host_name:/srv/mytool/nginx$ sudo du -sh ./*
518G    ./cache
52K     ./config
4.6M    ./log
user@host_name:/srv/mytool/nginx$ sudo find cache/ -type f | wc -l
107243
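
If you want to keep an eye on this over time, the check above can be scripted. The following is only a sketch: it counts regular files under the cache directory (the same thing the find command does) and compares the count with the approximate capacity of the zone, assuming about 8 thousand keys per megabyte as in the documentation. The helper names and the default path are illustrative; adjust them for your setup, and run it with enough privileges to read the cache directory.

```python
import os
import sys

KEYS_PER_MB = 8000  # approximate figure from the nginx documentation


def count_cache_files(cache_dir: str) -> int:
    """Count regular files under cache_dir, like `find cache_dir -type f | wc -l`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the count matches `find -type f`.
            if os.path.isfile(path) and not os.path.islink(path):
                total += 1
    return total


def zone_fill_ratio(file_count: int, zone_mb: int,
                    keys_per_mb: int = KEYS_PER_MB) -> float:
    """Fraction of the estimated keys_zone capacity used by file_count entries."""
    return file_count / (zone_mb * keys_per_mb)


if __name__ == "__main__":
    # Default path is the host-side cache directory used in this question.
    cache_dir = sys.argv[1] if len(sys.argv) > 1 else "/srv/mytool/nginx/cache"
    zone_mb = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    files = count_cache_files(cache_dir)
    print(f"{files} cache files, ~{zone_fill_ratio(files, zone_mb):.0%} of estimated zone capacity")
```

A ratio that stays close to 100% while the disk is far from full suggests that the keys_zone, not max_size, is what limits the cache.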