Nginx constant writes cause CPU I/O wait
I'm running nginx/1.20.1 on a G9 CentOS 7 machine to serve static video files. The machine has the following specs:
- CPU: 32 cores
- 32 GB RAM
- 6 TB of hard disk storage
Nginx configuration:
user root;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
# Load dynamic modules. See /usr/share/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;
worker_rlimit_nofile 30000;
events {
    worker_connections 2024;
    use epoll;
}
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    directio 16m;
    # output_buffers 2 32m;
    # aio threads;
    sendfile_max_chunk 512k;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 120;
    types_hash_max_size 2048;
    # allow the server to close connection on non responding client, this will free up memory
    reset_timedout_connection on;
    # request timed out -- default 60
    client_body_timeout 60;
    # if client stop responding, free up memory -- default 60
    send_timeout 30;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    client_max_body_size 200m;
    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;
}
conf.d:
server {
    listen 80;
    server_name mydomain.com;
    charset utf-8;
    sendfile on;
    tcp_nopush on;
    fastcgi_read_timeout 600;
    client_header_timeout 600;
    client_body_timeout 600;
    client_max_body_size 0;
    access_log /var/log/nginx/static.access_log main;
    error_log /var/log/nginx/static.error_log error;
    location / {
        proxy_pass http://localhost:7070;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
    # prevent nginx from serving dotfiles (.htaccess, .svn, .git, etc.)
    location ~ /\. {
        deny all;
        access_log off;
        log_not_found off;
    }
}
server {
    set $base_path "/mypath";
    set $news_video_path "/mypath2";
    listen 7070;
    server_name localhost;
    location ~ /upload/videos/(.*) {
        alias $news_video_path/;
    }
    location ~ /video/(.*) {
        alias $base_path/video/;
    }
    access_log /var/log/nginx/localhost.access_log main;
    error_log /var/log/nginx/localhost.error_log error;
}
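For a rough sense of scale, the events settings above can be turned into a connection budget. This is a back-of-the-envelope sketch, not a measurement, and it rests on two assumptions: that worker_processes auto resolves to 32 workers (one per CPU core), and that each video download proxied from port 80 to port 7070 occupies three connections inside this same nginx instance (the client connection, the upstream connection to localhost:7070, and the connection the port-7070 server accepts). The second assumption follows from the nginx documentation's note that worker_connections counts connections to proxied servers, not only client connections. The function name connection_budget is hypothetical.

```python
def connection_budget(workers, worker_connections, conns_per_download=3):
    """Return (total connection slots, approximate concurrent proxied downloads).

    Assumes connections are spread evenly across workers, which real traffic
    rarely is, so treat the second number as an upper bound.
    """
    total = workers * worker_connections
    return total, total // conns_per_download


if __name__ == "__main__":
    # Values taken from the configuration above; 32 workers is an assumption
    # about what "worker_processes auto" resolves to on this machine.
    total, downloads = connection_budget(workers=32, worker_connections=2024)
    print(f"{total} connection slots, about {downloads} concurrent downloads")
```

With these inputs the sketch gives 64768 slots and about 21589 concurrent downloads; with worker_connections 1024, as in the multi_accept answer, the same arithmetic gives 32768 slots and about 10922 downloads.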
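The problem is that once the nginx processes start, the CPU load average keeps rising until it reaches 100% utilization. I used htop to see which process was consuming the CPU, and there was no such process. Then I went to our monitoring dashboard and found that the high load average was caused by I/O wait.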
Then I used iotop to see which processes were spending time in I/O wait.
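Strangely, the Nginx worker processes show a high disk write rate. Sometimes Total DISK WRITE reaches 100 MB/s, but Actual DISK WRITE does not show the same behavior. I should also mention that I don't use Nginx caching, so these writes are not related to a cache. Disabling Nginx logging doesn't help either.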
How can I debug this? Why is nginx writing so much data to disk?
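One way to narrow this down is to read the kernel's per-process I/O counters directly rather than relying on iotop's summary; iotop's per-process numbers come from the same kernel task I/O accounting that /proc/<pid>/io exposes. The sketch below is a hypothetical helper, not part of nginx or iotop. It assumes Linux and root privileges, finds processes whose title starts with "nginx: worker process", and prints deltas of three counters over an arbitrary 5-second window: wchar (bytes passed to write-type system calls, including socket writes), write_bytes (page cache dirtied for storage) and cancelled_write_bytes (dirty page cache discarded before writeback, e.g. because the file was truncated or deleted).

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def nginx_worker_pids():
    """Return PIDs whose process title starts with 'nginx: worker process'."""
    try:
        entries = os.listdir("/proc")
    except OSError:
        return []  # no procfs, e.g. not running on Linux
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                title = f.read().decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if title.startswith("nginx: worker process"):
            pids.append(int(entry))
    return pids


def sample_io(pids, interval=5.0):
    """Return per-PID counter deltas measured over `interval` seconds."""
    if not pids:
        return {}

    def snapshot():
        result = {}
        for pid in pids:
            try:
                with open(f"/proc/{pid}/io") as f:
                    result[pid] = parse_proc_io(f.read())
            except OSError:
                pass  # exited, or another user's process read without root
        return result

    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    return {
        pid: {name: after[pid][name] - before[pid].get(name, 0) for name in after[pid]}
        for pid in after
        if pid in before
    }


if __name__ == "__main__":
    for pid, delta in sample_io(nginx_worker_pids()).items():
        print(
            f"pid {pid}: wchar={delta.get('wchar')} "
            f"write_bytes={delta.get('write_bytes')} "
            f"cancelled_write_bytes={delta.get('cancelled_write_bytes')}"
        )
```

If cancelled_write_bytes grows almost as fast as write_bytes, the workers are dirtying pages for files that are deleted before they reach the disk, which would be consistent with a high Total DISK WRITE alongside a low Actual DISK WRITE. Running lsof -p <pid> against a worker can then show which files it holds open, including deleted ones.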
First, create the /var/cache/nginx directory and give your nginx system user full read/write access to it, then add these directives to the nginx http {} context:
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_zone:10m max_size=300m inactive=1d;
proxy_cache_key "$scheme$request_method$host$request_uri";
Then add these to the server {} context, or to the location {} block that you want to serve from the cache:
proxy_cache my_zone;
proxy_cache_valid 200 1d;
proxy_cache_valid 404 302 1m;
proxy_cache_revalidate on;
proxy_cache_bypass $http_cache_control;
proxy_http_version 1.1;
add_header X-Cache-Status $upstream_cache_status;
add_header X-Proxy-Cache $upstream_cache_status;
I haven't tested this, but you should get the idea; test it yourself.
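To check whether such a cache is actually being populated, it helps to know where nginx stores each entry. According to the nginx documentation, the cache file name is the MD5 hash of the proxy_cache_key value, and levels=1:2 builds two directory levels from the end of that hash: the last hex character, then the two characters before it. The Python sketch below reproduces that naming for the key format used above; the function names and the example request path are hypothetical.

```python
import hashlib


def cache_key(scheme, method, host, request_uri):
    # proxy_cache_key "$scheme$request_method$host$request_uri" concatenates
    # the variables with no separator between them.
    return f"{scheme}{method}{host}{request_uri}"


def path_from_digest(cache_root, digest, levels=(1, 2)):
    """Build the cache file path for an MD5 hex digest and a levels spec."""
    parts = []
    end = len(digest)
    for width in levels:
        # Each level takes the next `width` characters, walking back from the end.
        parts.append(digest[end - width:end])
        end -= width
    return "/".join([cache_root.rstrip("/"), *parts, digest])


def cache_file_path(cache_root, key, levels=(1, 2)):
    """Return the path nginx would use for `key` under proxy_cache_path."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return path_from_digest(cache_root, digest, levels)


if __name__ == "__main__":
    key = cache_key("http", "GET", "mydomain.com", "/video/example.mp4")
    print(key)
    print(cache_file_path("/var/cache/nginx", key))
```

Per the same documentation, one megabyte of keys zone stores about 8,000 keys, so keys_zone=my_zone:10m can track roughly 80,000 entries.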
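The problem was the missing Nginx multi_accept directive. Since we serve video files, which are usually large, Nginx could not respond to new connections while it was serving video files to some users. Adding multi_accept on to the events block solved the problem.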
events {
    worker_connections 1024;
    multi_accept on;
    use epoll;
}
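The nginx documentation describes multi_accept this way: when it is off, a worker process accepts one new connection at a time; when it is on, a worker accepts all new connections at a time. The Python sketch below is only an analogy of that behavior, built on a non-blocking listening socket and the standard selectors module; it is not nginx's C event loop, and drain_accept and demo are hypothetical names.

```python
import selectors
import socket
import time


def drain_accept(listener, multi_accept):
    """Accept pending connections after one readiness event.

    multi_accept=False: accept a single connection and leave the rest queued.
    multi_accept=True: keep calling accept() until the backlog is empty.
    """
    accepted = []
    while True:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            break  # nothing left in the accept queue
        conn.setblocking(False)
        accepted.append(conn)
        if not multi_accept:
            break
    return accepted


def demo(pending=5, multi_accept=True):
    """Queue `pending` loopback connections, then handle one readiness event."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(64)
    listener.setblocking(False)
    clients = [socket.create_connection(listener.getsockname()) for _ in range(pending)]
    time.sleep(0.05)  # give the kernel time to finish the handshakes
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    accepted = []
    try:
        sel.select(timeout=1.0)  # a single "listener is readable" notification
        accepted = drain_accept(listener, multi_accept)
        return len(accepted)
    finally:
        sel.close()
        for s in accepted + clients:
            s.close()
        listener.close()


if __name__ == "__main__":
    print("multi_accept off:", demo(multi_accept=False))
    print("multi_accept on: ", demo(multi_accept=True))
```

With five connections waiting, the "off" run should accept one and leave four in the kernel queue for later events, while the "on" run drains all five in a single pass.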