Nginx service is not properly stopped by certbot and therefore does not restart

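I have a small problem with my VPS running Debian. It hosts several websites on an infrastructure based on nginx, gunicorn, and Django. The affected website has an SSL certificate managed by Let's Encrypt.

I think the problem occurs when Let's Encrypt tries to renew the certificate.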

The error

The system log at the time of the error:

Dec 12 00:01:46 vps465872 systemd[1]: Starting Certbot...
Dec 12 00:01:49 vps465872 systemd[1]: Stopping A high performance web server and a reverse proxy server...
Dec 12 00:01:49 vps465872 systemd[1]: Stopped A high performance web server and a reverse proxy server.
Dec 12 00:01:55 vps465872 certbot[600]: nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
Dec 12 00:01:56 vps465872 systemd[1]: Starting A high performance web server and a reverse proxy server...
Dec 12 00:01:56 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:56 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:57 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Dec 12 00:01:58 vps465872 nginx[658]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Dec 12 00:01:59 vps465872 nginx[658]: nginx: [emerg] still could not bind()
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Control process exited, code=exited status=1
Dec 12 00:01:59 vps465872 systemd[1]: Failed to start A high performance web server and a reverse proxy server.
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Unit entered failed state.
Dec 12 00:01:59 vps465872 systemd[1]: nginx.service: Failed with result 'exit-code'.
Dec 12 00:01:59 vps465872 certbot[600]: Hook command "service nginx start" returned error code 1
Dec 12 00:01:59 vps465872 certbot[600]: Error output from service:
Dec 12 00:01:59 vps465872 certbot[600]: Job for nginx.service failed because the control process exited with error code.
Dec 12 00:01:59 vps465872 certbot[600]: See "systemctl status nginx.service" and "journalctl -xe" for details.


So be it. Let's redo the process manually. I killed everything related to nginx:

ps -ef |grep nginx
kill -9 xxxx
kill -9 xxxx

I restarted nginx:

service nginx start

And everything worked.

I then did a certbot dry run:

certbot renew --dry-run

Now this error appears:

Attempting to renew cert (xxx.fr) from /etc/letsencrypt/renewal/xxx.fr.conf produced an unexpected error: Problem binding to port 443: Could not bind to IPv4 or IPv6... Skipping.

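Investigation

I looked in the /run directory: the file nginx.pid no longer exists.

On the other hand, a quick ps -ef | grep nginx tells me the processes are still running, and the website is indeed up. So if I run service nginx start, it reports the address conflict error.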
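The mismatch described here (nginx processes alive, PID file gone) can be checked with a small shell sketch. The function name check_pidfile is hypothetical and not part of nginx or certbot; the path /run/nginx.pid is the one from the log. Because kill -0 only tests whether the process can be signalled, this assumes it is run as root or as the owner of the process.

```shell
#!/bin/sh
# Hypothetical helper: classify the state of a PID file.
# Prints "missing", "stale", or "running".
check_pidfile() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"      # no PID file, even if processes are still alive
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests that the process can be signalled
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
        return 0
    fi
    echo "stale"            # file exists but its process is gone
    return 2
}
```

Calling check_pidfile /run/nginx.pid next to ps -ef | grep nginx shows the situation at a glance: "missing" together with live nginx processes means service nginx stop has no PID to act on, so ports 80 and 443 stay occupied.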

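I found someone on Stack Overflow with the same problem, but the solution did not work for me. It did give me a clue about where to look: Certbot renew: nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)

So I went looking. The file /etc/letsencrypt/renewal/xxx.fr.conf contains the following hooks: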

[renewalparams]
authenticator = standalone
installer = nginx
pre_hook = service nginx stop
post_hook = service nginx start

Fine. I looked at the relevant script, /etc/init.d/nginx. At the start it extracts the PID file path with:

PID=$(cat /etc/nginx/nginx/nginx.conf | grep -Ev '^\s*#' | awk 'BEGIN { RS="[;{}]" } { if ($1 == "pid") print $2 }' | head -n1)

This command works fine.

Stop:
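To see what that pipeline does in isolation, here is a minimal sketch that wraps it in a function taking the config path as an argument. The name extract_pid_path is hypothetical, and the sketch assumes an awk that accepts a regular expression as record separator (gawk and mawk do).

```shell
#!/bin/sh
# Hypothetical wrapper around the init script's PID-path pipeline.
extract_pid_path() {
    conf="$1"
    # 1. drop lines that are comments
    # 2. split the file into records at ';', '{' and '}', so that each
    #    directive becomes one record, and print the second field of the
    #    record whose first field is "pid"
    # 3. keep only the first match
    grep -Ev '^\s*#' "$conf" \
        | awk 'BEGIN { RS="[;{}]" } { if ($1 == "pid") print $2 }' \
        | head -n1
}
```

For a config containing the directive pid /run/nginx.pid; the function prints /run/nginx.pid. Comparing its output with the file nginx actually writes is a quick way to rule out a path mismatch between the init script and the running daemon.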

stop_nginx() {
    start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME
    RETVAL="$?"
    sleep 1
    return "$RETVAL"
}

Start:

start_nginx() {
    start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON --test > /dev/null \
        || return 1
    start-stop-daemon --start --quiet --pidfile $PID --exec $DAEMON -- \
        $DAEMON_OPTS 2>/dev/null \
        || return 2
}

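This looks fine. Moreover, when the service and its PID file are in sync, the start and stop commands work well too.

Conclusion

Well, that's it: I have a problem that I don't understand.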

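I suggest using webroot mode instead of standalone mode. To renew the certificate, certbot creates a ".well-known/acme-challenge/" directory in your web server's root, and nginx itself serves the challenge files, so ports 80 and 443 never have to be freed.

The advantage is less downtime, because you only need to restart the nginx service via post_hook, rather than going through 'stop-wait-start'.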
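As a rough sketch of the switch to webroot mode, the certificate could be re-issued once with the webroot authenticator, after which certbot records the new settings in the renewal file. The function name issue_webroot_cert and the directory /var/www/xxx.fr are hypothetical. The sketch also assumes that nginx serves /.well-known/acme-challenge/ for the domain from that directory over port 80, and that a reload is enough for nginx to pick up the renewed certificate.

```shell
#!/bin/sh
# Hypothetical helper: request a certificate with the webroot authenticator.
#   $1 = domain name, $2 = directory that nginx serves for that domain
issue_webroot_cert() {
    domain="$1"
    webroot="$2"
    # --webroot / -w: place the challenge files under $webroot
    # --post-hook:   command certbot runs after the attempt; no pre-hook is
    #                needed because nginx keeps running during the challenge
    certbot certonly --webroot -w "$webroot" -d "$domain" \
        --post-hook "service nginx reload"
}
```

It would be called as issue_webroot_cert xxx.fr /var/www/xxx.fr, and certbot renew --dry-run can then confirm that renewal no longer tries to bind to port 443.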

I hope this alternative solution helps.