Linux daemonize without PID file race condition

Question

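I've made a program run as a daemon under Linux a few times.

In one case, I just used daemon().
Another time, I wrote my own daemon code (based on something like this), because I wanted to do more complex redirection of STDIN, STDOUT, etc.
I also used the Busybox start-stop-daemon to start a C# Mono program as a daemon, using its -m option to generate a PID file.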

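The problem is that all of these solutions have a race condition in PID file creation: the PID file is written by the program's background process at some indeterminate time after the foreground process has exited. This is a problem, for example, in embedded Linux if the program is started by an initscript, and a watchdog process is started last, which checks that the program is running by checking its PID file. In the C# Mono case using start-stop-daemon, I've had such a system occasionally rebooted at start-up by the watchdog, because the program's PID file had not yet been written by the time the watchdog process began monitoring (surprising as it may be that this could ever happen in a practical scenario).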
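For illustration, here is a minimal sketch of the kind of liveness check such a watchdog might perform. The function name pidfile_process_alive is hypothetical, and it assumes the PID file holds a single decimal PID. If it runs before the daemon has written the file, it sees a missing or empty file and reports the program as not running, which is exactly the spurious reboot described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical watchdog-side check: returns 1 if `path` holds the PID of an
 * existing process, 0 otherwise (missing file, empty file, no digits, or a
 * PID that no longer exists). */
int pidfile_process_alive(const char *path)
{
    char buf[32];
    ssize_t n;
    char *end;
    long pid;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;               /* file not created yet: the race window */
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return 0;               /* created but not yet written */
    buf[n] = '\0';

    errno = 0;
    pid = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || pid <= 0)
        return 0;               /* no digits, overflow, or non-positive value */

    /* Signal 0 performs only the existence/permission check.
     * EPERM means the process exists but belongs to another user. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}
```

A watchdog that treats a 0 result as "program dead" cannot tell a crashed daemon from one whose PID file simply hasn't been written yet, which is why the file must be valid before the initscript moves on.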

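How can a program be daemonized without a PID file race condition? That is, in such a way that it is guaranteed that the PID file has been completely created and written with a valid PID value by the time the foreground process exits?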

Note that this is made a little more difficult by the Linux daemon fork-setsid-fork idiom (to prevent the daemon from acquiring a controlling tty), because the parent can't easily get the grandchild's PID.
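One common way around the grandchild-PID problem is to pass the grandchild's PID back to the original process over a pipe, so the foreground process blocks until the daemon actually exists. The following is a minimal sketch of that technique, not code from the question; daemonize_report_pid is a hypothetical name, and chdir(), umask() and stdio redirection are omitted for brevity.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork-setsid-fork, with the grandchild's PID sent back
 * to the original process over a pipe.
 * Returns 0 in the daemon (grandchild), 1 in the original process with
 * *daemon_pid set, or -1 on error in the original process. */
int daemonize_report_pid(pid_t *daemon_pid)
{
    int fds[2];
    pid_t pid, me;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid > 0) {
        /* Original process: block until the grandchild reports its PID.
         * EOF (n == 0) means every write end closed without a report. */
        close(fds[1]);
        do {
            n = read(fds[0], daemon_pid, sizeof(*daemon_pid));
        } while (n < 0 && errno == EINTR);
        close(fds[0]);
        /* Reap the intermediate child so it doesn't linger as a zombie. */
        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
            ;
        return n == (ssize_t)sizeof(*daemon_pid) ? 1 : -1;
    }

    /* First child: become session leader, then fork again. */
    close(fds[0]);
    if (setsid() < 0)
        _exit(1);
    pid = fork();
    if (pid < 0)
        _exit(1);
    if (pid > 0)
        _exit(0);   /* intermediate child exits; grandchild is not a session leader */

    /* Grandchild: report our own PID, then carry on as the daemon. */
    me = getpid();
    if (write(fds[1], &me, sizeof(me)) != (ssize_t)sizeof(me))
        _exit(1);
    close(fds[1]);
    return 0;
}
```

In the original process, a return value of 1 means the daemon is running and its PID is known, so the caller can write the PID file and only then exit. If the daemon dies before reporting, the read sees end-of-file and the function returns -1.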

Answer 1

I'm trying the following code. The main points are:

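The parent of the first fork waits for the child to exit.
The child of the first fork does various daemon setup, and then does a second fork.
The parent of the second fork (which gets the PID of its child) writes that PID to the PID file, then exits.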

So with this method, the foreground process doesn't exit until the background process's PID has been written.

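(Note the difference between exit() and _exit(). The idea is that exit() does a normal shutdown, which can include unlocking and deleting the PID file, either via C++ destructors or via C atexit() functions. But _exit() skips all of that. This allows the background process to keep the PID file open and locked (using e.g. flock()), which allows for a "singleton" daemon. So the program, before calling this function, should open the PID file and flock() it. If it's a C program, it should register an atexit() function that closes and deletes the PID file. If it's a C++ program, it should use an RAII-style class to create the PID file and close/delete it on exit.)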

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int daemon_with_pid(int pid_fd)
{
    int         fd;
    pid_t       pid;
    pid_t       pid_wait;
    int         stat;
    int         file_bytes;
    char        pidfile_buffer[32];

    pid = fork();
    if (pid < 0) {
        perror("daemon fork");
        exit(20);
    }
    if (pid > 0) {
        /* We are the parent.
         * Wait for child to exit. The child will do a second fork,
         * write the PID of the grandchild to the pidfile, then exit.
         * We wait for this to avoid race condition on pidfile writing.
         * I.e. when we exit, pidfile contents are guaranteed valid. */
        for (;;) {
            pid_wait = waitpid(pid, &stat, 0);
            if (pid_wait == -1) {
                if (errno == EINTR)
                    continue;   /* interrupted by a signal; retry */
                /* Any other error (e.g. ECHILD if SIGCHLD is ignored):
                 * stat was not filled in, so don't inspect it. */
                perror("daemon waitpid");
                _exit(21);
            }
            if (WIFSTOPPED(stat) || WIFCONTINUED(stat))
                continue;
            break;
        }
        if (WIFEXITED(stat)) {
            if (WEXITSTATUS(stat) != 0) {
                fprintf(stderr, "Error in child process\n");
                exit(WEXITSTATUS(stat));
            }
            /* Success: _exit() so atexit() handlers/destructors don't
             * delete the pidfile that the daemon now owns. */
            _exit(0);
        }
        _exit(21);
    }

    /* We are the child. Set up for daemon and then do second fork. */
    /* Set current directory to / */
    if (chdir("/") < 0)
        _exit(29);

    /* Redirect STDIN, STDOUT, STDERR to /dev/null */
    fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        _exit(22);
    stat = dup2(fd, STDIN_FILENO);
    if (stat < 0)
        _exit(23);
    stat = dup2(fd, STDOUT_FILENO);
    if (stat < 0)
        _exit(23);
    stat = dup2(fd, STDERR_FILENO);
    if (stat < 0)
        _exit(23);
    if (fd > STDERR_FILENO)
        close(fd);      /* the standard descriptors now refer to /dev/null */

    /* Start a new session for the daemon. */
    if (setsid() < 0)
        _exit(30);

    /* Do a second fork */
    pid = fork();
    if (pid < 0) {
        _exit(24);
    }
    if (pid > 0) {
        /* We are the parent in this second fork; child of the first fork.
         * Write the PID to the pidfile, then exit. */
        if (pid_fd >= 0) {
            file_bytes = snprintf(pidfile_buffer, sizeof(pidfile_buffer), "%d\n", (int)pid);
            if (file_bytes <= 0 || file_bytes >= (int)sizeof(pidfile_buffer))
                _exit(25);
            if (ftruncate(pid_fd, 0) < 0)
                _exit(26);
            if (lseek(pid_fd, 0, SEEK_SET) < 0)
                _exit(27);
            if (write(pid_fd, pidfile_buffer, file_bytes) != file_bytes)
                _exit(28);
        }
        /* _exit(), not exit(): leave the pidfile to the grandchild. */
        _exit(0);
    }
    /* We are the child of the second fork; grandchild of the first fork. */
    return 0;
}

Answer 2

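As you've discovered, daemons that manage their own PID files are inherently racy. The solution is not to daemonize, but to run the process in the foreground and use a process supervisor to manage it. Examples include runit, supervisord, and systemd's support for "new-style daemons".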
