Why does forking my process cause the file to be read infinitely?

I have boiled my entire program down to a short main that reproduces the problem, so please forgive that it doesn't appear to make any sense.

input.txt is a text file containing a couple of lines of text. This simplified program should print those lines. However, if fork is called, the program enters an infinite loop in which it prints the contents of the file over and over again.

As far as I understand fork, the way I use it in this snippet is essentially a no-op: the process forks, the parent waits for the child before continuing, and the child is killed immediately.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(){
    freopen("input.txt", "r", stdin);
    char s[MAX];

    int i = 0;
    char* ret = fgets(s, MAX, stdin);
    while (ret != NULL) {
        //Commenting out this region fixes the issue
        int status;
        pid_t pid = fork();
        if (pid == 0) {
            exit(0);
        } else {
            waitpid(pid, &status, 0);
        }
        //End region
        printf("%s", s);
        ret = fgets(s, MAX, stdin);
    }
}

Edit: Further investigation has only made my problem stranger. If the file contains fewer than 4 empty lines or fewer than 3 lines of text, it does not break; above that, it loops infinitely.

Edit 2: If the file contains 3 lines of numbers it loops infinitely, but if it contains 3 lines of words it does not.

The exit() call closes all open file handles. After the fork, the child and the parent have identical copies of the execution stack, including the file handle pointer. When the child exits, it closes the file and resets the pointer.

int main(){
    freopen("input.txt", "r", stdin);
    char s[MAX];

    int i = 0;
    char* ret = fgets(s, MAX, stdin);
    while (ret != NULL) {
        //Commenting out this region fixes the issue
        int status;
        pid_t pid = fork();   // At this point both processes have a copy of the file handle
        if (pid == 0) {
            exit(0);          // At this point the child closes the file handle
        } else {
            waitpid(pid, &status, 0);
        }
        //End region
        printf("%s", s);
        ret = fgets(s, MAX, stdin);
    }
}

As /u/visibleman pointed out, the child process was closing the file and messing things up in the parent process.

I was able to work around this by checking whether the program is in terminal mode with

!isatty(fileno(stdin))

and, if stdin has been redirected, reading everything into a linked list before doing any processing or forking.
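
A rough sketch of that workaround (my own illustration, not the original program: the line_node type and read_all_lines helper are made-up names, and error handling is minimal):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { MAX = 100 };

/* Hypothetical node type used to buffer all input lines before any fork(). */
struct line_node {
    char text[MAX];
    struct line_node *next;
};

/* Read every line of the redirected stdin into a linked list up front,
   so no stdio stream is left mid-file when the process later forks. */
static struct line_node *read_all_lines(FILE *fp)
{
    struct line_node *head = NULL, **tail = &head;
    char s[MAX];
    while (fgets(s, MAX, fp) != NULL) {
        struct line_node *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        strcpy(node->text, s);
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }
    return head;
}

int main(void)
{
    struct line_node *lines = NULL;
    if (!isatty(fileno(stdin)))          /* stdin has been redirected */
        lines = read_all_lines(stdin);
    /* ... fork and process as before, but iterate over 'lines' instead of stdin ... */
    for (struct line_node *p = lines; p != NULL; p = p->next)
        printf("%s", p->text);
    return 0;
}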

I am surprised that there is a problem, but it appears to be an issue on Linux (I tested on Ubuntu 16.04 LTS running in a VMWare Fusion VM on my Mac). It is not a problem on my Mac running macOS 10.13.4 (High Sierra), and I would not expect it to be a problem on other variants of Unix.

As I noted in a comment:

There's an open file description and an open file descriptor behind each stream. When the process forks, the child has its own set of open file descriptors (and file streams), but each file descriptor in the child shares the open file description with the parent. IF (and that's a big 'if') the child process closing the file descriptors first did the equivalent of lseek(fd, 0, SEEK_SET), then that would also position the file descriptor for the parent process, and that could lead to an infinite loop. However, I've never heard of a library that does that seek; there's no reason to do it.

For the details of open file descriptors and open file descriptions, see the POSIX specifications of open() and fork().

The open file descriptors are private to a process; the open file description is shared by all copies of the file descriptor created by the initial 'open file' operation. One of the key properties of the open file description is the current seek position. That means a child process can change the current seek position for the parent, because it lives in the shared open file description.
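
A tiny stand-alone demonstration of that sharing (my own sketch, not part of the answer): the child seeks on an inherited descriptor, and the parent's subsequent read starts from wherever the child left the shared offset.

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.txt", O_RDONLY);    /* one open file description */
    if (fd < 0)
        return 1;

    pid_t pid = fork();       /* the child's descriptor shares that description */
    if (pid == 0) {
        lseek(fd, 10, SEEK_SET);             /* child moves the shared offset */
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    char buf[21];
    ssize_t n = read(fd, buf, 20);           /* parent reads from offset 10, not 0 */
    if (n > 0) {
        buf[n] = '\0';
        printf("parent read: [%s]\n", buf);
    }
    return 0;
}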

neof97.c

I used the following code, a mildly adapted version of the original that compiles cleanly with stringent compilation options:

#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(void)
{
    if (freopen("input.txt", "r", stdin) == 0)
        return 1;
    char s[MAX];
    for (int i = 0; i < 30 && fgets(s, MAX, stdin) != NULL; i++)
    {
        // Commenting out this region fixes the issue
        int status;
        pid_t pid = fork();
        if (pid == 0)
        {
            exit(0);
        }
        else
        {
            waitpid(pid, &status, 0);
        }
        // End region
        printf("%s", s);
    }
    return 0;
}

One of the modifications limits the number of iterations (children) to just 30. The data file I used had 4 lines of 20 random letters each plus a newline (84 bytes in total):

ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe

I ran the command under strace on Ubuntu:

$ strace -ff -o st-out -- neof97
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
…
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
$

There were 31 files with names of the form st-out.808##, where the hashes were two digits. The main process file was fairly big; the others were small, with sizes of 66, 110, 111, or 137 bytes:

$ cat st-out.80833
lseek(0, -63, SEEK_CUR)                 = 21
exit_group(0)                           = ?
+++ exited with 0 +++
$ cat st-out.80834
lseek(0, -42, SEEK_CUR)                 = -1 EINVAL (Invalid argument)
exit_group(0)                           = ?
+++ exited with 0 +++
$ cat st-out.80835
lseek(0, -21, SEEK_CUR)                 = 0
exit_group(0)                           = ?
+++ exited with 0 +++
$ cat st-out.80836
exit_group(0)                           = ?
+++ exited with 0 +++
$

It so happened that the first 4 children each exhibited one of the four behaviours; each subsequent group of 4 children exhibited the same pattern.

This shows that 3 out of 4 of the children did indeed execute an lseek() on standard input before exiting. Clearly, I have now seen a library do it. I have no idea why it is thought to be a good idea, but empirically, that is what is happening.

neof67.c

This version of the code, using a separate file stream (and file descriptor) and fopen() instead of freopen(), also runs into the problem.

#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(void)
{
    FILE *fp = fopen("input.txt", "r");
    if (fp == 0)
        return 1;
    char s[MAX];
    for (int i = 0; i < 30 && fgets(s, MAX, fp) != NULL; i++)
    {
        // Commenting out this region fixes the issue
        int status;
        pid_t pid = fork();
        if (pid == 0)
        {
            exit(0);
        }
        else
        {
            waitpid(pid, &status, 0);
        }
        // End region
        printf("%s", s);
    }
    return 0;
}

This also exhibits the same behaviour, except that the file descriptor on which the seek occurs is 3 instead of 0. So, two of my hypotheses are disproven: that the problem is related to freopen() and that it is related to stdin; both are shown to be incorrect by the second test program.

Preliminary diagnosis

IMO, this is a bug. You should not be able to run into this problem. It is most likely a bug in the Linux (GNU C) library rather than in the kernel. It is caused by the lseek() in the child processes. It is not clear (because I have not gone to look at the source code) what the library is doing or why.


GLIBC Bug 23151

GLIBC Bug 23151 - A forked process with unclosed file does lseek before exit and can cause infinite loop in parent I/O.

The bug was created on 2018-05-08 US/Pacific, and it was closed as invalid on 2018-05-09. The reason given was:

Please read http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01, especially this paragraph:

Note that after a fork(), two handles exist where one existed before. […]

POSIX

The complete section of POSIX (apart from verbiage noting that it is not covered by the C standard) reads:

2.5.1 Interaction of File Descriptors and Standard I/O Streams

An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.

Handles can be created or destroyed by explicit user action, without affecting the underlying open file description. Some of the ways to create them include fcntl(), dup(), fdopen(), fileno(), and fork(). They can be destroyed by at least fclose(), close(), and the exec functions.

A file descriptor that is never used in an operation that could affect the file offset (for example, read(), write(), or lseek()) is not considered a handle for this discussion, but could give rise to one (for example, as a consequence of fdopen(), dup(), or fork()). This exception does not include the file descriptor underlying a stream, whether created with fopen() or fdopen(), so long as it is not used directly by the application to affect the file offset. The read() and write() functions implicitly affect the file offset; lseek() explicitly affects it.

The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.

A handle which is a stream is considered to be closed when either an fclose(), or freopen() with non-full(1) filename, is executed on it (for freopen() with a null filename, it is implementation-defined whether a new handle is created or the existing one reused), or when the process owning that stream terminates with exit(), abort(), or due to a signal. A file descriptor is closed by close(), _exit(), or the exec() functions when FD_CLOEXEC is set on that file descriptor.

(1) [sic] Using 'non-full' is probably meant to be 'non-null'.

For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. (If a stream function has as an underlying function one that affects the file offset, the stream function shall be considered to affect the file offset.)

The handles need not be in the same process for these rules to apply.

Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec() functions or _exit() (not exit()), the handle is never accessed in that process.)

For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it.

  • If it is a file descriptor, no action is required.

  • If the only further action to be performed on any handle to this open file descriptor is to close it, no action need be taken.

  • If it is a stream which is unbuffered, no action need be taken.

  • If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a putc('\n') was the most recent operation on that stream), no action need be taken.

  • If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.

  • If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken.

  • If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.

For the second handle:

  • If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.

If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit().

The exec() functions make inaccessible all streams that are open at the time they are called, independent of which streams or file descriptors may be available to the new process image.

When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks. It is implementation-defined whether, and under what conditions, all input is seen exactly once.

Each function that operates on a stream is said to have zero or more "underlying functions". This means that the stream function shares certain traits with the underlying functions, but does not require that there be any relation between the implementations of the stream function and its underlying functions.

Exegesis

That is hard to read! If you are not clear on the distinction between an open file descriptor and an open file description, read the specifications of open() and fork() (and dup() or dup2()). The definitions for file descriptor and open file description are also relevant, if terse.

In the context of the code in this question, we have a file stream handle open for reading only which has not yet encountered EOF (so feof() would not return true, even though the read position is at the end of the file).

One of the crucial parts of the specification is: The application shall prepare for a fork() exactly as if it were a change of active handle.

This means that the steps outlined for 'the first handle' are relevant, and stepping through them, the first applicable condition is the last one:

  • If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.

And if you look at the definition of fflush(), you find:

If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() shall cause any unwritten data for that stream to be written to the file, [CX] ⌦ and the last data modification and last file status change timestamps of the underlying file shall be marked for update.

For a stream open for reading with an underlying file description, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset). ⌫

It is not entirely clear what happens if you apply fflush() to an input stream associated with a non-seekable file, but that is not our immediate concern. However, if you are writing generic library code, you may need to know whether the underlying file descriptor is seekable before doing an fflush() on the stream. Alternatively, use fflush(NULL) to have the system do whatever is necessary for all I/O streams, noting that this will lose any pushed-back characters (via ungetc() etc.).
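
One illustrative way to make that decision (an assumption on my part, not something taken from the POSIX text above) is to probe the descriptor with a no-op lseek(): it fails on pipes, FIFOs and sockets, which are not capable of seeking. The helper names below are hypothetical.

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch: flush a readable stream before fork() only if its underlying
   descriptor is capable of seeking. */
static bool fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

static void prepare_stream_for_fork(FILE *fp)
{
    if (fd_is_seekable(fileno(fp)))
        fflush(fp);   /* sets the shared file offset to the stream's position */
}

int main(void)
{
    prepare_stream_for_fork(stdin);   /* safe whether stdin is a file or a pipe */
    return 0;
}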

The lseek() operations shown in the strace output appear to be implementing the fflush() semantics, associating the file offset of the open file description with the file position of the stream.

So, for the code in this question, it seems that an fflush(stdin) is necessary before the fork() to ensure consistency. Not doing that leads to undefined behaviour ('if this is not done, the result is undefined'), such as looping indefinitely.
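
Concretely, for the loop in this question, that fix amounts to something like the following sketch, where the fflush(stdin) call before the fork() is the only substantive change:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(void)
{
    if (freopen("input.txt", "r", stdin) == 0)
        return 1;
    char s[MAX];
    while (fgets(s, MAX, stdin) != NULL)
    {
        fflush(stdin);   /* prepare for fork(): set the shared file offset
                            to the stream's current read position */
        int status;
        pid_t pid = fork();
        if (pid == 0)
            exit(0);
        waitpid(pid, &status, 0);
        printf("%s", s);
    }
    return 0;
}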

Replace exit(0) with _exit(0), and everything works. It is an ancient Unix tradition: if you use stdio, your forked image must call _exit(), not exit().
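
In the question's loop, that is a one-line change to the child's branch; a minimal sketch of the whole program with that change:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX = 100 };

int main(void)
{
    if (freopen("input.txt", "r", stdin) == 0)
        return 1;
    char s[MAX];
    while (fgets(s, MAX, stdin) != NULL) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);   /* child skips stdio cleanup, so it never does the
                           lseek() on the shared open file description */
        waitpid(pid, NULL, 0);
        printf("%s", s);
    }
    return 0;
}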