C file pointer changing after fork and (failed) exec
I wrote a program that forks, and I assumed the child would not affect the parent.
However, the file pointer changed even though I did not change anything in the parent.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    FILE *fp = fopen("sm.c", "r");
    char buf[1000];
    char *args[] = {"invailid_command", NULL};

    fgets(buf, sizeof(buf), fp);
    printf("I'm one %d %ld\n", getpid(), ftell(fp));
    if (fork() == 0) {
        execvp(args[0], args);
        exit(EXIT_FAILURE);
    }
    wait(NULL);
    printf("I'm two %d %ld\n", getpid(), ftell(fp));
}
This outputs:
I'm one 21500 20
I'm two 21500 -1
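I want the file pointer to stay the same between the two printf calls.
Why does the file pointer change even though execvp fails, and can I prevent it from changing?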
I was able to reproduce this on Ubuntu 16.04 with gcc 5.4.0. The culprit here is exit, together with the way child processes are created.
The man page for exit states the following:
The exit() function causes normal process termination and the value
of status & 0377 is returned to the parent (see wait(2)).
All functions registered with atexit(3) and on_exit(3) are
called, in the reverse order of their registration. (It is possible
for one of these functions to use atexit(3) or on_exit(3) to
register an additional function to be executed during exit processing;
the new registration is added to the front of the list of
functions that remain to be called.) If one of these functions does
not return (e.g., it calls _exit(2), or kills itself with a
signal), then none of the remaining functions is called, and further
exit processing (in particular, flushing of stdio(3) streams)
is abandoned. If a function has been registered multiple times
using atexit(3) or on_exit(3), then it is called as many times as
it was registered.
All open stdio(3) streams are flushed and closed. Files created by
tmpfile(3) are removed.
The C standard specifies two constants, EXIT_SUCCESS and
EXIT_FAILURE, that may be passed to exit() to indicate successful or
unsuccessful termination, respectively.
So when you call exit in the child, it closes the FILE represented by fp.
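Normally, when a child process is created, it gets a copy of the parent's file descriptors. In this case, however, it seems that the child's memory still physically points to the parent's memory. So when exit closes the FILE, it affects the parent.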
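If you change the child to call _exit instead, it closes the child's file descriptors but leaves the FILE object untouched, and the second call to ftell in the parent will succeed. In any case, using _exit in a child that has not exec'ed is good practice, because it prevents atexit handlers from being called in the child.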
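A sketch of that change, wrapped in a function so the result can be checked: the helper name position_survives_child is hypothetical, and it takes an already opened stream rather than opening "sm.c" itself.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reads one line from fp, forks a child whose exec
 * fails, and compares the parent's stream position before and after.
 * Returns 0 if the position is unchanged, 1 if it differs, -1 on error. */
int position_survives_child(FILE *fp)
{
    char buf[1000];
    char *args[] = {"invailid_command", NULL};

    if (fgets(buf, sizeof buf, fp) == NULL)
        return -1;
    long before = ftell(fp);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(args[0], args);
        _exit(EXIT_FAILURE);    /* exec failed: leave without touching stdio */
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    long after = ftell(fp);
    return before == after ? 0 : 1;
}
```

Because the child never reaches stdio cleanup, the parent's view of fp should be the same on both sides of the fork.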
Thanks to Jonathan Leffler for pointing us in the right direction.
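Although your program does not produce the same surprising behavior for me on CentOS 7 / GCC 4.8.5 / GLIBC 2.17, it is plausible that you observe different behavior. According to POSIX (on which you rely for fork), your program's behavior is in fact undefined. Here are some excerpts from the relevant section: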
An open file description may be accessed through a file descriptor,
which is created using functions such as open() or pipe(), or through
a stream, which is created using functions such as fopen() or popen().
Either a file descriptor or a stream is called a "handle" on the open
file description to which it refers; an open file description may have
several handles.
[...]
The result of function calls involving any one handle (the "active
handle") is defined elsewhere in this volume of POSIX.1-2017, but if
two or more handles are used, and any one of them is a stream, the
application shall ensure that their actions are coordinated as
described below. If this is not done, the result is undefined.
[...]
For a handle to become the active handle, the application shall ensure
that the actions below are performed between the last use of the
handle (the current active handle) and the first use of the second
handle (the future active handle). The second handle then becomes the
active handle. [...]
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before.
The application shall ensure that, if both handles can ever be
accessed, they are both in a state where the other could become the
active handle first. [Where subject to the preceding qualification, the]
application shall prepare for a fork() exactly as if it were a change of
active handle. (If the only action performed by one of the processes is
one of the exec functions or _exit() (not exit()), the handle is never
accessed in that process.)
For the first handle, the first applicable condition below applies.
[An impressively long list of alternatives that do not apply to the OP's situation ...]
- If the stream is open with a mode that allows reading and the
underlying open file description refers to a device that is capable of
seeking, the application shall either perform an fflush(), or the
stream shall be closed.
For the second handle:
- If any previous active handle has been used by a function that
explicitly changed the file offset, except as required above for the
first handle, the application shall perform an lseek() or fseek() (as
appropriate to the type of handle) to an appropriate location.
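Thus, for the OP's program to access the same stream in both parent and child, POSIX requires the parent to fflush() the stream (fp) before forking, and the child to fseek() it after starting. Then, after waiting for the child to terminate, the parent must fseek() the stream. Given that we know the child's exec will fail, however, all of the flushing and seeking can be avoided by having the child use _exit() (which does not access the stream) instead of exit().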
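For the case where the child really does use the stream (here, through exit()), the coordination described above might look like the following sketch. The helper name coordinated_fork is hypothetical, and the code relies on the POSIX-defined behavior of fflush() on a seekable input stream, which ISO C alone does not define.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reads one line from fp, then forks while following
 * the POSIX handle-coordination rules. Returns 0 if the parent's position
 * is intact afterwards, 1 if it is not, -1 on error. */
int coordinated_fork(FILE *fp)
{
    char buf[1000];
    char *args[] = {"invailid_command", NULL};

    if (fgets(buf, sizeof buf, fp) == NULL)
        return -1;
    long pos = ftell(fp);
    if (pos < 0)
        return -1;

    /* First handle (parent): flush the readable, seekable stream before
     * the fork, exactly as for a change of active handle. */
    if (fflush(fp) != 0)
        return -1;
    fflush(stdout);             /* keep pending output from being duplicated */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(args[0], args);
        /* Second handle (child): reposition before any use of the stream. */
        if (fseek(fp, pos, SEEK_SET) != 0)
            _exit(EXIT_FAILURE);
        exit(EXIT_FAILURE);     /* exit() accesses the stream during cleanup */
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    /* The parent becomes the active handle again: reposition explicitly. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return ftell(fp) == pos ? 0 : 1;
}
```

This is noticeably more ceremony than swapping exit() for _exit(), which is why the simpler fix is preferable when the child has no reason to touch the stream.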
Complying with POSIX's provisions yields the following result:
When these rules are followed, regardless of the sequence of handles
used, implementations shall ensure that an application, even one
consisting of several processes, shall yield correct results: no data
shall be lost or duplicated when writing, and all data shall be
written in order, except as requested by seeks.
It is worth noting, however, that
It is
implementation-defined whether, and under what conditions, all input
is seen exactly once.
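I appreciate that it may be somewhat unsatisfying to hear merely that your expectations for the program's behavior are not justified by the relevant standards, but that is all there is to it. The parent and child processes do have some relevant shared data in the form of a common open file description (with which they have separate handles associated), and that seems likely to be the vehicle for the unexpected (and undefined) behavior. Still, there is no basis for predicting the specific behavior you see, nor the different behavior I see for the same program.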
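The shared open file description can be made visible with plain file descriptors, leaving stdio out of the picture. This is a minimal sketch under that assumption; the helper name offset_after_child_seek is hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: the child moves the file offset of an inherited
 * descriptor; the parent then reports the offset it observes on its own
 * copy of the descriptor. Returns -1 on error. */
long offset_after_child_seek(int fd, long target)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Both descriptors refer to one open file description, so this
         * seek changes the offset the parent sees as well. */
        lseek(fd, (off_t)target, SEEK_SET);
        _exit(EXIT_SUCCESS);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return (long)lseek(fd, 0, SEEK_CUR);
}
```

The parent never seeks to target itself, yet it should observe target as the current offset. A stdio stream in the parent that has cached data relative to the old offset has no way of noticing such a change.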