When calling the exec*() family of functions, do the char* elements of argv all have to be unique?

Question

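I'm trying to write a small utility that relays its argument list to an exec'd process, but repeats some of the incoming arguments when building the new process's argument list.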

Below is a very simplified version of what I want to do, which simply duplicates each argument once:

#include <stdlib.h>
#include <unistd.h>

#define PROG "ls"

int main(int argc, char* argv[] ) {

    int progArgCount = (argc-1)*2;
    char** execArgv = malloc(sizeof(char*)*(progArgCount+2)); // +2 for PROG and final 0
    execArgv[0] = PROG;
    for (int i = 0; i<progArgCount; ++i)
        execArgv[i+1] = argv[i/2+1];
    execArgv[progArgCount+1] = 0;

    execvp(PROG, execArgv );

} // end main()

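Notice that the elements of execArgv are not unique. Specifically, the two elements in each duplicated pair are identical, meaning they point to the same address in memory.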

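Does Standard C say anything about this usage? Is it incorrect, or undefined behavior? If not, is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements? Please correct me if I'm wrong, but isn't it possible for programs to modify their argv elements directly, since they're non-const? Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings? I'm pretty sure I did this myself when I started learning C/C++ years ago, and I don't think it occurred to me at the time that argv elements might not be unique.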

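I know that exec'ing involves "replacement of the process image", but I'm not sure what that entails exactly. I can imagine that it might involve deep-copying the given argv argument (execArgv in my example above) into fresh memory allocations, which would probably uniquify things, but I don't know enough about the internals of the exec functions to say. That would also be wasteful, at least if the original data structure could be retained across the "replacement" operation, which is why I doubt that it happens. And perhaps different platforms/implementations behave differently in this respect? Can answerers please speak to this?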

I tried to find documentation on this question, but all I could find was the following, from http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html:

The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments.

The above doesn't clarify whether it is a uniquified deep copy of the arguments that is passed to the new process.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename that is associated with the process being started by one of the exec functions.

Ditto.

The argv[] and envp[] arrays of pointers and the strings to which those arrays point shall not be modified by a call to one of the exec functions, except as a consequence of replacing the process image.

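Honestly, I don't know how to interpret the above. "Replacing the process image" is the entire point of the exec functions! If it's going to modify the array or the strings, then that would constitute a "consequence of replacing the process image", in one sense or another. This almost implies that the exec functions will modify argv. This excerpt only adds to my confusion.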

The statement about argv[] and envp[] being constants is included to make explicit to future writers of language bindings that these objects are completely constant. Due to a limitation of the ISO C standard, it is not possible to state that idea in standard C. Specifying two levels of const-qualification for the argv[] and envp[] parameters for the exec functions may seem to be the natural choice, given that these functions do not modify either the array of pointers or the characters to which the function points, but this would disallow existing correct code. Instead, only the array of pointers is noted as constant. The table of assignment compatibility for dst= src derived from the ISO C standard summarizes the compatibility:

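It's not clear what "The statement about argv[] and envp[] being constants" refers to; my leading theory is that it means the const-qualification of the parameters in the prototypes given at the top of the documentation page. But since those qualifiers mark only the pointers, and not the char data, it hardly makes explicit "that these objects are completely constant". Secondly, I don't know why the paragraph talks about "writers of language bindings"; bindings to what? How is that relevant to a general documentation page on the exec functions? Thirdly, the main thrust of the paragraph seems to be that the actual char contents of the strings pointed to by the argv elements are left non-const for backward compatibility with the established ISO C standard and the "existing correct code" that conforms to it. This is confirmed by the table on the documentation page, which I won't quote here. None of this decisively answers my main question, although the middle of the excerpt does state fairly clearly that the exec functions themselves will not modify the given argv objects in any way.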

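I would greatly appreciate information on my main question, as well as commentary on my interpretation and understanding of the quoted documentation excerpts (specifically, if my interpretations are wrong in any way). Thanks!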

Answer 1

There are a lot of questions buried in your post, so I'll address only the most important parts (IMO):

Does Standard C say anything about this usage? Is it incorrect, or undefined behavior?

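If by "standard C" you mean POSIX, then you have already found the specification for exec*. If it doesn't mandate that the arguments be distinct, then they don't need to be distinct.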

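As @SomeProgrammerDude pointed out in a comment, with string literals you are quite likely to end up with non-distinct strings anyway, since the compiler is free to deduplicate them (e.g. execl("foo", "bar", "foo")).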

is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements?

The C standard itself does not require the strings in argv to be distinct, so a program cannot count on them being distinct.

The above doesn't clarify if it is a uniquified deepcopy of the arguments

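We can say for certain that a copy must be made somehow, because otherwise it would be possible to modify string literals (which is not allowed).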

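However, the details of how this is achieved appear to be left as an implementation choice. So it's best not to rely on any particular behavior.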

Answer 2

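Nowhere does the POSIX manual specify that the arguments in argv need to be unique. The arguments must be null-terminated strings, with a null pointer as the last of the variable arguments: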

The arguments represented by arg0,... are pointers to null-terminated character strings. These strings shall constitute the argument list available to the new process image. The list is terminated by a null pointer. The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

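That is all POSIX requires. There is no explicit requirement that the arguments be unique, so an implementation that demanded unique arguments would conflict with the standard: a standard function can't impose requirements, or have effects, that the standard doesn't specify.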

"Replacing the process image" is the entire point of the exec functions! If it's going to modify the array or the strings, then that would constitute a "consequence of replacing the process image", in one sense or another. This almost implies that the exec functions will modify argv.

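The modification is allowed only on success; otherwise, no "replacing the image" takes place and hence there is no "consequence". It is essentially there to prevent a failed exec call in the original process from leaving argv and envp in an unusable state.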

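exec can't make a shallow copy, because it has no way of knowing the storage duration of the arguments it is given. So even the following should be fine: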

char *p = "argument";
execvp("cmd", (char *[]){"cmd", p, p + 2, (char*)0});

Answer 3

Does Standard C say anything about this usage? Is it incorrect, or undefined behavior?

There is no problem with two pointers pointing to the same memory location. It is not undefined behavior.

If not, is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements?

The POSIX standard doesn't specify anything about the uniqueness of argv elements.

Please correct me if I'm wrong, but isn't it possible for programs to modify their argv elements directly, since they're non-const?

From the C Standard, 5.1.2.2.1p2:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So the answer is: yes, it is possible.

Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings?

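In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process, replacing the previous executable.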

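So when one of the exec family of system calls is executed, the program named in the argument is loaded into the caller's address space and overwrites the program there. As a result, once the specified program file starts executing, the original program in the caller's address space is gone, replaced by the new program, and the argument list argv is stored in the newly replaced address space.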

The POSIX standard says:

The number of bytes available for the new process' combined argument and environment lists is {ARG_MAX}. It is implementation-defined whether null terminators, pointers, and/or any alignment bytes are included in this total.

and, for ARG_MAX:

{ARG_MAX} Maximum length of argument to the exec functions including environment data.

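This means that some space is allocated for the new process's arguments, and it is safe to assume that the argument strings are copied into that space.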

I know that exec'ing involves "replacement of the process image", but I'm not sure what that entails exactly.

Check this.

And perhaps different platforms/implementations behave differently in this respect? Can answerers please speak to this?

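The implementation may differ from platform to platform, but all variants of Unix have to follow the same POSIX standard to remain compatible. So I believe the behavior must be the same on all platforms.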


c

posix

exec
