Is the following C union access pattern undefined behavior?

The following is not undefined behavior in modern C:

union foo
{
    int i;
    float f;
};
union foo bar;
bar.f = 1.0f;
printf("%08x\n", bar.i);

and it prints the hexadecimal representation of 1.0f.
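As a rough sketch of the union read described above, the helpers below reinterpret a float's bytes in two ways: through a union, and through memcpy for comparison. The names float_bits, bits_via_union and bits_via_memcpy are made up for this illustration, and the code assumes float and uint32_t have the same size, which holds on IEEE-754 platforms but is not guaranteed by the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size (true for IEEE-754 binary32). */
union float_bits {
    float f;
    uint32_t u;
};

/* Store through .f, read back through .u: the type punning the question describes. */
static uint32_t bits_via_union(float f)
{
    union float_bits fb;
    assert(sizeof fb.f == sizeof fb.u);
    fb.f = f;
    return fb.u;
}

/* Copy the object representation byte by byte; no union involved. */
static uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE-754 platform both helpers should return 0x3f800000 for 1.0f. Using uint32_t rather than int also sidesteps the mismatch between %x, which expects an unsigned argument, and a signed int.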

But the following is undefined behavior:

int x;
printf("%08x\n", x);

What about this?

union xyzzy
{
    char c;
    int i;
};
union xyzzy plugh;

This should be undefined behavior, since no member of plugh has been written.

printf("%08x\n", plugh.i);

But what about this? Is it undefined behavior?

plugh.c = 'A';
printf("%08x\n", plugh.i);

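Most C compilers nowadays will have sizeof(char) < sizeof(int), with sizeof(int) being either 2 or 4. That means that in these cases at most 50% or 25% of plugh.i will have been written to, but reading the remaining bytes will be reading uninitialized data, and hence should be undefined behavior. On that basis, is the entire read undefined behavior?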

Defect report 283: Accessing a non-current union member ("type punning") covers this and tells us that the behavior is undefined if a trap representation results.

The defect report asks:

In the paragraph corresponding to 6.5.2.3#5, C89 contained this sentence:

With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

Attached to this sentence was this footnote:

The "byte orders" for scalar types are invisible to isolated programs that do not indulge in type punning (for example, by assigning to one member of a union and inspecting the storage by accessing another member that is an appropriately sized array of character type), but must be accounted for when conforming to externally imposed storage layouts.

The only corresponding verbiage in C99 is 6.2.6.1#7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.

It is not perfectly clear that the C99 words have the same implications as the C89 words.

The defect report added the following footnote:

Attach a new footnote 78a to the words "named member" in 6.5.2.3#3:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

C11 6.2.6.1 General tells us:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

From 6.2.6.1 §7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So, after plugh.c is set, the value of plugh.i is unspecified.

From a footnote to 6.5.2.3 §3:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

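This says that type punning is specifically allowed (as you asserted in your question). But it might result in a trap representation, in which case reading the value has undefined behavior according to 6.2.6.1 §5: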

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. 50) Such a representation is called a trap representation.

If it is not a trap representation, there seems to be nothing in the standard that would make this undefined behavior, because from 4 §3 we get:

A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3.
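Whether a trap representation can occur at all depends on the type and the implementation. As a hedged sketch: for an unsigned integer type, trap representations can only come from padding bits, so counting the value bits of unsigned int shows whether every bit pattern is a valid value on a given platform. The helper names below are made up for this illustration. Note that this does not settle the question for the signed int in the question, where C99 and C11 additionally let an implementation treat certain sign-bit patterns as trap representations.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the value bits of unsigned int by counting the 1 bits in UINT_MAX. */
static size_t unsigned_value_bits(void)
{
    size_t n = 0;
    unsigned int max = UINT_MAX;
    while (max != 0) {
        n += max & 1u;
        max >>= 1;
    }
    return n;
}

/* Bits of the object representation that are not value bits. */
static size_t unsigned_padding_bits(void)
{
    size_t total = sizeof(unsigned int) * CHAR_BIT;
    size_t value = unsigned_value_bits();
    assert(value <= total);
    return total - value;
}
```

If unsigned_padding_bits() returns 0, a union member of type unsigned int can be read after a store to another member without any risk of a trap representation; the value read is still unspecified.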

C11 §6.2.6.1 p7 says:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So plugh.i would be unspecified.
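One way to look at what a store to plugh.c leaves behind, without reading plugh.i at all, is to inspect the object representation through unsigned char, which has no trap representations. This is only a sketch: dump_bytes and first_byte_after_store are hypothetical helper names, and the memset is there purely so that every byte starts out with a determinate value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

union xyzzy {
    char c;
    int i;
};

/* Copy the object representation of *p into out[]; returns the number of bytes copied. */
static size_t dump_bytes(const union xyzzy *p, unsigned char *out, size_t cap)
{
    assert(cap >= sizeof *p);
    memcpy(out, p, sizeof *p);
    return sizeof *p;
}

/* Store only .c, then report the first byte of the representation. */
static unsigned char first_byte_after_store(char v)
{
    union xyzzy plugh;
    unsigned char raw[sizeof plugh];
    memset(&plugh, 0, sizeof plugh);   /* give every byte a determinate starting value */
    plugh.c = v;                       /* bytes beyond .c now hold unspecified values */
    dump_bytes(&plugh, raw, sizeof raw);
    return raw[0];                     /* .c lives at offset 0 of the union */
}
```

Only raw[0] is tied to the stored character. The remaining bytes may or may not still be zero, which is exactly what "unspecified" allows.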

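Other answers address the main question of whether reading plugh.i produces undefined behavior when plugh was not initialized and only plugh.c was ever assigned. In short: no, unless the bytes of plugh.i constitute a trap representation at the time of the read.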

But I want to speak directly to a preliminary assertion in the question:

Most C compilers nowadays will have sizeof(char) < sizeof(int), with sizeof(int) being either 2 or 4. That means that in these cases at most 50% or 25% of plugh.i will have been written to

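The question seems to suppose that assigning to plugh.c will leave undisturbed those bytes of plugh that do not correspond to c, but the standard in no way supports that proposition. In fact, it expressly denies any such guarantee, for as others have noted: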

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

(C2011, 6.2.6.1/7; emphasis added)

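Although that does not guarantee that the unspecified values taken by those bytes will differ from their values prior to the assignment, it expressly provides that they might. And it is entirely plausible that in some implementations they often will. For example, on a platform that supports only word-sized writes to memory, or where such writes are more efficient than byte-sized ones, assignments to plugh.c are likely to be implemented with word-sized writes, without first loading the other bytes of plugh.i so as to preserve their values.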
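To make the word-sized-write scenario concrete, here is a hypothetical model of it in portable C. It is not what any particular compiler emits: store_c_wordwise and its filler parameter are invented for this sketch, with filler standing in for whatever happened to be in the rest of the register when the store was issued.

```c
#include <assert.h>
#include <string.h>

union xyzzy {
    char c;
    int i;
};

/* Hypothetical model of "plugh.c = v" done as one whole-object write:
   the byte for .c gets v, every other byte gets an arbitrary filler. */
static void store_c_wordwise(union xyzzy *p, char v, unsigned char filler)
{
    unsigned char word[sizeof *p];
    assert(p != NULL);
    memset(word, filler, sizeof word);  /* stale register contents, by assumption */
    memcpy(word, &v, sizeof v);         /* .c occupies the first byte */
    memcpy(p, word, sizeof word);       /* single write covering the whole union */
}
```

After such a store, p->c reads back as v regardless of filler, while the other bytes of p->i depend entirely on filler. A conforming implementation is free to behave this way.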

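If a useful optimization might cause some aspects of a program's execution to behave in a fashion inconsistent with the Standard (e.g. two consecutive reads of the same byte yielding inconsistent results), the Standard generally attempts to characterize the situations where such effects might be observed, and then classifies those situations as invoking Undefined Behavior. It does not make much effort to ensure that its characterizations do not "ensnare" some actions whose behavior should obviously be processed predictably, since it expects compiler writers to avoid behaving obtusely in such cases.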

Unfortunately, there are some corner cases where this approach really does not work well. For example, consider:

struct c8 { uint32_t u; unsigned char arr[4]; };
union uc { uint32_t u; struct c8 dat; } uuc1,uuc2;

void wowzo(void)
{
  union uc u;
  u.u = 123;
  uuc1 = u;
  uuc2 = u;
}

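I think it is pretty clear that the Standard does not require the bytes in uuc1.dat.arr or uuc2.dat.arr to contain any particular value, and that a compiler would be allowed, for each of the four bytes i==0..3, to copy uuc1.dat.arr[i] to uuc2.dat.arr[i], to copy uuc2.dat.arr[i] to uuc1.dat.arr[i], or to write both uuc1.dat.arr[i] and uuc2.dat.arr[i] with matching values. I do not think it is clear whether the Standard intends to require that a compiler select one of those courses of action rather than simply leaving those bytes holding whatever they happen to hold.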

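Clearly the code is supposed to have fully defined behavior if nothing ever observes the contents of uuc1.dat.arr or uuc2.dat.arr, and there is nothing to suggest that examining those arrays should invoke UB. Further, there is no defined means by which the value of u.dat.arr could change between the assignments to uuc1 and uuc2. That would suggest that uuc1.dat.arr and uuc2.dat.arr should contain matching values. On the other hand, for some kinds of programs, storing obviously meaningless data into uuc1.dat.arr and/or uuc2.dat.arr would seldom serve any useful purpose. I do not think the authors of the Standard particularly intended to require such stores, but saying that the bytes take on "Unspecified" values makes them necessary. I would expect such a behavioral guarantee to be deprecated, but I do not know what could replace it.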
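For anyone who wants to see what their own compiler does with this corner case, a minimal sketch is to wrap the snippet in a check that compares the two arrays byte by byte. arr_bytes_match is a name invented here; the rest follows the code above, with the globals made static. The result is deliberately not treated as right or wrong, since the argument above is precisely that the Standard leaves it unclear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct c8 { uint32_t u; unsigned char arr[4]; };
union uc { uint32_t u; struct c8 dat; };

static union uc uuc1, uuc2;

static void wowzo(void)
{
    union uc u;
    u.u = 123;      /* only the first member is ever stored */
    uuc1 = u;
    uuc2 = u;
}

/* Returns 1 if uuc1.dat.arr and uuc2.dat.arr hold the same bytes, 0 otherwise.
   The bytes are read through unsigned char, so the reads themselves cannot trap. */
static int arr_bytes_match(void)
{
    wowzo();
    assert(uuc1.u == 123 && uuc2.u == 123);  /* this part is fully defined */
    return memcmp(uuc1.dat.arr, uuc2.dat.arr, sizeof uuc1.dat.arr) == 0;
}
```

Running this at different optimization levels is a cheap way to see whether a given compiler copies the untouched bytes, skips them, or fills them in.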