Is reading a union member that wasn't the most recently written one undefined behavior in GCC?

The C++ reference page on unions gives the following explanation. The part that matters for this question is the sentence about undefined behavior:

The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined, and it's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

Now, if I compile the following code with g++ -std=c++11 on Linux Mint 18, I get the following output (shown in the comments next to the printf statements):

#include <cstdio>
using namespace std;

union myUnion {
    int var1; // 32 bits
    long int var2; // 64 bits
    char var3; // 8 bits
}; // union size is 64 bits (size of largest member)

int main()
{
    myUnion a;
    a.var1 = 10;
    printf("a is %zu bits and has value %d\n",sizeof(a)*8,a.var1); // ...has value 10
    a.var2 = 123456789;
    printf("a is %zu bits and has value %ld\n",sizeof(a)*8,a.var2); // ...has value 123456789
    a.var3 = 'y';
    printf("a is %zu bits and has value %c\n",sizeof(a)*8,a.var3); // ...has value y
    printf("a is %zu bits and has value %ld\n",sizeof(a)*8,a.var2); //... has value 123456789, why???
    return 0;
}

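In the line before return 0, reading a.var2 does not give the ASCII decimal value of the 'y' character (which is what I expected; I'm new to unions) but the value it was originally assigned. Based on the cppreference.com quote above, am I right in understanding that this is undefined behavior, because it is not standard but specific to GCC's implementation?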

EDIT

As the excellent answers below point out, I made a copying error in the comment after the printf statement just before return 0. The correct version is:

 printf("a is %zu bits and has value %ld\n",sizeof(a)*8,a.var2); //... has value 123456889, why???

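That is, the 7 becomes an 8, because the first byte (the lowest-order 8 bits on this little-endian machine) is overwritten with the ASCII value of 'y', which is 121 (0111 1001 in binary). Those bits previously held 0x15 (21), so the value grows by 121 - 21 = 100. Nevertheless, I will leave the comment as it is in the code above, to stay consistent with the great discussion it generated.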

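Undefined behavior is not the same thing as "random" behavior. When compilers encounter undefined behavior, they settle on some behavior and tend to exhibit that same behavior every time.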

Case in point: IDEOne has its own interpretation of this code: http://ideone.com/HO5id6

a is 32 bits and has value 10
a is 32 bits and has value 123456789
a is 32 bits and has value y
a is 32 bits and has value 123456889

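You may notice something interesting going on there (setting aside the fact that IDEOne's compiler has a 32-bit long int instead of a 64-bit one). Line 4 of the output still looks similar to line 2, but the value has actually changed slightly. What seems to happen is that the char value 'y' is written into the union without changing any of the other bits. I got similar behavior when I switched to long long int instead of long int.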

You may want to check whether line 4 in your example really is exactly the same as before. I somewhat doubt that it is.

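In any case, to answer your specific question, the TL;DR is that in GCC, writing to a union member only changes the bits belonging to that particular member; it is not guaranteed to alter or clear any of the other bits. Of course, as with anything related to UB, don't assume that any other compiler (or even a later version of the same compiler!) behaves the same way.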

You are printing different parts of the same memory region:

myUnion a;
a.var2 = -1;
printf("a is %zu bits and has value %ld = 0x%lx\n",
    sizeof(a)*8, a.var2, a.var2);
a.var3 = 'y';
printf("a is %zu bits and has value %c = 0x%x\n",
    sizeof(a)*8, a.var3, a.var3);
printf("a is %zu bits and has value %ld = 0x%lx\n",
    sizeof(a)*8, a.var2, a.var2);

Sample output

a is 64 bits and has value -1 = 0xffffffffffffffff
a is 64 bits and has value y = 0x79
a is 64 bits and has value -135 = 0xffffffffffffff79

For clarity, I replaced your 123456789 with -1, which has every bit set. The same applies to your number:

a is 64 bits and has value 123456789 = 0x75bcd15
a is 64 bits and has value y = 0x79
a is 64 bits and has value 123456889 = 0x75bcd79

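Again, the first byte of the original value (0x15, the least significant byte on this little-endian machine) is replaced with 0x79 (the character 'y'), so the original number is modified.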

Reading a.var2 interprets the whole memory region as a long int, while reading a.var3 interprets only the first byte of the union's memory as a char.

Visualization:

           long int (64 bits) = a.var2
           ****************************************************************
           int (32 bits) = a.var1
           ********************************
           char (8 bits) = a.var3
           ********
Bit no.:   0 ........................................................... 63
Bytes:     0x79    0xcd    0x5b    0x07    0x00    0x00    0x00    0x00
           ^ 'y'

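The line in the documentation really means that the last assignment to a union member determines the union's value, and the rest of its memory should be treated as garbage. In practice, though, we can usually still observe the leftover bytes in the memory allocated for the union as a whole.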

For what it's worth, the C11 standard, §6.5.2.3, footnote 95 (page 83), says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

This is what I see, even when compiling as C++11 (with Apple LLVM version 8.0.0 (clang-800.0.38)):

a is 64 bits and has value 10
a is 64 bits and has value 123456789
a is 64 bits and has value y
a is 64 bits and has value 123456889

Note that the last value is not 123456789 but 123456889, because the least significant byte is overwritten by

a.var3 = 'y';

which replaces 0x15 with 0x79 (== 'y').

Are you sure about the output you posted?

On 64-bit Ubuntu with GCC 5.4.0, I get:

a is 64 bits and has value 10
a is 64 bits and has value 123456789
a is 64 bits and has value y
a is 64 bits and has value 123456889

var2 is 64 bits wide, and by changing the value of var3 you are modifying the least significant byte of var2. This becomes clearer when printing with %x:
a is 64 bits and var1 has value a
a is 64 bits and var2 has value 75bcd15
a is 64 bits and var3 has value 79
a is 64 bits and var2 has value 75bcd79

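var1, var2 and var3 start at the same memory address. Since your architecture is little-endian, as on most computers (Intel/AMD), the lowest-addressed byte is the least significant one, so modifying var3 changes the low-order byte of both var2 and var1.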