Initializing a union of two structs with a common initial sequence

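QUESTION: If a union contains two structs with a common initial sequence of compatible types, is the behavior well-defined if we initialize part of the initial sequence through one struct and the rest of it through the other struct?

Consider the following code snippet: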

#include <stdio.h>
#include <stdlib.h>

union u_t{
    struct {
        int i1;
        int i2;
    } s1;

    struct {
        int j1;
        int j2;
    } s2;
};

int main(void){
    union u_t *u_ptr = malloc(sizeof(*u_ptr));
    u_ptr -> s1.i1 = 10;
    u_ptr -> s2.j2 = 11;

    printf("%d\n", u_ptr -> s2.j1 + u_ptr -> s1.i2); //prints 21
}


The question is whether the "printing 21" behavior is well-defined. The Standard N1570 6.5.2.3(p6) specifies the following:

if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

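So it is permitted to inspect the common initial sequence (in this case, the entire structs). The problem is that here the union seems to contain the s2 object, with j2 being its only initialized member.

I think we end up with unspecified behavior: we initialized only s2.j2, and s2.j1 was not, so it should contain an unspecified value.

The C11 standard (n1570) states in a footnote to [6.5 Expressions]/6: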

Allocated objects have no declared type.

And [6.5 Expressions]/6 states:

6 The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

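When you access the stored values for printing in the printf statement, you are also following the rules laid out in [6.5 Expressions]/7.

This, combined with the quote you provided from N1570 6.5.2.3(p6), which offers that "One special guarantee is made in order to simplify the use of unions", makes this well-defined.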
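
The reasoning above can be condensed into a small sketch. The helper name sum_through_other_view is hypothetical and does not appear in the question. It repeats the question's stores on allocated storage and reads each value back through the other struct, relying on the effective-type rule of 6.5/6 and the common initial sequence guarantee as this answer interprets them.

```c
#include <stdlib.h>

union u_t {
    struct { int i1; int i2; } s1;
    struct { int j1; int j2; } s2;
};

/* Hypothetical helper (not from the question): returns the sum read back
 * through the "other" struct, or -1 if the allocation fails. */
int sum_through_other_view(int a, int b)
{
    union u_t *u = malloc(sizeof *u);   /* allocated object: no declared type */
    if (u == NULL)
        return -1;

    u->s1.i1 = a;   /* store through an int lvalue */
    u->s2.j2 = b;   /* store through an int lvalue */

    /* Each read uses an int lvalue on bytes last written as int. */
    int sum = u->s2.j1 + u->s1.i2;

    free(u);
    return sum;
}
```

Calling sum_through_other_view(10, 11) mirrors the snippet in the question.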

On the practical side, if you look at the generated assembly, you will find that this is what actually happens.

        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     eax, 8
        mov     edi, eax
        call    malloc
        mov     qword ptr [rbp - 8], rax //Here
        mov     rax, qword ptr [rbp - 8] //Here
        mov     dword ptr [rax], 10      //Here 
        mov     rax, qword ptr [rbp - 8] //Here
        mov     dword ptr [rax + 4], 11  //Here 
        mov     rax, qword ptr [rbp - 8]
        mov     ecx, dword ptr [rax]
        mov     rax, qword ptr [rbp - 8]
        add     ecx, dword ptr [rax + 4]
        movabs  rdi, offset .L.str
        mov     esi, ecx
        mov     al, 0
        call    printf
        xor     ecx, ecx
        mov     dword ptr [rbp - 12], eax # 4-byte Spill
        mov     eax, ecx
        add     rsp, 16
        pop     rbp
        ret
.L.str:
        .asciz  "%d\n"
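
The listing stores 10 at [rax] and 11 at [rax + 4], then loads from the same two addresses. The overlap it relies on can also be checked at the source level. This is only a sketch: members_overlap is a hypothetical helper, and it assumes the compiler accepts a nested member designator such as s1.i2 in offsetof, which GCC and Clang do.

```c
#include <stddef.h>

union u_t {
    struct { int i1; int i2; } s1;
    struct { int j1; int j2; } s2;
};

/* Hypothetical helper: returns 1 if the corresponding members of s1 and s2
 * sit at the same offsets inside the union, 0 otherwise. */
int members_overlap(void)
{
    return offsetof(union u_t, s1.i1) == offsetof(union u_t, s2.j1)
        && offsetof(union u_t, s1.i2) == offsetof(union u_t, s2.j2);
}
```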

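About aliasing:

The common initial sequence is only concerned with aliasing of two struct types. That is not an issue here, and your two structs are even compatible types, so pointers to them may alias without using any tricks. Dissecting C11 6.2.7: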

6.2.7 Compatible type and composite type
Two types have compatible type if their types are the same. /--/ Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements:

If one is declared with a tag, the other shall be declared with the same tag.

Neither of the structs is declared with a tag here.

If both are completed anywhere within their respective translation units, then the following additional requirements apply:

They are both completed (defined).

there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types;

This holds true for these structs.

if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name.

Alignment specifiers do not apply.

For two structures, corresponding members shall be declared in the same order.

This holds true.

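The conclusion is that both your structs are compatible types. This means you do not need any tricks like the common initial sequence. The strict aliasing rule merely states (6.5/7):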

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,

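That is the case here.

Furthermore, as mentioned in other answers, the effective type of the actual data here is int, since allocated storage has no effective type of its own and so takes the first type used for an lvalue access. This also means that the compiler cannot assume that the pointers will not alias.

In addition, the strict aliasing rule provides an exception for lvalue access through struct and union members: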

an aggregate or union type that includes one of the aforementioned types among its members

And then you have the common initial sequence on top of that. As far as aliasing goes, this is as well-defined as it gets.
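
To make the aliasing point concrete, here is a minimal sketch. The names store_then_reload and demo_alias are hypothetical. Both pointers have type int *, so the compiler has to allow for them referring to the same int and cannot reuse the value it just stored through p.

```c
union u_t {
    struct { int i1; int i2; } s1;
    struct { int j1; int j2; } s2;
};

/* Hypothetical helper: p and q are both int pointers, so they may alias. */
int store_then_reload(int *p, int *q)
{
    *p = 1;
    *q = 2;      /* may overwrite *p if p and q point at the same int */
    return *p;   /* must be reloaded from memory, not assumed to be 1 */
}

/* Passes the addresses of the two first members, which share storage. */
int demo_alias(void)
{
    union u_t u;
    return store_then_reload(&u.s1.i1, &u.s2.j1);
}
```

Because s1.i1 and s2.j1 occupy the same storage, demo_alias() returns 2.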


About type punning:

Your actual concern seems to be not aliasing, but type punning through unions. This is vaguely guaranteed by C11 6.5.2.3/3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.

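That is the normative text, and it is poorly written: nobody can work out from it how programs or compilers are supposed to behave. The informative footnote 95) explains it well: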

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

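In your case, you trigger a type pun from one struct type to another compatible struct type. This is perfectly safe, since they are the very same type and issues with alignment or trap representations do not apply.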
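
A minimal sketch of that pun, under the reading of footnote 95) given above. The helper name read_j2_after_writing_s1 is hypothetical. Every member is stored through s1, and the value is then read through s2.

```c
union u_t {
    struct { int i1; int i2; } s1;
    struct { int j1; int j2; } s2;
};

/* Hypothetical helper: stores through s1 only, then reads through s2. */
int read_j2_after_writing_s1(int a, int b)
{
    union u_t u;
    u.s1.i1 = a;
    u.s1.i2 = b;       /* s1 is the member last used to store a value */
    return u.s2.j2;    /* the same bytes, reinterpreted as s2's int member */
}
```

Since both members are int, the reinterpretation yields the stored value unchanged, so read_j2_after_writing_s1(10, 11) returns 11.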

Please note that C++ is different here.