Initializing a union of two structs with a common initial sequence
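Question: if a union contains two structs that share a common initial sequence of compatible types, is the behavior well defined when we initialize part of the initial sequence through one struct and the rest through the other struct?
Consider the following code snippet: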
#include <stdio.h>
#include <stdlib.h>

union u_t {
    struct {
        int i1;
        int i2;
    } s1;
    struct {
        int j1;
        int j2;
    } s2;
};

int main(void) {
    union u_t *u_ptr = malloc(sizeof(*u_ptr));
    u_ptr->s1.i1 = 10; // store through s1
    u_ptr->s2.j2 = 11; // store through s2
    printf("%d\n", u_ptr->s2.j1 + u_ptr->s1.i2); // prints 21
    free(u_ptr);
}
The question is whether the "prints 21" behavior is well defined. The standard, N1570 6.5.2.3(p6), specifies the following:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the completed type
of the union is visible.
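So it is permitted to inspect the common initial sequence (in this case, the whole struct). The problem is that here the union appears to contain the s2 object, with j2 as its only initialized member.
I think we end up with unspecified behavior: we only initialized s2.j2, while s2.j1 was never stored through s2, so it should contain an unspecified value.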
The C11 standard (N1570) states in a footnote to [6.5 Expressions]/6:
Allocated objects have no declared type.
And [6.5 Expressions]/6 itself states:
6 The effective type of an object for an access to its stored value is the declared type of the object, if any. If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.
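When you access the stored values for printing in the printf statement, you also follow the rules laid out in [6.5 Expressions]/7.
Combined with the passage you quoted from N1570 6.5.2.3(p6), which provides "One special guarantee is made in order to simplify the use of unions", this makes the behavior well defined.
In practice, if you look at the generated assembly, you will see that this is what actually happens.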
push rbp
mov rbp, rsp
sub rsp, 16
mov eax, 8
mov edi, eax
call malloc
mov qword ptr [rbp - 8], rax //Here
mov rax, qword ptr [rbp - 8] //Here
mov dword ptr [rax], 10 //Here
mov rax, qword ptr [rbp - 8] //Here
mov dword ptr [rax + 4], 11 //Here
mov rax, qword ptr [rbp - 8]
mov ecx, dword ptr [rax]
mov rax, qword ptr [rbp - 8]
add ecx, dword ptr [rax + 4]
movabs rdi, offset .L.str
mov esi, ecx
mov al, 0
call printf
xor ecx, ecx
mov dword ptr [rbp - 12], eax # 4-byte Spill
mov eax, ecx
add rsp, 16
pop rbp
ret
.L.str:
.asciz "%d\n"
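Regarding aliasing:
The common initial sequence is only concerned with aliasing between two struct types. That is not an issue here: your two structs are even compatible types, so pointers to them may alias without any tricks. Dissecting C11 6.2.7: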
6.2.7 Compatible type and composite type
Two types have compatible type if their types are the same. /--/ Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements:
If one is declared with a tag, the
other shall be declared with the same tag.
Neither struct is declared with a tag here.
If both are completed anywhere within their
respective translation units, then the following additional requirements apply:
Both are completed (defined).
there shall
be a one-to-one correspondence between their members such that each pair of
corresponding members are declared with compatible types;
This holds for these structs.
if one member of the pair is
declared with an alignment specifier, the other is declared with an equivalent alignment
specifier; and if one member of the pair is declared with a name, the other is declared
with the same name.
Alignment specifiers do not apply.
For two structures, corresponding members shall be declared in the
same order.
This holds as well.
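The conclusion is that your two structs are compatible types. This means you do not need any tricks such as the common initial sequence. The strict aliasing rule simply states (6.5/7):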
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
That is the case here.
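Furthermore, as noted in the other answer, the effective type of the actual data here is int: allocated storage has no effective type of its own, so it takes on the type of the first lvalue used to store into it. This also means that the compiler cannot assume the pointers do not alias.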
In addition, the strict aliasing rule provides an exception for lvalue access through struct and union members:
an aggregate or union type that includes one of the aforementioned types among its
members
And then you have the common initial sequence on top of that. As far as aliasing is concerned, this is as well defined as it gets.
Regarding type punning:
What you really seem to be concerned about is not aliasing but type punning through unions. C11 6.5.2.3/3 vaguely guarantees this:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.
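That is the normative text, and it is poorly written: nobody can tell from it how programs and compilers are supposed to behave. The informative footnote 95) explains it well: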
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
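In your case, you trigger type punning from one struct type to another, compatible struct type. This is perfectly safe: since they are exactly the same type, issues with alignment or trap representations do not apply.
Note that C++ is different here.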