Binary compatibility of struct in separately compiled code

Question

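Given a CPU architecture, is the exact binary form of a struct determined exactly?

For example, struct stat64 is used by both glibc and the Linux kernel. I see that glibc defines it in sysdeps/unix/sysv/linux/x86/bits/stat.h as: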

struct stat64 {
    __dev_t st_dev;      /* Device.  */
# ifdef __x86_64__
    __ino64_t st_ino;    /* File serial number.  */
    __nlink_t st_nlink;  /* Link count.  */
/* ... et cetera ... */
}

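My kernel is already compiled. When I now compile new code against this definition, the two are binary compatible. Where is that guaranteed? The only guarantees I know of are:

The first element has offset 0
Elements declared later have higher offsets

So if the kernel code declares struct stat64 in exactly the same way (in the C code), then all I know about the binary form is:

st_dev @ offset 0
st_ino @ offset at least sizeof(__dev_t)

But I am not currently aware of any way to determine the exact offset of st_ino. Kernighan & Ritchie give the simple example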

struct X {
  char c;
  int i;
};

On my x86-64 machine, offsetof(struct X, i) == 4. Maybe there are some general alignment rules that determine the exact binary form of a struct for each CPU architecture?

Answer 1

Given a CPU architecture, is the exact binary form of a struct determined exactly?

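No. The representation or layout ("binary form") of a structure is ultimately determined by the C implementation, not by the CPU architecture. Most C implementations intended for normal purposes follow the recommendations provided by the manufacturer and/or the operating system. However, there can be circumstances where, for example, a particular alignment for a particular type gives slightly better performance but is not required, so one C implementation might choose to require that alignment while another does not, and the two would then produce different structure layouts.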
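
To make the role of alignment concrete, here is a minimal sketch that compares what the compiler in use actually chose for the struct X from the question with the offset that a "round up to the member's alignment" rule would predict. The helper names round_up and report_layout are made up for this illustration, and the rounding rule is only the convention most implementations follow, not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct X {
    char c;
    int  i;
};

/* Smallest multiple of align that is greater than or equal to offset. */
size_t round_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Prints the layout this particular C implementation chose for struct X. */
void report_layout(void)
{
    /* Assumption: the implementation places i at the first suitably
       aligned offset after c. This is common practice, not a guarantee. */
    size_t predicted = round_up(sizeof(char), _Alignof(int));

    printf("alignment of int       = %zu\n", _Alignof(int));
    printf("predicted offset of i  = %zu\n", predicted);
    printf("actual offsetof(X, i)  = %zu\n", offsetof(struct X, i));
    printf("sizeof(struct X)       = %zu\n", sizeof(struct X));

    /* What can be relied on: i lies after c and is suitably aligned. */
    assert(offsetof(struct X, i) >= sizeof(char));
    assert(offsetof(struct X, i) % _Alignof(int) == 0);
}
```

Calling report_layout() from main on two different implementations is a quick way to see whether they agree; an implementation that requires a different alignment for int would report a different offset for the same source.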

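Additionally, a C implementation might be designed for a special purpose, such as providing compatibility with legacy code, in which case it might choose to replicate the alignments of some old compiler for another architecture rather than use the alignments required by the target processor.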

However, let's consider structures in separate compilations that use a single C implementation. Then C 2018 6.2.7 1 says:

… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths…

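So, if two structures are declared identically in separate translation units, or with the minor variations permitted in that passage, then they are compatible, which effectively means they have the same layout or representation.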

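Technically, that passage applies only to separate translation units of the same program. The C standard defines the behavior of one program; it does not explicitly define interactions between a program (or a fragment of a program, such as a kernel extension) and the operating system, although to some extent you might consider the operating system and everything running in it as one program. However, for practical purposes, it applies to everything compiled with that C implementation.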

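This means that as long as you use the same C implementation that the kernel was compiled with, identically declared structures will have the same representation.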

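Another consideration is that we might use different compilers to compile the kernel and to compile programs. The kernel might be compiled with Clang while a user prefers GCC. In this case, it is up to the compilers to document their behavior. The C standard does not guarantee compatibility, but the compilers can if they choose to, perhaps by documenting that they adhere to a particular Application Binary Interface (ABI).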
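
When two sides are built with different compilers, one defensive habit is to state the expected layout in the shared header, so that any build that disagrees fails at compile time instead of misreading data at run time. The following is only a sketch: struct shared_record and its members are hypothetical, and the expected offsets are an assumption chosen for this illustration, not values taken from glibc or the kernel.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical structure shared by separately compiled components. */
struct shared_record {
    uint64_t device;
    uint64_t inode;
    uint32_t link_count;
    uint32_t mode;
};

/* Expected layout, written down once. Every compiler that includes this
   header must agree with these numbers or the build stops here. */
static_assert(offsetof(struct shared_record, device) == 0,
              "device must be at offset 0");
static_assert(offsetof(struct shared_record, inode) == 8,
              "inode must be at offset 8");
static_assert(offsetof(struct shared_record, link_count) == 16,
              "link_count must be at offset 16");
static_assert(offsetof(struct shared_record, mode) == 20,
              "mode must be at offset 20");
static_assert(sizeof(struct shared_record) == 24,
              "struct shared_record must be 24 bytes");
```

This does not create compatibility; it only detects its absence. Using fixed-width types and ordering members so that no padding is needed makes it more likely that independent implementations arrive at the same layout.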

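Also note that a "C implementation" as discussed above is not just a particular compiler but a particular compiler with particular switches. Various switches can change how a compiler behaves in ways that make it effectively a different C implementation: switches to conform to one version of the C standard or another, switches affecting whether structures are packed, switches affecting the sizes of integer types, and so on.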
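
The effect of a packing switch can be sketched in source form. The example below uses #pragma pack, a widely supported extension (GCC, Clang, and MSVC accept it) rather than standard C, as an in-source stand-in for a command-line option such as GCC's -fpack-struct. The names natural_x, packed_x, and layouts_differ are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler may insert padding before i. */
struct natural_x {
    char c;
    int  i;
};

/* Same member list, but packed to 1-byte alignment (compiler extension). */
#pragma pack(push, 1)
struct packed_x {
    char c;
    int  i;
};
#pragma pack(pop)

/* With 1-byte packing, i follows c immediately. */
static_assert(offsetof(struct packed_x, i) == sizeof(char),
              "packed member must directly follow c");

/* Returns 1 if the two declarations ended up with different layouts. */
int layouts_differ(void)
{
    return offsetof(struct natural_x, i) != offsetof(struct packed_x, i)
        || sizeof(struct natural_x) != sizeof(struct packed_x);
}
```

On an implementation where int needs more than 1-byte alignment, layouts_differ() returns 1 even though the member lists are identical, which is why code built with a packing switch cannot safely exchange such a structure with code built without it.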
