Is it a violation of strict aliasing to cast to a "super-class" and back in C?

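I'm trying to figure out whether it is legal to do simulated sub-classing in C, where the super-class struct is included wholesale in the sub-class struct, rather than the sub-class and super-class merely sharing the same member prefix.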

In the sample code below, I've tried to illustrate what I have in mind:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

enum type {
    IS_A,
    IS_B,
};

struct header {
    enum type type;
};

struct a {
    struct header hdr;
    float x;
};

struct b {
    struct header hdr;
    int y;
};

void do_with_a(struct header *obj) {
    if (obj->type == IS_A) {
        struct a *a = (struct a *)obj;
        printf("%f\n", a->x);
    }
}

void do_with_b(struct header *obj) {
    // Oops forgot to check the type tag
    struct b *b = (struct b *)obj;
    printf("%d\n", b->y);
}

int main() {
    struct a *a = malloc(sizeof(*a));

    a->hdr.type = IS_A;
    a->x = 3.0;

    do_with_a(&a->hdr);
    do_with_b(&a->hdr);
}

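I'm reasonably certain that do_with_b() will always be undefined behavior if it is called with an "a". My primary question is whether do_with_a() is always defined behavior (assuming I've set the type tag correctly), or whether it is something that could break when compiler authors change their minds or improve their analysis.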

As a sub-question: I believe that converting a struct a * to a struct header * by &ap->hdr or by (struct header *)ap would both be well-defined. Is this the case?

Looking at the C11 standard, two passages seem relevant. One is in section 6.7.2.1 paragraph 15:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member...

and the other is in section 6.5 paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • ...
  • an aggregate or union type that includes one of the aforementioned types among its members...

It's not clear to me whether this is the intended interpretation of the standard, or whether I'm being overly optimistic.


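I have tried compiling the above code with both GCC and clang, and both appear to behave the same with optimizations turned on and off. However, GCC does emit warnings about both down-casts when set to -Wstrict-aliasing=1. The wording is somewhat vague, saying that the code "might" break strict aliasing, and the description of the flag indicates that false positives are common, so this is inconclusive: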

undefined_test.c: In function ‘do_with_a’:
undefined_test.c:26:39: warning: dereferencing type-punned pointer might break strict-aliasing rules [-Wstrict-aliasing]
   26 |                 struct a *a = (struct a *)obj;
      |                                       ^
undefined_test.c: In function ‘do_with_b’:
undefined_test.c:33:31: warning: dereferencing type-punned pointer might break strict-aliasing rules [-Wstrict-aliasing]
   33 |         struct b *b = (struct b *)obj;
      |

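Related question:

The end of the accepted answer almost answers my question, but it doesn't quite satisfy me. Most of the comments on it seem to refer mainly to the "common prefix" case rather than the "nested struct" case. It's not clear to me whether the "nested struct" case has been adequately defended.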

  • My primary question is whether do_with_a() is always defined behavior

    As a sub-question: I believe that converting a struct a * to a struct header * by &ap->hdr or (struct header *)ap would both be well-defined, is this the case?

    Well-defined as per the initial member rule, C17 6.7.2.1/15 (emphasis mine):

    Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

    This is also consistent with the effective type/strict aliasing rules in 6.5 §6 and §7 that you cite.

  • I'm reasonably certain that do_with_b() will always be undefined behavior

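    Yes, it is not a compatible struct. So it is a strict aliasing violation and possibly also an alignment problem. Note however that the strict aliasing rule is compatible with an oddball rule known as "common initial sequence", which in this case would allow you to inspect the header part of b. C17 6.5.2.3/6: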

    One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

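    That is, if you add something like typedef union { struct a a_; struct b b_; } dummy; to the translation unit, you would be allowed to inspect the header part of each struct in a well-defined manner. Note, however, that compilers may be non-conforming when it comes to implementing this, and some defect reports about it have been submitted to the committee (I'm not sure about its status in C23).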

  • GCC, however, does signal warnings about both down-casts when set to -Wstrict-aliasing=1

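    The status of these options in gcc is somewhere between broken and very broken. -fno-strict-aliasing, however, which disables strict aliasing completely, is reliable.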

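    The strict aliasing rule itself has a lot of flaws. For example, the effective type of the memory you allocated is actually struct header and float, not struct a, since you never wrote an lvalue of type struct a to the memory returned from malloc. Similarly, suppose we allocate a chunk of memory with malloc and then treat it as an array of some type by initializing it in a for loop: we then don't actually get the effective type type[], only individual objects. But the whole C language would fall apart if compilers were implemented that way.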