Will this strict-aliasing rule violation have the behavior I expect?

Question

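I know that violating the strict-aliasing rule is undefined behavior according to the C standard. Please don't tell me that it is UB and that there is nothing more to say.

I'd like to know whether there are compilers that won't give the expected behavior (defined by me below) for the following code.

Assume that float and int are both 4 bytes in size, and that the machine is big-endian.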

float f = 1234.567;  /* Any value here */
unsigned int u = *(unsigned int *)&f;

The behavior I expect, in English, is "get the four bytes where the float is stored and put them in an int as is". In code it would be this (I think there is no UB here):

float f = 1234.567;  /* Any value here */
unsigned char *p = (unsigned char *)&f;
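/* Cast before shifting so the shifts happen in unsigned arithmetic, not in (signed) int. */
unsigned int u = ((unsigned int)p[0] << 24) | ((unsigned int)p[1] << 16) | ((unsigned int)p[2] << 8) | p[3];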

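I'd also welcome practical, concrete examples of why, apart from it being UB per the standard, a compiler would produce behavior I'd consider unexpected.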

Answer 1

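On most compilers it will do what you expect until the optimizer decides to eliminate it as dead code or to move the assignment to f.

This makes it basically impossible to test whether any given compiler will always do what you expect. It might work for one particular program, and then a slightly different program might fail. The strict-aliasing rules essentially tell compiler implementers "you can rearrange and eliminate these things fairly freely by assuming they never alias". When there is nothing to gain from a transformation that would break this code, the optimizer probably won't perform it, so you won't see a problem.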
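To make the "assume they never alias" point concrete, here is a minimal sketch. The function name store_then_read is hypothetical and not from the question. Because an int and a float may be assumed not to share storage, an optimizing compiler is allowed to return the constant 1 without reloading *ip after the store through fp.

```c
/* Hypothetical helper, for illustration only.
 * Under strict aliasing the compiler may assume that ip and fp
 * never point to the same object. */
int store_then_read(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to modify *ip */
    return *ip;   /* may be folded to "return 1" with no reload */
}
```

Called with two distinct objects this is well defined and returns 1. Called with both pointers aimed at the same storage (the situation in the question) the behavior is undefined, and whether you get the "reloaded" or the "folded" value can depend on the optimization level.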

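Bottom line: it is not useful to talk about "which compilers this will sometimes work on", because it might suddenly stop working on any of them in the future when something seemingly unrelated changes.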

Answer 2

float f = 1234.567;  /* Any value here */
unsigned int u = *(unsigned int *)&f;

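Some plausible reasons why this might not work as expected:

float and unsigned int are not the same size. (I've worked on systems where int is 64 bits and float is 32 bits. I've also worked on systems where both int and float are 64 bits, so your assumption that 4 bytes are copied would fail.)
float and unsigned int have different alignment requirements. Specifically, if unsigned int requires stricter alignment than float, and f happens not to be aligned strictly enough, reading f as if it were an unsigned int could do Bad Things. (This is probably unlikely if int and float are the same size.)
The compiler might recognize that the code's behavior is undefined and, for example, optimize away the assignment. (I don't have a concrete example of that.)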
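The first two reasons can at least be checked mechanically. The sketch below is only an illustration: the helper name float_layout_matches_uint is hypothetical, and it relies on C11's alignof from <stdalign.h>. Passing the check does not make the pointer cast defined. It only rules out the size and alignment problems.

```c
#include <stdalign.h>  /* alignof (C11) */
#include <stdbool.h>

/* Hypothetical helper: true when float and unsigned int have the same
 * size and every float object is aligned at least as strictly as an
 * unsigned int requires. */
static bool float_layout_matches_uint(void)
{
    return sizeof(float) == sizeof(unsigned int)
        && alignof(float) >= alignof(unsigned int);
}
```

The same two conditions could also be written as C11 static assertions, so that a mismatch stops the build instead of being detected at run time.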

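If you want to copy the representation of a float into an unsigned int, memcpy() is safer (and I'd first check that they have the same size). If you want to examine the representation of a float object, the canonical way is to copy it into an array of unsigned char. Quoting the ISO C standard (6.2.6.1p4 in the N1570 draft):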

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
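A minimal sketch of both suggestions, assuming a C11 compiler. The helper names float_bits and float_bytes are made up for this example. The size check is done at compile time with _Static_assert.

```c
#include <string.h>

/* Hypothetical helper: copy the object representation of a float
 * into an unsigned int without any pointer cast. */
static unsigned int float_bits(float f)
{
    unsigned int u;
    _Static_assert(sizeof u == sizeof f,
                   "float and unsigned int must have the same size");
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: copy the object representation of a float
 * into a caller-supplied unsigned char array for inspection. */
static void float_bytes(float f, unsigned char out[sizeof(float)])
{
    memcpy(out, &f, sizeof f);
}
```

On an implementation where float is IEEE 754 binary32, float_bits(1.0f) yields 0x3F800000. The order of the bytes written by float_bytes depends on the machine's endianness. Compilers commonly turn a small fixed-size memcpy like this into a plain register move, so it usually costs nothing compared with the cast.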

Answer 3

You are invoking undefined behavior for no reason.

Will this strict-aliasing rule violation have the behavior I expect?

No. And you don't need to expect anything, because you can write much nicer-looking code.

This has the defined behavior you want:

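#include <assert.h>  /* assert */
#include <stdint.h>  /* uint32_t */

/* typedef so that ufi_t names the union type, not a variable */
typedef union {
  float f;
  uint32_t i;
} ufi_t;
assert(sizeof(float) == sizeof(uint32_t));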

ufi_t u = { 123.456 };
uint32_t i = u.i;

You can factor it out, and a good compiler will generate no code for it:

inline uint32_t int_from_float(float f) {
  ufi_t u = { f };
  return u.i;
}
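Put together as one self-contained unit, the union approach might look like the sketch below. It assumes C11 (reading a union member other than the one last written is permitted in C, but not in C++). The memcpy-based function int_from_float_memcpy is a hypothetical cross-check added for this example, not part of the answer.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

typedef union {
    float    f;
    uint32_t i;
} ufi_t;

static_assert(sizeof(float) == sizeof(uint32_t),
              "float must be exactly 32 bits wide");

static inline uint32_t int_from_float(float f)
{
    ufi_t u = { f };  /* initializes the first member, u.f */
    return u.i;       /* reinterprets the same bytes as uint32_t */
}

/* Hypothetical cross-check: the same result obtained with memcpy. */
static inline uint32_t int_from_float_memcpy(float f)
{
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}
```

Both functions read the same object representation, so they should agree for every input value.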

You can also safely cast from float * to ufi_t *. So:

float f = 123.456;
uint32_t i = ((ufi_t*)&f)->i;

Note: language lawyers are welcome to set me straight on this last one, but this is my reading of C9899:201x 6.5 and related sections.

c

strict-aliasing

undefined-behavior