Is std::memcpy between different trivially copyable types undefined behavior?

Question

I've been using std::memcpy to circumvent strict aliasing for a long time.

For example, inspecting a float, like this:

float f = ...;
uint32_t i;
static_assert(sizeof(f)==sizeof(i));
std::memcpy(&i, &f, sizeof(i));
// use i to extract f's sign, exponent & significand
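A complete, compilable version of the snippet above might look like the sketch below. It assumes float is IEEE 754 binary32 and that its byte order matches std::uint32_t; FloatParts and decompose are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE 754 binary32 with the same byte order as uint32_t.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");

struct FloatParts {
    std::uint32_t sign;         // 1 bit
    std::uint32_t exponent;     // 8 bits, biased by 127
    std::uint32_t significand;  // 23 bits, without the implicit leading 1
};

// Copies the object representation of f into an integer, then splits it into fields.
FloatParts decompose(float f) {
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof(i));
    return FloatParts{i >> 31, (i >> 23) & 0xFFu, i & 0x7FFFFFu};
}
```

Under those assumptions, decompose(-2.5f) yields sign 1, biased exponent 128 and significand 0x200000.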

However, this time I checked the standard, and I haven't found anything that validates this. All I found is this:

For any object (other than a potentially-overlapping subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]).⁴⁰ If the content of that array is copied back into the object, the object shall subsequently hold its original value. [ Example:
#define N sizeof(T)
char buf[N];
T obj;                          // obj initialized to its original value
std::memcpy(buf, &obj, N);      // between these two calls to std::memcpy, obj might be modified
std::memcpy(&obj, buf, N);      // at this point, each subobject of obj of scalar type holds its original value
— end example ]

And this:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a potentially-overlapping subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2,⁴¹ obj2 shall subsequently hold the same value as obj1. [ Example:
T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p
— end example ]

So std::memcpying a float to/from a char[] is allowed, and std::memcpying between objects of the same trivially copyable type is allowed too.
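To make the gap concrete, here is a sketch of the two-step route through a byte buffer, assuming a 32-bit float. The first copy is the float-to-buffer case the first quoted paragraph covers; the second copy puts bytes that never came from a uint32_t into one, which is the step neither quoted paragraph obviously covers. bits_via_buffer is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

std::uint32_t bits_via_buffer(float f) {
    unsigned char buf[sizeof(float)];
    std::memcpy(buf, &f, sizeof(buf));  // step 1: float -> unsigned char[] (explicitly allowed)
    std::uint32_t i;
    std::memcpy(&i, buf, sizeof(i));    // step 2: bytes of a float -> uint32_t (the open question)
    return i;
}
```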

Is my first example (and the linked answer) well defined? Or is the correct way to inspect a float to std::memcpy it into an unsigned char[] buffer, and then build a uint32_t from it using shifts and ors?
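The shifts-and-ors alternative the question mentions could be sketched as below. Note that it has to hard-code a byte order: this hypothetical bits_from_bytes_le assumes the float's bytes are stored least significant first (little-endian) and that CHAR_BIT is 8. On a big-endian target the byte indices would need to be reversed.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && CHAR_BIT == 8, "assumes 4 bytes of 8 bits");

// Copies the float into a byte buffer, then assembles the integer explicitly.
// Assumption: byte 0 holds the least significant 8 bits (little-endian).
std::uint32_t bits_from_bytes_le(float f) {
    unsigned char b[sizeof(float)];
    std::memcpy(b, &f, sizeof(b));
    return static_cast<std::uint32_t>(b[0])
         | static_cast<std::uint32_t>(b[1]) << 8
         | static_cast<std::uint32_t>(b[2]) << 16
         | static_cast<std::uint32_t>(b[3]) << 24;
}
```

Unlike a direct std::memcpy into a uint32_t, this never writes foreign bytes into an integer object, but the result is only meaningful if the byte-order assumption holds.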

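Note: looking at the guarantees of std::memcpy may not answer this question. As far as I know, I could replace std::memcpy with a simple byte-copy loop, and the question would be the same.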
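A byte-copy loop of the kind the note describes might look like this sketch. copy_bytes and bits_via_loop are hypothetical names; the loop accesses both objects through unsigned char, mirroring how std::memcpy is described, and it assumes a 32-bit float.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::memcpy: copies n bytes one unsigned char at a time.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t k = 0; k < n; ++k) {
        d[k] = s[k];
    }
}

// Same operation as the question's example, with the loop instead of std::memcpy.
std::uint32_t bits_via_loop(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t i;
    copy_bytes(&i, &f, sizeof(i));
    return i;
}
```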

Answer 1

Is my first example (and the linked answer) well defined?

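The behaviour isn't undefined (unless the target type has trap representations† that the source type doesn't share), but the resulting value of the integer is implementation-defined. The standard makes no guarantees about how floating-point numbers are represented, so there is no portable way to extract the mantissa etc. from the integer. That said, limiting yourself to systems that use IEEE 754 doesn't limit you much these days.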

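Problems for portability:

- C++ does not guarantee IEEE 754.
- There is no guarantee that the byte order of float matches the byte order of integers.
- (Systems with trap representations†.)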

You can use std::numeric_limits<float>::is_iec559 to verify whether your assumption about the representation is correct.
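One way to turn that check into a build-time guard is sketched below. float_is_binary32 is a hypothetical helper; the extra digits == 24 test pins float to the 32-bit IEEE 754 format. As noted in the list above, neither check says anything about byte order.

```cpp
#include <limits>

// True when float follows IEC 559 (IEEE 754) and has a 24-bit significand,
// i.e. the binary32 format. Says nothing about byte order relative to integers.
constexpr bool float_is_binary32() {
    return std::numeric_limits<float>::is_iec559
        && std::numeric_limits<float>::digits == 24;
}

// Refuse to compile bit-inspection code where the assumption does not hold.
static_assert(float_is_binary32(), "this code assumes IEEE 754 binary32 float");
```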

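† That said, it appears that uint32_t cannot have trap representations (see comments), so you needn't be concerned. By using uint32_t you have already ruled out portability to esoteric systems: standard-conforming systems are not required to define that alias.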
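Because std::uint32_t is optional, its absence can be detected in the preprocessor: UINT32_MAX is defined exactly when the implementation provides the type. A minimal sketch, where float_bits_t is a hypothetical alias name:

```cpp
#include <cstdint>

#if defined(UINT32_MAX)
using float_bits_t = std::uint32_t;  // exact-width 32-bit unsigned type exists
#else
#error "no exact-width 32-bit unsigned type; float bit inspection not supported"
#endif
```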

Answer 2

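The standard may fail to say properly that this is allowed, but it almost certainly is supposed to be, and to the best of my knowledge all implementations treat it as defined behaviour.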

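For the purpose of copying into an actual char[N] object, the bytes making up the f object may be accessed as if they were a char[N]. This part, I believe, is not in dispute.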

Bytes in a char[N] that represent a uint32_t value may be copied into a uint32_t object. This part, I believe, is not in dispute either.

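Equally undisputed, I believe, is that, for example, fwrite may have written the bytes during one run of the program, and fread may have read them back during another run, or even in another program entirely.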
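The fwrite/fread scenario can be sketched as follows, assuming a 32-bit float and a writable temporary directory. bits_via_file is a hypothetical name; it writes the float's bytes to a std::tmpfile and reads them back into a uint32_t, so from the compiler's point of view the integer's bytes simply arrive from I/O.

```cpp
#include <cstdint>
#include <cstdio>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Round-trips the bytes of f through a temporary file into out.
// Returns false if any I/O step fails.
bool bits_via_file(float f, std::uint32_t& out) {
    std::FILE* fp = std::tmpfile();
    if (!fp) return false;
    bool ok = std::fwrite(&f, sizeof f, 1, fp) == 1;
    std::rewind(fp);  // reposition before switching from writing to reading
    ok = ok && std::fread(&out, sizeof out, 1, fp) == 1;
    std::fclose(fp);
    return ok;
}
```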

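Because of that last part, I believe it doesn't matter where the bytes came from, as long as they form a valid representation of some uint32_t object. You could have cycled through all float values, using memcmp on each one until you got the representation you wanted, which you knew would be identical to that of the uint32_t value you are interpreting it as. You could even have done that in another program, one the compiler has never seen. That would have been valid.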
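The memcmp thought experiment can be sketched on a small range instead of all float values. find_float_with_bits is a hypothetical helper: it steps upward from start with std::nextafter and compares each float's bytes against target with std::memcmp, so no byte is ever copied from the float into the integer. Assuming IEEE 754 binary32 with integer-matching byte order, searching for 0x3F800003 from 1.0f finds the third float above 1.0f.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Returns true and sets found if some float within max_steps ulps above start
// has exactly the same bytes as target; returns false otherwise.
bool find_float_with_bits(std::uint32_t target, float start, int max_steps, float& found) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float f = start;
    for (int k = 0; k < max_steps; ++k) {
        if (std::memcmp(&f, &target, sizeof f) == 0) {
            found = f;
            return true;
        }
        f = std::nextafter(f, INFINITY);  // next representable float toward +infinity
    }
    return false;
}
```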

If, from the implementation's perspective, your code is indistinguishable from unambiguously valid code, your code must be treated as valid.

Answer 3

Your example is well defined and does not violate strict aliasing. std::memcpy explicitly states:

Copies count bytes from the object pointed to by src to the object pointed to by dest. Both objects are reinterpreted as arrays of unsigned char.

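The standard allows an object of any type to be accessed through a char, unsigned char, or std::byte glvalue, so your example does not exhibit UB. Whether the resulting integer holds a meaningful value is another question.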
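The aliasing permission also covers reading the bytes in place, without copying at all. A minimal sketch with a hypothetical float_byte helper; the caller must keep index below sizeof(float). Which byte holds which bits is implementation-specific: on a little-endian IEEE 754 system, byte 3 of 1.0f is 0x3F.

```cpp
#include <cstddef>

// Reads byte `index` of the object representation of f through an
// unsigned char glvalue. Precondition: index < sizeof(float).
unsigned char float_byte(const float& f, std::size_t index) {
    const auto* p = reinterpret_cast<const unsigned char*>(&f);
    return p[index];
}
```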

use i to extract f's sign, exponent & significand

This, however, is not guaranteed by the standard, since the representation of a float is implementation-defined (with IEEE 754 it will work, though).
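To illustrate the IEEE 754 case, the sketch below rebuilds a float's value from the extracted fields using the binary32 formula (-1)^sign × (1 + significand / 2^23) × 2^(exponent − 127). It assumes binary32 with integer-matching byte order and handles only normal numbers (not zero, subnormals, infinities or NaNs); value_from_fields is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumes IEEE 754 binary32 and a normal, finite, non-zero input.
double value_from_fields(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof i);
    const std::uint32_t sign = i >> 31;
    const int exponent = static_cast<int>((i >> 23) & 0xFFu);
    const std::uint32_t significand = i & 0x7FFFFFu;
    // 1 + significand / 2^23, computed exactly in double.
    const double mantissa = 1.0 + std::ldexp(static_cast<double>(significand), -23);
    const double magnitude = std::ldexp(mantissa, exponent - 127);
    return sign ? -magnitude : magnitude;
}
```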


Tags: c++, strict-aliasing, undefined-behavior, language-lawyer, c++17