Updating part of a wider integer through a uint8_t reference

Question

#include <cstdio>
#include <cstdint>


struct Registers {
    Registers() : af(0),
    f(*(reinterpret_cast<uint8_t *>(&af))),
    a(*(reinterpret_cast<uint8_t *>(&af) + 1)) {

    }
    std::uint16_t af;
    std::uint8_t& f;
    std::uint8_t& a;  
};

int main() {
    Registers r;
    r.af = 0x00FF;
    r.a = 0xAA;
    std::printf("AF: %04X A: %02X F: %02X\n", r.af, r.a, r.f);
    return 0;
}

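Endianness issues aside, is this legal C++, or does it invoke some kind of undefined behavior? I think this would be fine with pointers and would not violate strict aliasing, since uint8_t is a char type, but I'm not sure whether it is legal through a reference.

It seems to work fine with most compiler flags turned on, and it does not raise any warnings: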

$ clang++ reg.cpp -O3 -fsanitize=undefined -fstrict-aliasing -Wall && ./a.out
AF: AAFF A: AA F: FF

Answer 1

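As you point out in your question, the fact that the cast is to a "byte" type will almost certainly remove any concern about violating the strict aliasing requirements.

However, from a strict "language-lawyer" perspective, the expression reinterpret_cast<uint8_t *>(&af) + 1 potentially invokes undefined behavior, because the pointer operand is not the address of an array element, and array elements are the only kind of object for which the Standard clearly defines such pointer arithmetic.

From this Draft C++17 Standard (bold emphasis mine):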

8.5.6 Additive operators [expr.add]

…

⁴ When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.

However, there may be room for argument over whether treating a uint16_t variable as an array of (two) uint8_t elements is legal and well-defined.

Notes/Discussion

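After some very helpful comments, I am now (even more) convinced that, even though the pointer addition expression may formally be undefined behavior, there is no situation (at least, on any sane platform) in which the code presented in the original question would fail to work as intended.

First, as pointed out by Chris Dodd, §6.7 (paragraph 2) of the Draft Standard cited above says this: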

² For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (6.6.1) making up the object can be copied into an array of char, unsigned char, or std::byte. If the content of that array is copied back into the object, the object shall subsequently hold its original value.

This confirms that the uint16_t data can be treated, at least in terms of its memory layout, as a 2-element array of uint8_t data.
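As a rough illustration of that copy guarantee, here is a minimal sketch that edits one byte of a 16-bit value by round-tripping it through an unsigned char array with std::memcpy. The helper names get_byte and set_byte are hypothetical and do not appear in the question; this sidesteps the pointer-arithmetic issue rather than settling it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the question): read byte `index` of `value`.
unsigned char get_byte(std::uint16_t value, std::size_t index) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // object -> byte array
    return bytes[index];
}

// Hypothetical helper: return `value` with byte `index` replaced by `byte`.
std::uint16_t set_byte(std::uint16_t value, std::size_t index, unsigned char byte) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // object -> byte array
    bytes[index] = byte;                       // ordinary array indexing
    std::memcpy(&value, bytes, sizeof value);  // byte array -> object
    return value;
}
```

Which half of the value index 1 refers to still depends on endianness: on a little-endian machine, set_byte(0x00FF, 1, 0xAA) gives 0xAAFF, matching the output shown in the question.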

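Second, Language Lawyer points out that an object of non-array type can be considered to be an array with a single element; further, pointer arithmetic is allowed on the address of such an object to yield the address of the hypothetical "one past the end" element. From this slightly later Draft Standard (§6.8.3) [basic.compound]: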

^3.4 For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.

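Thus, combining the two points above, the hypothetical "one past the last" uint8_t element referred to by the result of the pointer addition will be the second byte of the uint16_t data object.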
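If the formal doubt is still a concern, one option is to avoid the pointer arithmetic altogether and reach the two halves with shifts and masks. The following is only a sketch under that assumption, not the question's design: the accessor names (a, f, set_a, set_f) and the demo function are hypothetical, and A is taken to be the high byte, as in the question's little-endian output.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: accessor functions replace the reference members, so no
// reinterpret_cast or pointer arithmetic is needed.
struct Registers {
    std::uint16_t af = 0;

    std::uint8_t a() const { return static_cast<std::uint8_t>(af >> 8); }    // high byte
    std::uint8_t f() const { return static_cast<std::uint8_t>(af & 0xFF); }  // low byte

    void set_a(std::uint8_t v) {
        // Keep the low byte, replace the high byte.
        af = static_cast<std::uint16_t>((af & 0x00FF) | (v << 8));
    }
    void set_f(std::uint8_t v) {
        // Keep the high byte, replace the low byte.
        af = static_cast<std::uint16_t>((af & 0xFF00) | v);
    }
};

// Usage mirroring the question's main(), with checks instead of printf.
void demo() {
    Registers r;
    r.af = 0x00FF;
    r.set_a(0xAA);
    assert(r.af == 0xAAFF);
    assert(r.a() == 0xAA);
    assert(r.f() == 0xFF);
}
```

Because the halves are computed from the value rather than from its byte layout, the result is the same regardless of endianness. The struct also stays copyable in the usual way, which is not true of a struct holding reference members.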


