Has the signed/unsigned aliasing rule ever worked as intended?

This is the rule in its C++17 form ([basic.lval]/8), but it looks similar in the other standards ("lvalue" instead of "glvalue" in C++98):

8 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object

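The rule sounds like "You'll have UB, unless you do X", but that does not mean that if you do X you avoid UB, as one might expect! Indeed, doing X is either unconditional or conditional UB, depending on the version of the standard.

Let's look at the following code: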

int i = -1;
unsigned j = reinterpret_cast<unsigned&>(i);

What is the behavior of this code?

C++98 and C++11

[expr.reinterpret.cast]/10 (/11 in C++11) (emphasis mine):

An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators. The result is an lvalue that refers to the same object as the source lvalue, but with a different type.

So reinterpret_cast<unsigned&>(i) is an lvalue that refers to the int object i, but has type unsigned. The initialization needs the value of the initializer expression, which formally means that the lvalue-to-rvalue conversion is applied to the lvalue.

[conv.lval]/1:

An lvalue of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

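Our lvalue of type unsigned does not refer to an object of type unsigned, which means the behavior is undefined.

C++14 and C++17

In these standards the situation is a bit more complicated, but the rules are slightly relaxed. [expr.reinterpret.cast]/11 says the same thing: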

The result refers to the same object as the source glvalue, but with the specified type.

The offending wording about UB has been removed from [conv.lval]/1:

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.

But which value does the L-to-R conversion read? [conv.lval]/(2.6) (/(3.4) in C++17) answers this question:

… the value contained in the object indicated by the glvalue is the prvalue result

The unsigned lvalue reinterpret_cast<unsigned&>(i) indicates the int object i, whose value is -1, and the prvalue produced by the L-to-R conversion has type unsigned. [expr]/4 says:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

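-1 is definitely not in the range of representable values of the unsigned type of the prvalue expression, so the behavior is undefined.

The behavior would be defined if the object i contained a value in the range [0, INT_MAX].

The same reasoning applies when an unsigned object is accessed through an int glvalue. It is UB in C++98 and C++11, and it is UB in C++14 and C++17 unless the value of the object is in the range [0, INT_MAX].

So, contrary to the popular belief that this aliasing rule allows reinterpreting an object as containing a value of the corresponding signed/unsigned type, it does not allow that. For values in the range [0, INT_MAX], objects of the signed and unsigned types have the same representation ("The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, the representation of the same value in each of the two types is the same", says [basic.fundamental]/3 in C++17). It is hard to call such an access a "reinterpretation", let alone that it was unconditional UB before C++14.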
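As a minimal sketch of the defined case under the C++14/C++17 reading above: the helper name read_through_unsigned is hypothetical, and the assert only documents the precondition that the stored value lies in [0, INT_MAX]. Under the C++98/C++11 reading the same access would still be UB.

```cpp
#include <cassert>

// Hypothetical helper: reads an int object through an unsigned lvalue.
// Precondition (per the C++14/C++17 reading above): the stored value
// is non-negative, so it is representable in unsigned.
unsigned read_through_unsigned(int& i) {
    assert(i >= 0);  // for negative values the read would be UB under this reading
    return reinterpret_cast<unsigned&>(i);
}
```

Calling it on an int object holding 42 would yield 42u; calling it on an object holding -1 is exactly the case analyzed above.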
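The "same representation" claim for non-negative values can be illustrated roughly as follows. This is only a sketch: same_representation is a hypothetical name, and comparing whole object representations with memcmp assumes an implementation where int and unsigned have no padding bits that could differ.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true if the value v stored in an int object and the
// same value stored in an unsigned object are byte-for-byte identical.
bool same_representation(int v) {
    static_assert(sizeof(int) == sizeof(unsigned),
                  "corresponding signed/unsigned types have the same size");
    assert(v >= 0);  // only non-negative values exist in both types
    unsigned u = static_cast<unsigned>(v);  // value-preserving for v >= 0
    return std::memcmp(&v, &u, sizeof v) == 0;
}
```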

So what is the purpose of the rule ([basic.lval]/(8.4))?

This is the subject of defect report 2214, which says:

Section: 6.9.1 [basic.fundamental] Status: C++17 Submitter: Richard Smith Date: 2015-12-15

[Adopted at the February/March, 2017 meeting.]

According to 6.9.1 [basic.fundamental] paragraph 3,

The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same. (This is the wording in C++11 and C++14 versions, though the paragraph numbers may be different -- n.m.)

The corresponding wording in C11 is,

The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.

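The C wording is arguably clearer, but it loses the implication of the C++ wording that the sign bit of a signed type is part of the value representation of the corresponding unsigned type.

Proposed resolution (January, 2017):

Change 6.9.1 [basic.fundamental] paragraph 3 as follows: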

...The standard and extended unsigned integer types are collectively called unsigned integer types. The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, the representation of the same value in each of the two types is the same, and the value representation of each corresponding signed/unsigned type shall be the same. The standard signed integer types...

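So apparently this was the intent all along. C++17 just fixed the wording.

The C and C++ standards never intended to allow reinterpreting negative values as unsigned or vice versa. There are several signed integer representations in the wild (e.g. ones' complement, two's complement, sign and magnitude) and the standard does not mandate any of them, so it cannot prescribe the effect of such a reinterpretation. They could make it implementation-defined, but given the possibility of trap representations there is no real benefit in that. "Implementation-defined result or trap" is about as good as "undefined".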
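For contrast, here is a sketch of two ways to get an unsigned value from a possibly negative int without going through an unsigned glvalue. Both function names are hypothetical. The value conversion is specified as modular arithmetic regardless of the signed representation; the byte copy yields whatever the implementation's representation encodes, so its result for negative inputs is only predictable if you assume a particular representation such as two's complement.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Value conversion: the result is the unsigned value congruent to i
// modulo 2^N, where N is the number of value bits in unsigned.
unsigned to_unsigned_by_value(int i) {
    return static_cast<unsigned>(i);
}

// Byte copy: copies the object representation of i into an unsigned object.
// For negative i the resulting value depends on the signed representation.
unsigned to_unsigned_by_bits(int i) {
    static_assert(sizeof(unsigned) == sizeof(int), "sizes must match");
    unsigned u;
    std::memcpy(&u, &i, sizeof u);
    return u;
}
```

With these, to_unsigned_by_value(-1) is UINT_MAX on every conforming implementation, while to_unsigned_by_bits(-1) equals UINT_MAX only where int uses two's complement without padding bits.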