Why does unsigned char have different default initialization behaviour than other data types?

I am reading the cppreference page on default initialization, and I noticed a section that says something along these lines:

//UB
int x;
int y = x;        
   
//Defined and ok
unsigned char c;
unsigned char d = c;

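The same rules that apply to unsigned char also apply to std::byte.

My question is: why does every other non-class variable (int, bool, char, etc.) result in UB if you use its value before assigning to it (as in the example above), but unsigned char does not? Why is unsigned char special?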

The page I am reading for reference

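The difference is not in the initialization behaviour. The value of an uninitialized int is indeterminate, and default initialization leaves it indeterminate. The value of an uninitialized unsigned char is indeterminate, and default initialization leaves it indeterminate. There is no difference there.

The difference is that the behaviour of producing an indeterminate value of type int, or of any other type besides the exceptional unsigned char or std::byte, is undefined (unless the value is discarded).

The exception for unsigned char (and later std::byte) was added to the language in C++14, when "indeterminate value" was properly defined (although, since the change was a defect resolution, my understanding is that it also applies to the official standard at the time, which was C++11).

I have not found a documented rationale for that design choice. Here is a timeline of the definitions (all standard quotes are from drafts):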

C89 - 1.6 DEFINITIONS OF TERMS

Undefined behavior --- behavior, upon use of ... indeterminately-valued objects


C89 - 3.5.7 Initialization - Semantics

... If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

There is no exception for any type. You will see why the C standard is relevant when reading the C++98 standard.

C++98 - [dcl.init]

... Otherwise, if no initializer is specified for an object, the object and its subobjects, if any, have an indeterminate initial value

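There is no definition of what "indeterminate value" means or of what happens when you use one. The intended meaning is presumably the same as in C89, but it is not specified.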

C99 - 3. Terms, definitions, and symbols - 3.17.2

3.17.2 indeterminate value

either an unspecified value or a trap representation

3.17.3 unspecified value

valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

NOTE An unspecified value cannot be a trap representation.


C99 - 6.2.6 Representations of types - 6.2.6.1 General

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. 41) Such a representation is called a trap representation.


C99 - J.2 Undefined behavior

The behavior is undefined in the following circumstances:

  • ...
  • The value of an object with automatic storage duration is used while it is indeterminate
  • A trap representation is read by an lvalue expression that does not have character type
  • A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type
  • ...

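C99 introduced the term "trap representation". Using one is also UB, just like using an indeterminate value. Accesses through character types (char, unsigned char and signed char) are exempt from the rules quoted above, so those types can be used to operate on trap representations of other types without UB.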
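The practical consequence of that exemption is the familiar idiom of inspecting an object's bytes through unsigned char. Here is a minimal sketch of it in C++; the helper names object_bytes and from_bytes are made up for this illustration, and the sketch only handles trivially copyable types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy the object representation of `value`
// into a sequence of unsigned char. Reading any object through an
// unsigned char lvalue is permitted.
template <typename T>
std::vector<unsigned char> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only covers trivially copyable types");
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(first, first + sizeof(T));
}

// Hypothetical helper: rebuild a T from bytes produced by object_bytes.
template <typename T>
T from_bytes(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only covers trivially copyable types");
    assert(bytes.size() == sizeof(T));
    T result;                                   // overwritten in full below
    std::memcpy(&result, bytes.data(), sizeof(T));
    return result;
}
```

Round-tripping an int, for example from_bytes<int>(object_bytes(12345)), gives back the original value, because the bytes are copied as-is and never interpreted as an int until the representation is complete.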

C++ core language issue - 616. Definition of “indeterminate value”

The C++ Standard uses the phrase “indeterminate value” without defining it. C99 defines it as “either an unspecified value or a trap representation.” Should C++ follow suit?

Proposed resolution (October, 2012):

[dcl.init] paragraph 12 as follows:

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:
      • the second or third operand of a conditional expression (5.16 [expr.cond]),
      • the right operand of a comma (5.18 [expr.comma]),
      • the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or
      • a discarded-value expression (Clause 5 [expr]),

then the result of the operation is an indeterminate value.

If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.
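To make the listed cases concrete, here is a small sketch of how I read the proposed wording. The function name propagate_then_overwrite is made up for this illustration. It only moves indeterminate unsigned char values around in the contexts named above and replaces the result with a known value before anything observes it. Compilers may still warn about the uninitialized reads.

```cpp
#include <cassert>

// Hypothetical helper illustrating the contexts named in the proposed
// resolution. Nothing here inspects an indeterminate value.
unsigned char propagate_then_overwrite(bool pick_first) {
    unsigned char a;  // default-initialized: indeterminate value
    unsigned char b;  // likewise

    // Initializing an unsigned char object from an indeterminate value.
    unsigned char init = a;

    // Second or third operand of a conditional expression.
    unsigned char chosen = pick_first ? a : b;

    // Operand of a cast to an unsigned narrow character type.
    unsigned char converted = static_cast<unsigned char>(b);

    // Right operand of a simple assignment to an unsigned char lvalue.
    b = a;

    // Not covered by the exceptions: anything that converts to another
    // type, e.g. `int i = a;` or `a + 1` (integral promotion to int).

    // These casts only silence unused-variable warnings.
    static_cast<void>(init);
    static_cast<void>(converted);

    chosen = 7;       // replace the indeterminate value with a known one
    return chosen;    // always 7
}
```

The same operations on int instead of unsigned char would fall under the general rule and be undefined.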

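The proposed change was accepted as a defect resolution with some further changes (issue 1213), but it has remained mostly the same (sufficiently similar for the purposes of this question). This is where the exception for unsigned char seems to have been introduced into C++. As far as I can tell, the core language issue has no public comments or notes about the reasoning behind the exception.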

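Under C89 and C99, uninitialized values could have any bit pattern. If addressable locations have n bits, then unsigned char is guaranteed to have 2ⁿ possible values, so every possible bit pattern is a valid value. Other types, however, would on some platforms be stored in ways where not all bit patterns are valid. The Standard imposed no requirements on what might happen if code tried to read an object whose stored bit pattern did not represent a valid value. So whether reading an object of a type other than unsigned char would yield an unspecified value, or might instead trigger arbitrary behavior, depended on whether the implementation specified type representations that assigned valid values to all possible bit patterns.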
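The counting argument can be checked at compile time. This is a minimal sketch; the two function names are made up for this illustration, and bool is used only as a convenient example of a type whose storage has more bit patterns than values:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Every bit of an unsigned char is a value bit: no padding bits.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses all CHAR_BIT bits for its value");

// Hypothetical helper: number of distinct bit patterns in one byte, 2^CHAR_BIT.
// (Assumes CHAR_BIT is well below 64, as on mainstream platforms.)
constexpr unsigned long long byte_pattern_count() {
    return 1ULL << CHAR_BIT;
}

// Hypothetical helper: bool occupies at least CHAR_BIT bits of storage
// but has a single value bit, so most bit patterns do not correspond
// to either true or false.
constexpr bool bool_has_unused_bit_patterns() {
    return sizeof(bool) * CHAR_BIT > std::numeric_limits<bool>::digits;
}

static_assert(std::numeric_limits<unsigned char>::max() + 1ULL == byte_pattern_count(),
              "unsigned char has exactly one value per bit pattern");
```

On a typical platform with 8-bit bytes, byte_pattern_count() is 256 and matches the number of unsigned char values, while bool has only two values for the same 256 patterns.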

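The C11 Standard added a further proviso: even implementations which specify that all objects, whether or not their address is taken, will always be stored in ways where all bit patterns represent valid values, may at their leisure behave in a completely arbitrary fashion if an attempt is made to access an uninitialized object that isn't an unsigned char, whose address has been taken. Although no rationale document has been published for C11 (unlike earlier versions), I think such changes stem from a lack of consensus about whether the Standard is supposed to describe only the behavior of 100% portable programs, or that of a wider range of practical programs. If a program will be run on a completely unspecified implementation, it is impossible to know the effect of reading an uninitialized object except in the circumstances specified by the C11 Standard. If it will be run on a known implementation, then it will be processed however that implementation decides to process it, whether or not the Standard mandates the behavior, and thus there is no need for anything in particular to be mandated. Unfortunately, the authors of the Gratuitously "Clever" Compiler believe that when the Standard characterizes an action as "non-portable or erroneous", it really means "non-portable, and therefore erroneous", and excludes the possibility of "non-portable but correct on the intended target", even though such a notion directly contradicts the published Rationale documents for earlier versions of the Standard.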