Out-of-bounds access in C++ and undefined behaviour
I know that accessing memory outside the bounds of a buffer is undefined behaviour in C++.
Here is the example from cppreference:
int table[4] = {};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
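However, I could not find the corresponding paragraph in the C++ standard.
Can anyone point me to the specific paragraph in the standard that covers this case?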
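This is undefined behaviour. We can put a few paragraphs side by side to show it. First, without proving it explicitly, table[4] is *(table + 4). We only need to ask what properties the pointer value table + 4 has, and how they relate to the requirements of the indirection operator.
On pointers, we have this paragraph: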
[basic.compound]
3 Every value of pointer type is one of the following:
- a pointer to an object or function (the pointer is said to point to the object or function), or
- a pointer past the end of an object ([expr.add]), or
- the null pointer value for that type, or
- an invalid pointer value.
Our pointer is of the kind described in the second bullet, not the first. As for the indirection operator:
[expr.unary.op]
1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.
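I hope it is clear from reading this paragraph that the operation is defined for pointers of the category described in the first bullet of the previous paragraph.
So we are applying the operation to a pointer value for which its behaviour is not defined. The result is undefined behaviour.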
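A minimal sketch of a loop that avoids the problem, assuming C++17 for std::size. The name exists_in_table_fixed is hypothetical and is not part of the cppreference example; the only change is that the index stays strictly below the element count, so table[i] always designates an existing element.

```cpp
#include <cstddef>
#include <iterator>

int table[4] = {};

// Hypothetical corrected variant of the example: i stays in [0, 4),
// so table[i] is always *(table + i) for a pointer to an actual element.
bool exists_in_table_fixed(int v)
{
    for (std::size_t i = 0; i < std::size(table); ++i) {
        if (table[i] == v) return true;
    }
    return false;
}
```

Deriving the bound from the array rather than repeating the literal 4 keeps the loop correct if the array size changes.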
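The subscript operator is defined in terms of the addition operator. The array decays to a pointer to its first element in that same expression, so the rules of pointer arithmetic apply. The indirection operator is then applied to the hypothetical result of the addition.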
[expr.sub]
A postfix expression followed by an expression in square brackets is a postfix expression.
One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.
The result is of type “T”.
The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), ...
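If the array index is more than one past the last element, i.e. E2 > std::size(E1) (which is not the case in the example program), the hypothetical pointer arithmetic itself is undefined.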
[expr.add]
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
- If P evaluates to a null pointer value ... (does not apply)
- Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (does not apply when i-j > n)
- Otherwise, the behavior is undefined.
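In the case E2 == std::size(E1) (which is the case in the last iteration of the example), the hypothetical result of the addition is a pointer one past the end of the array, pointing outside the array's storage. The hypothetical pointer arithmetic is well defined.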
[basic.compound]
A value of a pointer type that is a pointer ... past the end of an object represents ... the first byte in memory after the end of the storage occupied by the object
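To illustrate that forming the pointer past the end is fine as long as it is not dereferenced, here is a rough sketch, assuming C++17 for std::size. The name exists_in_table_ptr is hypothetical; the pointer table + 4 is used only as a comparison bound.

```cpp
#include <iterator>

int table[4] = {};

// Hypothetical helper: walks the array with pointers. `end` is table + 4,
// a pointer past the end of the array. It is compared against but never
// dereferenced, so only valid element pointers reach the unary * operator.
bool exists_in_table_ptr(int v)
{
    const int* const end = table + std::size(table);
    for (const int* p = table; p != end; ++p) {
        if (*p == v) return true;
    }
    return false;
}
```

This is the same pattern that iterator ranges rely on: the end position may be computed and compared, but not read through.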
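Access is defined in terms of objects. But there is no object there, nor even storage, so there is no definition of the behaviour.
Granted, in some cases there may be an unrelated object at the memory address pointed to. The note below says that a pointer past the end is not a pointer to such an unrelated object sharing the address. I could not find which normative rule this follows from.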
[Note 2: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type, even if the unrelated object is located at that address. ...
Alternatively, we can look at the definition of the indirection operator:
[expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type ... and the result is an lvalue referring to the object ... to which the expression points. ...
There is a contradiction, because there is no object that the result could refer to.
So, in summary:
int table[N] = {};
table[N] == 0; // UB, accessing non-existing object
table[N + 1]; // UB, [expr.add]
table + N; // OK, one past last element
table[N]; // ¯\_(ツ)_/¯ See CWG 232
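If a defined outcome is wanted for an index equal to N, one option (an assumption about what the reader may want, not something either answer proposes) is to switch to std::array and its at() member, which is specified to throw std::out_of_range when the index is not less than size(). The helper name read_is_rejected is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reports whether at() rejected the index.
// Unlike operator[], at() checks the index against size() before accessing.
bool read_is_rejected(const std::array<int, 4>& a, std::size_t i)
{
    try {
        (void)a.at(i);  // throws std::out_of_range if i >= a.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a four-element array, index 3 is read normally while index 4 is rejected with an exception instead of being undefined behaviour.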