What are the differences between a+i and &a[i] for pointer arithmetic in C++?
Suppose we have:
char* a;
int i;
Many introductions to C++ (like this one) suggest that the rvalues a+i and &a[i] are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here) quoted from [dcl.ref]:
in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
In other words, "binding" a reference to the object obtained by dereferencing a null pointer causes undefined behavior. Based on the context of the above text, one infers that merely evaluating &a[i] (within the offsetof macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that &a[i] causes undefined behavior in the case where a=null and i=0. This behavior is different from a+i (at least in C++, in the a=null, i=0 case).
This leads to at least 2 questions about the differences between a+i and &a[i]:
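First, what is the underlying semantic difference between a+i and &a[i] that causes this difference in behavior? Can it be explained in terms of any kind of general principle, not just "binding a reference to a null dereference object causes undefined behavior just because this is a very specific case that everybody knows"? Is it that &a[i] might generate a memory access to a[i]? Or were the spec authors unhappy with null references that day? Or something else?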
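Second, besides the case where a=null and i=0, are there any other cases where a+i and &a[i] behave differently? (This could be covered by the first question, depending on the answer to it.)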
In the C++ standard, in section [expr.sub]/1, you can read:
The expression E1[E2] is identical (by definition) to *((E1)+(E2)).
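This means that &a[i] is exactly the same as &*(a+i). So you first dereference the pointer (*) and then take the address (&). If the pointer is invalid (for example nullptr, but also out of range), this is UB.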
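For the uncontroversial case, where the pointer refers to a real array and the index is in range, the equivalence can be checked directly. The following is only a minimal sketch; the helper names same_address_in_bounds and demo_in_bounds are made up for illustration and do not come from the standard or from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true if a+i and &a[i] yield the same
// address for every index 0 <= i < n of a buffer of n chars.
bool same_address_in_bounds(char* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // &a[i] is &*(a+i) by [expr.sub]/1; a+i points at a live element here.
        if (a + i != &a[i]) return false;
    }
    return true;
}

// Hypothetical demo on a small local buffer.
void demo_in_bounds() {
    char buf[8] = {};
    assert(same_address_in_bounds(buf, sizeof buf));
}
```

Calling demo_in_bounds() from main should pass silently. The sketch deliberately stays inside the array, because that is the only range where nobody disputes the result.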
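a+i is based on pointer arithmetic. At first it looks less dangerous, since there is no dereferencing that would be UB for sure. However, it may also be UB (see [expr.add]/4):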
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤
n; otherwise, the behavior is undefined. Likewise, the expression P -
J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j
≤ n; otherwise, the behavior is undefined.
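To make the quoted range concrete: for an array of n elements, every offset from 0 up to and including n (the one-past-the-end position) is allowed, and anything beyond that is undefined. Below is a small sketch of the allowed boundary, assuming C++14 or later for the relaxed constexpr rules; the function name one_past_end_is_fine is hypothetical.

```cpp
#include <cassert>

// Hypothetical check: forming the one-past-the-end pointer of a
// 4-element array is allowed, since 0 <= i + j <= n holds with i + j == n.
constexpr bool one_past_end_is_fine() {
    const char x[4] = {};
    const char* last = x + 4;   // points to the hypothetical element x[4]
    return last - x == 4;       // never dereferenced, only compared
}

// Evaluated at compile time, so undefined pointer arithmetic inside
// the function would make this ill-formed rather than silently pass.
static_assert(one_past_end_is_fine(), "x + 4 is a valid pointer value");
```

Changing x + 4 to x + 5 falls outside the quoted range, so a conforming compiler is expected to reject the static_assert instead of evaluating it.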
So, while the semantics behind these two expressions are slightly different, I would say that the result is the same in the end.
TL;DR: a+i and &a[i] are both well-formed and produce a null pointer when a is a null pointer and i is 0, according to (the intent of) the standard, and all compilers agree.
a+i is obviously well-formed per [expr.add]/4 of the latest draft standard:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
- If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
- [...]
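The first bullet can also be exercised at run time. This is a minimal sketch under the wording quoted above; offset_by and null_plus_zero_is_null are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper around plain pointer addition.
char* offset_by(char* p, std::ptrdiff_t j) {
    return p + j;
}

// Adding 0 to a null pointer value yields a null pointer value
// according to the quoted [expr.add]/4 wording.
bool null_plus_zero_is_null() {
    char* a = nullptr;
    return offset_by(a, 0) == nullptr;
}
```

Only the offset 0 is covered by that bullet; the sketch says nothing about adding a nonzero value to a null pointer.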
&a[i] is tricky. Per [expr.sub]/1, a[i] is equivalent to *(a+i), thus &a[i] is equivalent to &*(a+i). Now the standard is not quite clear about whether &*(a+i) is well-formed when a+i is a null pointer. But as @n.m. points out, the intent as recorded in CWG 232 is to allow this case.
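Since core language UB is required to be caught in a constant expression ([expr.const]/(4.6)), we can test whether compilers consider these two expressions to be UB.
Here is the demo. If a compiler thinks the constant expression in the static_assert is UB, or if it thinks the result is not true, then it must produce a diagnostic (error or warning) per the standard:
(Note that this uses single-parameter static_assert and constexpr lambdas, which are C++17 features, and default lambda arguments, which are also fairly recent, allowed since C++14.)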
static_assert(nullptr == [](char* a=nullptr, int i=0) {
return a+i;
}());
static_assert(nullptr == [](char* a=nullptr, int i=0) {
return &a[i];
}());
From https://godbolt.org/z/hhsV4I, it seems all compilers behave uniformly in this case, producing no diagnostics at all (which surprises me a bit).
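However, this is different from the offsetof case. The implementation posted in that question explicitly creates a reference (which is necessary to sidestep a user-defined operator&), and is therefore subject to the requirements on references.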
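To illustrate why such an implementation goes through a reference, here is a rough sketch of the usual trick for taking an address without invoking an overloaded operator&. It is not the code from that question; the type Odd and the helper address_via_reference are hypothetical, and the cast sequence mirrors a commonly shown possible implementation of std::addressof.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & is overloaded and does not
// return the object's real address.
struct Odd {
    int value = 0;
    Odd* operator&() { return nullptr; }
};

// Binds a reference to the object's storage viewed as char, then applies
// the built-in & to that reference, bypassing any user-defined operator&.
template <class T>
T* address_via_reference(T& obj) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(obj)));
}

bool demo_addressof() {
    Odd o;
    return &o == nullptr                                  // overloaded operator&
        && address_via_reference(o) == std::addressof(o)  // real address
        && std::addressof(o) != nullptr;
}
```

The point of the sketch is that the reference parameter and the reference cast both need an actual object to bind to. With a null pointer there is no such object, which is where the [dcl.ref] wording about null references comes in.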