What is the rationale for making subtraction of two pointers not related to the same array undefined behavior?

Question

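According to the C++ draft, [expr.add], when you subtract pointers of the same type that do not belong to the same array, the behavior is undefined (emphasis is mine):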

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).

If P and Q both evaluate to null pointer values, the result is 0.

Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.

Otherwise, the behavior is undefined. [ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
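To make the three cases in the quoted wording concrete, here is a minimal sketch. The function names are hypothetical and chosen only for illustration; the undefined case is deliberately left as a comment so the sketch itself stays well defined.

```cpp
#include <cassert>
#include <cstddef>

// Defined: both pointers refer to elements of the same array object x,
// so the result is the index difference i - j.
std::ptrdiff_t same_array_distance() {
    int x[8] = {};
    int* p = &x[6];
    int* q = &x[2];
    return p - q;  // 6 - 2
}

// Defined: both operands are null pointer values, so the result is 0.
std::ptrdiff_t null_distance() {
    int* p = nullptr;
    int* q = nullptr;
    return p - q;
}

// Undefined: the operands point into two different arrays. Shown as a
// comment only, so that this sketch never performs the undefined operation.
//
//     int a[4], b[4];
//     std::ptrdiff_t d = &b[0] - &a[0];

// Hypothetical driver that checks the two defined cases.
void check_defined_cases() {
    assert(same_array_distance() == 4);
    assert(null_distance() == 0);
}
```

Calling `check_defined_cases()` from `main` exercises only the defined cases.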

What is the rationale for making such behavior undefined instead of, for instance, implementation-defined?

Answer 1

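First, see the question mentioned in the comments for why this is not well defined. The short answer given there is that arbitrary pointer arithmetic is not possible in the segmented memory models used by some (now archaic?) systems.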

What is the rationale to make such behavior undefined instead of, for instance, implementation defined?

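Whenever the standard specifies something as undefined behaviour, it could usually have been specified as merely implementation-defined instead. So why specify anything as undefined?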

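Well, undefined behaviour is more lenient. In particular, a compiler is allowed to assume that there is no undefined behaviour, so it may perform optimisations that would break the program if that assumption were wrong. So one reason to specify undefined behaviour is optimisation.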

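Let's consider a function fun(int* arr1, int* arr2) that takes two pointers as arguments. Those pointers may or may not point into the same array. Say the function iterates through one of the pointed-to arrays (arr1 + n), and in each iteration must compare the current position with the other pointer ((arr1 + n) != arr2), for example to make sure that the pointed-to object is not overwritten.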

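Say we call the function like this: fun(array1, array2). The compiler knows that (array1 + n) != array2, because otherwise the behaviour is undefined. Therefore, if the function call is expanded inline, the compiler can remove the redundant check (arr1 + n) != arr2, which is always true. If pointer arithmetic across array boundaries were well defined (or even implementation-defined), then (array1 + n) == array2 could be true for some n, and this optimisation would be impossible unless the compiler could prove that (array1 + n) != array2 holds for all possible values of n, which can sometimes be harder to prove.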
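The scenario above can be sketched as follows. The body of `fun`, its `count` parameter, and the two caller functions are assumptions made for illustration; the answer only names `fun(int* arr1, int* arr2)` and the check itself. Whether a given compiler actually removes the check is not guaranteed by this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical body for `fun`: zero the first `count` elements of arr1,
// skipping any slot that is the object arr2 points to. Returns the
// number of elements written.
int fun(int* arr1, int* arr2, std::size_t count) {
    assert(arr1 != nullptr);
    int written = 0;
    for (std::size_t n = 0; n < count; ++n) {
        if ((arr1 + n) != arr2) {  // the check discussed above
            arr1[n] = 0;
            ++written;
        }
    }
    return written;
}

// Two separate arrays: for every n in [0, 4), array1 + n points into
// array1, so it cannot equal array2. After inlining, a compiler may treat
// the check as always true.
int distinct_arrays_writes() {
    int array1[4] = {1, 2, 3, 4};
    int array2[4] = {5, 6, 7, 8};
    return fun(array1, array2, 4);
}

// Both pointers refer to the same array: here the check is not redundant,
// because array + 2 is reached when n == 2.
int overlapping_writes() {
    int array[4] = {1, 2, 3, 4};
    return fun(array, array + 2, 4);
}
```

In the first caller all four elements are written; in the second, the element at index 2 is skipped, so three are written.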

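Pointer arithmetic across the members of a class could be implemented even in segmented memory models. The same goes for iterating across the boundaries of a subarray. There are use cases where these would be quite useful, but they are technically UB.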
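As a sketch of what these two cases look like, and of alternatives that stay within defined behaviour, consider the following. `Pair`, `member_byte_distance`, and `sum_grid` are hypothetical names; the UB variants appear only in comments.

```cpp
#include <cassert>
#include <cstddef>

struct Pair {
    int a;
    int b;
};

// Technically UB: a and b are separate objects, not elements of one array.
//
//     Pair p{};
//     std::ptrdiff_t d = &p.b - &p.a;
//
// A defined alternative is to compare byte offsets within the class.
std::ptrdiff_t member_byte_distance() {
    assert(offsetof(Pair, b) > offsetof(Pair, a));
    return static_cast<std::ptrdiff_t>(offsetof(Pair, b)) -
           static_cast<std::ptrdiff_t>(offsetof(Pair, a));
}

// Technically UB: walking past the end of the first row into the second.
//
//     const int* flat = &grid[0][0];
//     int v = flat[4];  // beyond the 3 elements of grid[0]
//
// A defined alternative is to iterate row by row.
int sum_grid(const int (&grid)[2][3]) {
    int total = 0;
    for (const auto& row : grid) {
        for (int value : row) {
            total += value;
        }
    }
    return total;
}
```

The exact byte distance between the members depends on the implementation's layout and padding, so the sketch only relies on `b` being placed after `a`.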

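One argument in favour of UB in these cases is that it leaves more room for optimisation. You do not necessarily have to agree that this is a sufficient argument.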

Answer 2

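As some have noted in the comments, unless the resulting value has some meaning or is usable in some way, there is no point in making the behavior defined.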

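A study has been carried out for the C language to answer questions related to pointer provenance (with the intention of proposing wording changes to the C specification). One of the questions was: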

Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first? (source)

The conclusions of the study's authors were published in a paper entitled Exploring C Semantics and Pointer Provenance. For this particular question, the answer was:

Inter-object pointer arithmetic

The first example in this section relied on guessing (and then checking) the offset between two allocations. What if one instead calculates the offset, with pointer subtraction; should that let one move between objects, as below?
// pointer_offset_from_ptr_subtraction_global_xy.c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

int x=1, y=2;
int main() {
    int *p = &x;
    int *q = &y;
    ptrdiff_t offset = q - p;
    int *r = p + offset;
    if (memcmp(&r, &q, sizeof(r)) == 0) {
        *r = 11; // is this free of UB?
        printf("y=%d *q=%d *r=%d\n",y,*q,*r);
    }
}
In ISO C11, the q-p is UB (as a pointer subtraction between pointers to different objects, which in some abstract-machine executions are not one-past-related). In a variant semantics that allows construction of more-than-one-past pointers, one would have to choose whether the *r=11 access is UB or not. The basic provenance semantics will forbid it, because r will retain the provenance of the x allocation, but its address is not in bounds for that. This is probably the most desirable semantics: we have found very few example idioms that intentionally use inter-object pointer arithmetic, and the freedom that forbidding it gives to alias analysis and optimisation seems significant.

This research was picked up by the C++ community, summarized, and sent to WG21 (the C++ Standards Committee) for feedback.

Relevant point of the Summary:

Pointer difference is only defined for pointers with the same provenance and within the same array.

So they decided to keep it undefined for now.

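Note that there is a study group within the C++ Standards Committee, SG12, dedicated to Undefined Behavior and Vulnerabilities. This group conducts a systematic review to catalog cases of vulnerabilities and undefined/unspecified behavior in the standard, and recommends a coherent set of changes to define and/or specify the behavior. You can follow the proceedings of this group to see whether any behavior that is currently undefined or unspecified will change in the future.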

Answer 3

To speak more academically: pointers are not numbers. They are pointers.

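It is true that a pointer on your system is implemented as a numeric representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).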

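But C++ does not care about that. C++ wants you to think of pointers as post-its, as bookmarks to specific objects. The numeric address values are just a side effect. The only arithmetic that makes sense on a pointer is moving forwards and backwards through an array of objects; nothing else is philosophically meaningful.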
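The "bookmark" view can be sketched with a pointer that only ever moves forwards through one array. `sum_range` and `range_length` are hypothetical helpers, and the sketch assumes both arguments refer to the same array (or to one past its last element).

```cpp
#include <cassert>
#include <cstddef>

// Walks from `first` up to, but not including, `last`.
// Assumption: both pointers belong to the same array, with `last` at most
// one past its final element.
int sum_range(const int* first, const int* last) {
    assert(first != nullptr && last != nullptr);
    int total = 0;
    for (const int* it = first; it != last; ++it) {
        total += *it;  // `it` always bookmarks an element of the array
    }
    return total;
}

// Subtraction is meaningful here because both bookmarks are in one array.
std::ptrdiff_t range_length(const int* first, const int* last) {
    assert(first != nullptr && last != nullptr);
    return last - first;
}
```

For `int data[5] = {1, 2, 3, 4, 5};`, calling `sum_range(data, data + 5)` adds up all five elements and `range_length(data, data + 5)` gives the element count.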

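This may seem pretty arcane and useless, but it is actually deliberate and useful. C++ does not want to constrain implementations by attaching further meaning to practical, low-level properties of the computer that it cannot control. And, since there is no reason for it to do so (why would you want to do this?), it simply says that the result is undefined.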

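In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can, and often will, make your program appear to do strange things when you break the rules. Do not be too surprised if your pointer arithmetic gets mangled when the compiler assumes that both the originating value and the result refer to the same array, an assumption that you violated.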
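If a numeric distance between unrelated objects is genuinely needed, one option is to convert both pointers to integers first. This is an assumption of this sketch, not something the answer proposes. The pointer-to-integer mapping is implementation-defined rather than undefined, and `std::uintptr_t` is an optional type, so the result is only meaningful on a given platform. `address_gap` and `check_address_gap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Difference between the integer representations of two addresses.
// No pointer subtraction takes place, so the same-array rule does not apply;
// what the number means is up to the implementation.
std::intptr_t address_gap(const void* p, const void* q) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    const auto b = reinterpret_cast<std::uintptr_t>(q);
    return static_cast<std::intptr_t>(a - b);
}

// Hypothetical check. Assumption: a flat address space, where the integer
// gap between two elements of one array matches the byte distance.
void check_address_gap() {
    int arr[4] = {};
    assert(address_gap(&arr[3], &arr[0]) ==
           static_cast<std::intptr_t>(3 * sizeof(int)));
}
```

The resulting integer cannot be portably added back to a pointer to reach the other object, which is the point the provenance discussion above makes.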


c++

pointer-arithmetic

language-lawyer