Why can't the C compiler optimize changing the value of a const pointer assuming that two pointers to the same variable would be illegal/UB?

Question

Recently I stumbled over a comparison between Rust and C that uses the following code:

bool f(int* a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

In Rust (same code, but with Rust syntax), it produces the following assembly:

    cmp      dword ptr [rsi], 0 
    mov      dword ptr [rdi], 3 
    setne al                    
    ret

With gcc, it produces the following:

   mov      DWORD PTR [rdi], 2   
   mov      eax, DWORD PTR [rsi]
   mov      DWORD PTR [rdi], 3        
   test     eax, eax                  
   setne al                           
   ret

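The text claims that the C function cannot optimize the first line away, because a and b could point to the same number. In Rust this is not allowed, so the compiler can optimize it away.

Now to my question:

The function takes a const int*, which is a pointer to a const int. I read this question, and it states that modifying a const int through a pointer should result in a compiler warning and, in the worst case, UB.

Could this function result in UB if I call it with two pointers to the same integer?

Why can't the C compiler optimize the first line away, under the assumption that two pointers to the same variable would be illegal/UB?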

Link to godbolt

Answer 1

Why can't the C Compiler optimize the first line away, under the assumption, that two pointers to the same variable would be illegal/UB?

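Because you have not told the C compiler that it is allowed to make that assumption.

C has a type qualifier for exactly this, called restrict, which roughly means: this pointer does not overlap with other pointers (not exactly, but play along).

The assembly output for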

bool f(int* restrict a, const int* b) {
  *a = 2;
  int ret = *b;
  *a = 3;
  return ret != 0;
}

is

        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], 3
        test    eax, eax
        setne   al
        ret

...which optimizes away the assignment *a = 2.

From https://en.wikipedia.org/wiki/Restrict

In the C programming language, restrict is a keyword that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points.
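As a rough sketch of what that promise means for the caller: the names f_restrict and demo_restrict below are made up for illustration and are not part of the question. With restrict on a, it is the caller's job to make sure the int written through a is not also reached through b.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variant of the question's f. restrict on a is a promise by
   the caller that the int modified through a is not accessed through any
   other pointer (such as b) during the call. */
bool f_restrict(int *restrict a, const int *b) {
    *a = 2;          /* dead store: the compiler is free to drop it */
    int ret = *b;    /* may be assumed not to observe the store above */
    *a = 3;
    return ret != 0;
}

/* Well-defined use: the two arguments refer to distinct objects. */
void demo_restrict(void) {
    int x = 0;
    int y = 0;
    assert(f_restrict(&x, &y) == false);   /* y is 0 */
    assert(x == 3);

    y = 5;
    assert(f_restrict(&x, &y) == true);    /* y is nonzero */
}
```

Calling f_restrict(&x, &x) would break the promise: the object is modified through a and also read through b, which is undefined behavior, so the compiler does not have to produce any particular result for it.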

Answer 2

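The function int f(int *a, const int *b); promises not to change the contents of b through that pointer... It makes no promises about accesses through the a pointer.

If a and b point to the same object, changing it through a is legal (provided the underlying object is modifiable, of course).

Example: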

int val = 0; f(&val, &val);
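To make the consequence concrete, here is a small sketch that calls the question's f both ways. The helper demo_aliasing is a made-up name for illustration. The point is that the two calls legally return different results, so the compiler has to keep the store *a = 2.

```c
#include <assert.h>
#include <stdbool.h>

/* The function from the question, unchanged. */
bool f(int *a, const int *b) {
    *a = 2;
    int ret = *b;    /* reads 2 when b points to the same int as a */
    *a = 3;
    return ret != 0;
}

/* Hypothetical helper comparing the two call patterns. */
void demo_aliasing(void) {
    int val = 0;
    int other = 0;

    /* Distinct objects: *b stays 0, so f returns false. */
    assert(f(&val, &other) == false);
    assert(val == 3);

    /* Same object: the store *a = 2 is visible through b, so f returns true. */
    val = 0;
    assert(f(&val, &val) == true);
    assert(val == 3);
}
```

If the first store were removed, the second call would return false instead of true, which would change the behavior of a well-defined program.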

Answer 3

While the other answers cover the C side, the Rust side is also worth a look. In Rust, the code you have is probably this:

fn f(a:&mut i32, b:&i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

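The function takes two references, one mutable and one not. References are pointers that are guaranteed to be valid for reads, and mutable references are additionally guaranteed to be unique, so it gets optimized to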

        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

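However, Rust also has raw pointers, which are equivalent to C's pointers and make no such guarantees. The following function, which takes raw pointers: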

unsafe fn g(a:*mut i32, b:*const i32)->bool{
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}

misses the optimization and compiles to:

        mov     dword ptr [rdi], 2
        cmp     dword ptr [rsi], 0
        mov     dword ptr [rdi], 3
        setne   al
        ret

Godbolt Link

Answer 4

The function takes a const int* which is a pointer to a const int.

No, const int* is not a pointer to a const int. Anyone who says that is deluded.

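int* is a pointer to an int that is definitely not const.
const int* is a pointer to an int of unknown constness.
There is no way to express the concept of a pointer to an int that is definitely const.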
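A short sketch of "unknown constness" in practice; read_twice_sum and demo_unknown_constness are hypothetical names chosen for this illustration. A const int* can point at an ordinary, modifiable int, and the value seen through it can change between two reads.

```c
#include <assert.h>

/* Hypothetical helper: reads through a pointer-to-const before and after
   the object is changed through another path. */
int read_twice_sum(const int *p, int *q) {
    int first = *p;
    *q = 10;            /* legal even if q == p: the object itself is not const */
    int second = *p;    /* must be reloaded, since p and q may alias */
    return first + second;
}

void demo_unknown_constness(void) {
    int x = 1;
    const int *view = &x;      /* adding const is an implicit, safe conversion */
    assert(read_twice_sum(view, &x) == 11);   /* 1 + 10 */

    /* Casting const away and writing is defined here because x was not
       declared const; it would be undefined if x were `const int x`. */
    *(int *)view = 5;
    assert(x == 5);
}
```

So the const in const int* only restricts what may be done through that particular pointer; it says nothing about whether the object can change.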

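If C were a better designed language, then const int * would be a pointer to a const int, mutable int * (borrowing a keyword from C++) would be a pointer to a non-const int, and int * would be a pointer to an int of unknown constness. Dropping the qualifiers (i.e., forgetting something about the pointed-to type) would be safe, which is the opposite of real C, where adding the const qualifier is safe. I haven't used Rust, but it appears from the examples in another answer that it uses a syntax like that.

Bjarne Stroustrup, who introduced const, originally named it readonly, which is far closer to its actual meaning. int readonly* would have made it clearer that it is the pointer that is read-only, not the pointed-to object. The renaming to const has confused generations of programmers.

When I have the choice, I always write foo const*, not const foo*, as the next best thing to readonly*.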

Answer 5

Note that the question is about the optimizations performed at -Ofast and what happens there.

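Essentially, when compiling the function, the C compiler does not know the complete set of addresses that could be passed to it. That is not known until link time or run time, because the function can be called from multiple translation units. So the compiler has to handle any legal address that a and b might point to, which of course includes the case where they overlap.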

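Therefore, you need to use restrict to tell it that updating a does not update the value b points to. (The function is allowed to update a because it is not a pointer to const, and even if it were, the function could cast the const away.) Without restrict, the value read through b feeds the comparison with 0, so the store to a that precedes the comparison has to be carried out. Rust, by contrast, assumes restrict by default. The compiler does know that *a is the same as *(a+1-1), so it will not generate two separate stores for those, but it does not know whether a and b overlap.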
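The contrast in the last sentence can be sketched as follows. The function names are hypothetical, and what a given compiler actually emits depends on its version and flags. In the first function both stores provably hit the same address, so an optimizer can keep only the last one. In the second, b is an unrelated pointer as far as the compiler can tell, so the first store has to stay.

```c
#include <assert.h>

/* Hypothetical helper: both stores provably target the same int, because
   a + 1 - 1 is the same address as a. */
int store_same_address(int *a) {
    *a = 2;              /* may be dropped: overwritten below, never read */
    *(a + 1 - 1) = 3;
    return *a;
}

/* Hypothetical helper: b is unrelated to a as far as the compiler can
   tell, so the first store has to stay in case b == a. */
int store_then_read(int *a, const int *b) {
    *a = 2;
    int seen = *b;       /* 2 if b aliases a, otherwise the old *b */
    *a = 3;
    return seen;
}

void demo_known_vs_unknown(void) {
    int x = 0;
    int y = 7;
    assert(store_same_address(&x) == 3);
    assert(store_then_read(&x, &y) == 7);   /* no overlap */
    assert(store_then_read(&x, &x) == 2);   /* overlap */
}
```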

c

strict-aliasing

undefined-behavior

rust