How to understand this basic assembly code that seems to be adding two pointers?

Question

Construct a C language statement from the following MIPS instructions.
(The variable f is in $s0; the base addresses of arrays A and B are in $s6 and $s7.)

addi $t0, $s6, 4        //$t0 = &A[1]
add  $t1, $s6, $0       //$t1 = &A[0]
sw   $t1, 0($t0)        //A[1] = &A[0]
lw   $t0, 0($t0)        //$t0 = &A[0]
add  $s0, $t1, $t0      //f = &A[0] + &A[0]

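The instructions are on the left; on the right are my comments showing how I understand each one.
The final answer I get is f = &A[0] + &A[0], but that seems wrong. Where did I go wrong?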

Answer 1

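You're not crazy, the code really is weird!

Adding two pointers is basically never meaningful, so this is a trick question.
The equivalent C really does look wrong/crazy: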

intptr_t *A = ...;  // in $s6

 A[1] = (intptr_t)&A[0];
 f = A[1] + (intptr_t)&A[0];

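Note that signed overflow is undefined behaviour in C, so compiling this to MIPS add, which traps on signed overflow, is legal. If we used uintptr_t, the required overflow semantics would be wrapping/truncation, which add does not implement.

(Real-world C compilers for MIPS always use addu / addiu, not add, even for signed int, because undefined behaviour means anything is allowed, including wrapping. If you compile with gcc -fwrapv, wrapping is even required. Since MIPS is a 2's complement machine, addu is the same binary operation as add; it differs only in not trapping on signed overflow, which is when the inputs have the same sign but the output has a different sign.)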
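The sign rule in the parenthetical above can be sketched in portable C. This is only an illustration: the helper names addu32 and add_would_trap are hypothetical, and the registers are modelled as uint32_t so that the addition itself wraps without undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 32-bit addition: the bit pattern both add and addu compute.
   Unsigned arithmetic wraps modulo 2^32, so there is no UB here. */
static uint32_t addu32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Returns 1 when MIPS add would trap on these register values:
   both inputs have the same sign bit and the result's sign bit differs. */
static int add_would_trap(uint32_t a, uint32_t b)
{
    uint32_t sum = addu32(a, b);
    /* Bit 31 of (a ^ sum) & (b ^ sum) is set only if sum's sign
       differs from the sign of BOTH inputs. */
    return (int)(((a ^ sum) & (b ^ sum)) >> 31);
}
```

For example, add_would_trap(0x7FFFFFFF, 1) reports a trap (INT32_MAX + 1), while add_would_trap(1, 0xFFFFFFFF) does not (1 + -1), even though both sums carry out of bit 31 or into it in the unsigned view.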

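Here it is in terms of C that will compile back to something closer to the given asm, or that at least represents each asm operation with a C temporary variable:

I used GNU C register-global variables instead of function args, so the function body uses the actual correct registers (and the asm isn't cluttered with extra instructions to save/restore and initialize those registers). This lets me get GCC to make a block of asm with the s registers as inputs and outputs, instead of following the normal calling convention.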

#include <stdint.h>

register intptr_t *A  asm("s6");
// register char  *B  asm("s7");    // unused, no idea what type makes sense
register intptr_t f asm("s0");

void foo()
{
  volatile intptr_t *t0_ptr = A+1;  // volatile forces store and reload
  intptr_t t1 = (intptr_t)A;

  *t0_ptr = t1;                  //sw   $t1, 0($t0)       //A[1] = &A[0]
  intptr_t t0_int = *t0_ptr;     //lw   $t0, 0($t0)       //$t0 = &A[0]
  f = t0_int + t1;               //add  $s0, $t1, $t0     //f = &A[0] + &A[0]
  //return f;
}

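Note that $t0 is used for two different things here, with different types: one is a pointer into the array, the other is a value loaded from the array. I expressed this with two different C variables because that's how things normally work. (Compilers reuse the same register for a different variable once the earlier variable is "dead".)

Asm output from GCC 5.4 for MIPS, with options to make MARS-compatible asm: -O2 -march=mips3 -fno-delayed-branch. MIPS3 means no load delay slots, like the code in the question, which uses the lw result in the instruction right after the load. (Godbolt compiler explorer)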

foo:
        move    $2,$22         # $v0 = $s6: pointless copy into $v0
        sw      $2,4($22)      # A[1] = A
        lw      $3,4($22)      # $v1 = A[1]
        addu    $16,$2,$3      # $s0 = (intptr_t)A + A[1]
        j       $31
        nop                                  # branch-delay slot

(GCC uses numeric register names rather than ABI names such as $s? for call-preserved registers and $t? for call-clobbered temporaries. http://www.cs.uwm.edu/classes/cs315/Bacon/Lecture/HTML/ch05s03.html has a table.)
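For reading GCC's numeric output without looking up the table, here is a small lookup sketch. The array and function names (mips_abi_names, mips_abi_name) are hypothetical; the strings are the conventional o32 ABI register names.

```c
#include <stddef.h>
#include <string.h>

/* Conventional o32 ABI names for MIPS general-purpose registers $0..$31.
   $30 is called either fp or s8 depending on the assembler/documentation. */
static const char *const mips_abi_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra",
};

/* Map a register number to its ABI name; NULL if out of range. */
static const char *mips_abi_name(unsigned regnum)
{
    return regnum < 32 ? mips_abi_names[regnum] : NULL;
}
```

So $16 is $s0 (f in the question), $22 is $s6 (the base of A), and $31 is $ra, the return address used by the final jump.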

Another way to write it, less carefully. The important difference is the lack of volatile to force the compiler to reload:

void bar() {
  A[1] = (intptr_t)&A[0];        // cast needed: A[1] is an intptr_t, not a pointer
  f = A[1] + (intptr_t)&A[0];
}

bar:
        move    $2,$22          # still a useless copy
        sw      $2,4($22)
        sll     $16,$2,1        # 2 * (intptr_t)A;   no reload, just CSE the store value.
        j       $31
        nop
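The claim that the non-volatile version reduces to 2 * (intptr_t)A can be checked with an ordinary C function. This is a sketch under assumptions: store_and_add is a hypothetical stand-in that takes A as a normal parameter instead of a register-global, and it uses uintptr_t rather than intptr_t so the addition wraps instead of being undefined on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for bar(): store &A[0] into A[1],
   then add the stored value to &A[0] again. */
static uintptr_t store_and_add(uintptr_t *A)
{
    A[1] = (uintptr_t)&A[0];            /* sw: A[1] = &A[0] */
    return A[1] + (uintptr_t)&A[0];     /* add: value just stored + &A[0] */
}
```

Calling it on any two-element buffer buf returns 2 * (uintptr_t)buf, the same value as (uintptr_t)buf << 1, which is why the compiler is free to emit a single shift once the reload is not forced.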

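There are of course other ways to express this, such as using A as an array of pointers instead of an array of intptr_t, int, or int32_t.

I chose integers because C pointer types magically scale by the type width when you do pointer addition.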
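A minimal sketch of that scaling, assuming 4-byte elements as in the question's addi $t0, $s6, 4 (the helper names ptr_step_bytes and int_step_bytes are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance covered by p + n. Pointer arithmetic scales n by
   sizeof(int32_t); p + n must stay within (or one past) the array. */
static size_t ptr_step_bytes(const int32_t *p, size_t n)
{
    return (size_t)((const char *)(p + n) - (const char *)p);
}

/* The same step done on an integer copy of the address: no scaling. */
static uintptr_t int_step_bytes(uintptr_t addr, uintptr_t n)
{
    return (addr + n) - addr;
}
```

With an int32_t array, stepping the pointer by 1 moves sizeof(int32_t) bytes, while adding 1 to the integer address moves exactly one byte. Holding the values as integers keeps the C arithmetic identical to what the asm's add does.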
