Analysis of simple ARM assembly code

Question

I have started learning ARM assembly language with the GNU toolchain, and I wrote a very simple example function in C:

#include <stdint.h>

uint32_t *a;
uint32_t *b;
uint32_t *c;

__attribute__((naked)) void f() {
    *a += *c;
    *b += *c;
}

I then looked at the generated assembly by running this command in a terminal:

arm-none-eabi-gcc -O1 -S -std=c99 example.c -o -

This is the result:

    @ Function supports interworking.
    @ Naked Function: prologue and epilogue provided by programmer.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    ldr r3, .L2
    ldr r2, .L2+4
    ldr r3, [r3]
    ldr r2, [r2] ; r2 <- &(*c)
    ldr ip, [r3]
    ldr r0, [r2] ; r0 <- *c
    ldr r1, .L2+8
    add r0, ip, r0
    str r0, [r3]
    ldr r3, [r1]
    ldr r2, [r2] ; why make the same thing 
    ldr r1, [r3]
    add r2, r1, r2
    str r2, [r3]
.L3:
    .align  2
.L2:
    .word   a
    .word   c
    .word   b
    .size   f, .-f
    .comm   c,4,4
    .comm   b,4,4
    .comm   a,4,4

My question is why the compiler loads through the pointer c twice. If I understand correctly, it is this line:

ldr  r2, [r2]

I cannot find a good reason for the compiler to duplicate this instruction. Thanks in advance.

Answer 1

Executing ldr rX, [rX] twice in a row means dereferencing twice what rX points to.

If I understood your question correctly, the first one you mention is:

ldr r2, [r2] ; r2 <- &(*c)

and then the second becomes

ldr r2, [r2] ; r2 <- *(r2)

If that is not the issue, then note this from the GCC docs:

naked

This attribute is available on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX and SPU ports. It allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only Basic asm statements can safely be included in naked functions (see Basic Asm). While using Extended asm or a mixture of Basic asm and “C” code may appear to work, they cannot be depended upon to work reliably and are not supported.
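
Returning to the chain of loads in the question: because a, b and c are global pointers, reaching *c takes three loads. The following is a minimal C sketch of that chain, not compiler output. The names storage_c, addr_of_c, ptr_c, load_chain and check_load_chain are made up for illustration, and the mapping to instructions in the comments is my reading of the listing.

```c
#include <assert.h>
#include <stdint.h>

uint32_t storage_c = 7;      /* hypothetical word that c points at */
uint32_t *c = &storage_c;    /* global pointer, as in the question */

/* Returns *c, taking one step per load in the generated code. */
uint32_t load_chain(void)
{
    uint32_t **addr_of_c = &c;      /* ldr r2, .L2+4 : address of the variable c */
    uint32_t *ptr_c = *addr_of_c;   /* ldr r2, [r2]  : the pointer stored in c   */
    uint32_t value = *ptr_c;        /* ldr r0, [r2]  : the word that c points at */
    return value;
}

/* Sanity check: the step-by-step chain agrees with a plain *c. */
void check_load_chain(void)
{
    assert(load_chain() == *c);
}
```

The later ldr r2, [r2] in the listing repeats the last step: it reads *c a second time and overwrites the pointer in r2 with that value.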

Answer 2

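If your pointers alias, both dereferences are required. Think about what your algorithm does if a == c. If they cannot alias, you need to add some restrict keywords. Here is an example that optimizes the way you expect: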

#include <stdint.h>

void f(uint32_t * restrict a, uint32_t * restrict b, uint32_t * restrict c)
{
    *a += *c;
    *b += *c;
}

And the assembly output (comments mine):

00000000 <f>:
   0:   e5922000    ldr r2, [r2]     // r2 = *c
   4:   e5903000    ldr r3, [r0]     // r3 = *a
   8:   e0833002    add r3, r3, r2   // r3 = r3 + r2 = *a + *c
   c:   e5803000    str r3, [r0]     // *a = r3 = *a + *c
  10:   e5910000    ldr r0, [r1]     // r0 = *b
  14:   e0800002    add r0, r0, r2   // r0 = r0 + r2 = *b + *c
  18:   e5810000    str r0, [r1]     // *b = r0 = *b + *c
  1c:   e12fff1e    bx  lr

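Edit: here are two examples closer to your original, the first without the restrict keyword and the second with it, this time in GCC's output format.

Example 1 (without the restrict keyword), code: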

#include <stdint.h>

__attribute__((naked))
void f(uint32_t *a, uint32_t *b, uint32_t *c)
{
    *a += *c;
    *b += *c;
}

Output:

f:
    ldr ip, [r0, #0]
    ldr r3, [r2, #0]
    add r3, ip, r3
    str r3, [r0, #0]
    ldr r0, [r1, #0]
    ldr r3, [r2, #0]
    add r3, r0, r3
    str r3, [r1, #0]

Example 2 (with the restrict keyword), code:

#include <stdint.h>

__attribute__((naked))
void f(uint32_t * restrict a, uint32_t * restrict b, uint32_t * restrict c)
{
    *a += *c;
    *b += *c;
}

Output:

f:
    ldr r3, [r2, #0]
    ldr ip, [r1, #0]
    ldr r2, [r0, #0]
    add r2, r2, r3
    add r3, ip, r3
    str r2, [r0, #0]
    str r3, [r1, #0]

The second dereference of c is absent from the second program, which makes it one instruction shorter.
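
To see why the reload is required when the pointers may alias, here is a minimal sketch comparing the original body with a variant that reads *c only once. The function names add_reload, add_cached and check_aliasing are hypothetical, and the single-read variant is only equivalent to the original under the assumption that c never points at the same object as a.

```c
#include <assert.h>
#include <stdint.h>

/* Same body as the question: *c is read again after the store to *a. */
void add_reload(uint32_t *a, uint32_t *b, uint32_t *c)
{
    *a += *c;
    *b += *c;   /* must re-read *c: the store above may have changed it */
}

/* Hypothetical variant: read *c once into a local, so no reload is needed. */
void add_cached(uint32_t *a, uint32_t *b, uint32_t *c)
{
    const uint32_t cv = *c;
    *a += cv;
    *b += cv;
}

/* With a == c the two versions disagree, so the compiler cannot merge the loads. */
void check_aliasing(void)
{
    uint32_t x = 1, y = 10;
    add_reload(&x, &y, &x);   /* x = 1 + 1 = 2, then y = 10 + 2 */
    assert(x == 2 && y == 12);

    x = 1;
    y = 10;
    add_cached(&x, &y, &x);   /* cv = 1, x = 2, then y = 10 + 1 */
    assert(x == 2 && y == 11);
}
```

When the three pointers refer to distinct objects, both functions produce the same result, which is the guarantee that restrict (or the local copy) hands to the compiler.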

Answer 3

The add clobbers r0, so we lose the value of c and have to reload it

ldr r2, .L2+4 ; get address of .data location of *c from .text
...
ldr r2, [r2] ; r2 = pointer to c
...
ldr r0, [r2] ; r0  = c
...
add r0, ip, r0 ; this destroys r0 it no longer holds the value of c
...
ldr r2, [r2] ; need the value of c again to add to b

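Interestingly, different versions of gcc and/or different optimization levels choose different register combinations, but keep the extra load in the same place in the sequence. The main question here is why it does this: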

add r0, ip, r0
str r0, [r3]

instead of

add ip, ip, r0
str ip, [r3]

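after which there would be no need to reload c?

Peephole optimizer nuances would be my guess. Another related question is why it starts messing with **b before it has finished storing a. If it did not do that, it would have another free register. (No doubt this is yet another optimization.)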

Another interesting point is that at least one of my gcc compilers produced this:

00001000 <_start>:
    1000:   eaffffff    b   1004 <fun>

00001004 <fun>:
    1004:   e59f2034    ldr r2, [pc, #52]   ; 1040 <fun+0x3c>
    1008:   e59f3034    ldr r3, [pc, #52]   ; 1044 <fun+0x40>
    100c:   e5921000    ldr r1, [r2]
    1010:   e5932000    ldr r2, [r3]
    1014:   e591c000    ldr ip, [r1]
    1018:   e5920000    ldr r0, [r2]
    101c:   e59f3024    ldr r3, [pc, #36]   ; 1048 <fun+0x44>
    1020:   e08c0000    add r0, ip, r0
    1024:   e5933000    ldr r3, [r3]
    1028:   e5810000    str r0, [r1]
    102c:   e5922000    ldr r2, [r2]
    1030:   e5931000    ldr r1, [r3]
    1034:   e0812002    add r2, r1, r2
    1038:   e5832000    str r2, [r3]
    103c:   e12fff1e    bx  lr
    1040:   00009054    andeq   r9, r0, r4, asr r0
    1044:   00009050    andeq   r9, r0, r0, asr r0
    1048:   0000904c    andeq   r9, r0, ip, asr #32

Disassembly of section .bss:

0000904c <__bss_start>:
    904c:   00000000    andeq   r0, r0, r0

00009050 <c>:
    9050:   00000000    andeq   r0, r0, r0

00009054 <a>:
    9054:   00000000    andeq   r0, r0, r0

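You get the same thing with or without naked, which raises the question of why gcc is so desperate to use every scratch register rather than the stack. Note that in your compile it does the add for a and then stores it, whereas in mine it does the add for a, then loads *b, then stores a. It not only moved the load of **b up in the sequence, it also loaded *b before finishing the result for a.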

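So other than removing the bx lr at the end of the function, naked does not help here. What you can/should try is -fdump-rtl-all on the gcc command line (it generates a lot of files), then walk through those files to see where gcc starts and where it changes things; perhaps that will show what determines the output. Or, if it is not in the compiler proper, then the peephole optimizer in the backend is rearranging things, but I am not sure what command-line option dumps that.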

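Bottom line: although in the long run (tens of thousands, hundreds of thousands, millions of lines of code) the compiler/optimizer will outperform a human, it is easy to isolate sections of optimized code that can be hand-tuned to be "better", depending on your definition of better. Note that fewer instructions is not always better.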
