Why do memory instructions take 4 cycles in ARM assembly?

Question

Memory instructions such as ldr, str, or b take 4 cycles each in ARM assembly.

Is that because each memory location is 4 bytes long?

Answer 1

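ARM has a pipelined architecture. Each clock cycle advances the pipeline by one step (e.g. fetch/decode/execute/read...). Because the pipeline is fed continuously, the effective time per instruction can approach 1 cycle, but the actual time for a single instruction to go from 'fetch' to completion can be 3 or more cycles. ARM has a good explanation on their website: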

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0222b/ch01s01s01.html
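To put rough numbers on the throughput-versus-latency point, here is a minimal sketch in Python. It assumes an idealized pipeline with no stalls, and the 3-stage depth is only an illustrative value, not a claim about any particular ARM core. The function names are made up for this example.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Total cycles for an ideal, stall-free pipeline.

    The first instruction needs num_stages cycles to pass through every
    stage; each later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def cycles_per_instruction(num_instructions: int, num_stages: int) -> float:
    """Average cycles per instruction over the whole run."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    # Assumed 3-stage pipeline (fetch/decode/execute), purely for illustration.
    for n in (1, 10, 1000):
        print(n, pipeline_cycles(n, 3), round(cycles_per_instruction(n, 3), 3))
```

A single instruction still takes the full pipeline depth, but over a long run the average approaches 1 cycle per instruction.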

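Memory latency adds another layer of complexity to this idea. ARM employs a multi-level cache system that aims to make the most frequently used data available in the fewest cycles. Even a read from the fastest (L1) cache involves several cycles of latency. The pipeline includes facilities that allow a read request to complete later if the data is not used right away. This is easier to understand with an example: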

LDR R0,[R1]
MOV R2,R3        // Allow time for the memory read to occur
ADD R4,R4,#200   // by interleaving other instructions
CMP R0,#0        // before trying to use the value

// Using the loaded data immediately causes a pipeline 'stall' and
// wastes time waiting for the data to become available.
LDR R0,[R1]
CMP R0,#0        // Wastes at least 1 cycle because the pipeline does not have the data yet

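The idea is to hide the latency inherent in the pipeline and, where possible, to hide the additional latency of memory accesses by delaying the dependency on a register (also known as instruction interleaving). The cycle counts therefore come from pipeline depth and memory latency, not from memory locations being 4 bytes long.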
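The effect of interleaving can be sketched with a toy model in Python. Everything here is an assumption made for illustration: a single-issue, in-order pipeline, ALU results forwarded to the next instruction with no penalty, and a loaded value that becomes usable one extra cycle late. Real cores have different (often larger) load-use latencies, and `count_stalls` is a hypothetical helper, not part of any toolchain.

```python
def count_stalls(program, load_extra_latency=1):
    """Count stall cycles in a toy in-order, single-issue pipeline.

    program: list of (opcode, dest, sources) tuples, where dest is a
    register name or None and sources is a list of register names.
    load_extra_latency: assumed extra cycles before a loaded value is usable.
    """
    ready_at = {}  # register -> earliest cycle at which a consumer may issue
    cycle = 0
    stalls = 0
    for opcode, dest, sources in program:
        earliest = max([ready_at.get(reg, 0) for reg in sources], default=0)
        if earliest > cycle:
            # A source operand is not ready yet: the pipeline waits.
            stalls += earliest - cycle
            cycle = earliest
        if dest is not None:
            extra = load_extra_latency if opcode == "LDR" else 0
            ready_at[dest] = cycle + 1 + extra
        cycle += 1
    return stalls


# The two sequences from the example above, reduced to register dependencies.
interleaved = [
    ("LDR", "R0", ["R1"]),
    ("MOV", "R2", ["R3"]),
    ("ADD", "R4", ["R4"]),
    ("CMP", None, ["R0"]),
]
back_to_back = [
    ("LDR", "R0", ["R1"]),
    ("CMP", None, ["R0"]),
]

if __name__ == "__main__":
    print("interleaved stalls:", count_stalls(interleaved))
    print("back-to-back stalls:", count_stalls(back_to_back))
```

Under these assumptions the interleaved sequence runs without a stall, while the back-to-back sequence loses one cycle waiting for R0.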

performance

assembly

arm

cpu-cycles

cpu-architecture