这个 128 位整数乘法在汇编 (x86-64) 中如何工作？

Question

我正在阅读 Computer Systems: A Programmer's Perspective，作业是描述该算法的工作原理。

C函数：

void store_prod(__int128 *dest, int64_t x, int64_t y) {
    *dest = x * (__int128)y;
}

程序集：

movq %rdx, %rax
cqto
movq  %rsi, %rcx
sarq  ,  %rcx
imulq %rax, %rcx
imulq %rsi, %rdx
addq  %rdx, %rcx
mulq  %rsi
addq  %rcx, %rdx
movq  %rax, (%rdi)
movq  %rdx, 8(%rdi)
ret

我不知道它为什么执行：xh * yl + yh * xl = value which we add after unsigned multiplication

Answer 1

一如既往，编译器选项很重要。该源代码带有 gcc -Og（针对调试进行了优化）produces very similar asm to your listing（转换符号将两个操作数扩展到 128 位，然后再进行完整的 128x128 => 128 位乘法运算）。这是 C 标准所说的应该发生的事情的天真实现（将两个操作数转换为相同类型的整数优先规则）。

如果你要谈论编译器输出，你应该总是说哪个编译器的哪个版本有什么选项。或者在全球版 godbolt, like the one above. (Edit: oops, source and asm were from a book that didn't give that info. And if that's the global edition of CS:APP 3e, beware that 上 post 一个 link。）

对于 gcc -O3 或 -O2，GCC 利用了两个操作数实际上仍然只有 64 位的事实，so a single imul is enough。（这仍然会为每个输入产生相同的结果，因此仍然会根据假设规则实现 C 逻辑。C 没有扩展操作，因此您被迫以“低效”的方式编写源代码，这取决于在编译器上将其转换为高效的 asm。）

sar , %rcx 是将 rsi 符号扩展为 rcx:rsi 的一部分，就像 cqto 将 rax 符号扩展为 rdx:rax .它用原始符号位的副本替换 RCX 的每一位。

大部分答案已经由其他人在评论中给出，但我认为没有其他人注意到 gcc -Og / -O1 给出了几乎完全相同的 asm 输出。

Answer 2

为了理解我们为什么要做这个操作，试着将int128_t解释为：2^64 * xh + xl

所以如果我们想将两个 int128_t 整数相乘，我们将执行以下操作：

x = 2^64 * xh + xl

y = 2^64 * yh + yl

所以 x * y = (2^128 * xh * yh) + (2^64 * xh * yl) + (2^64 * yh * xl) + (yl * xl)

这正是汇编代码的作用：

yh = %rdx yl = %rax

xh = %rcx xl = %rsi

2^64 * xh * yl: is imulq %rax, %rcx 2^64 表示，我们需要把这个加到高位

2^64 * yh * xl: is imulq %rsi, %rdx 2^64 表示，我们需要将其添加到高位

2^128 * xh * yh：不需要此操作，因为 2^128 * xh * yh 不适合 128 位整数。只代表符号位信息，可以忽略

xl * yl: 是 mulq %rsi

我希望这能解决问题！[=14=]

Answer 3

GCC 正在做的是使用属性可以使用 the following formula.

完成带符号乘法

(hi,lo) = unsigned(x*y)
hi -= ((x<0) ? y : 0)  + ((y<0) ? x : 0)

尽管没有必要这样做，因为在这种情况下，x86-64 指令集有一个带符号的 64 位*64 位到 128 位指令（imul 带有一个操作数) 这个公式在其他情况下很有用。例如实现 or for implementing 256-bit multiplication when the instruction set only does 128-bit multiplication（例如 x86-64）。

GCC 实现这个公式的方式有点不同。如果我们取符号位扩展到整个字，调用这个函数sign_ext，然后调用函数returns-1或0。那么GCC做的是：

hi += sign_ext(x)*y + sign_ext(y)*x

例如 sign_ext(x)*y 在 64 位字的伪指令中是

sarq  , x    ; sign_ext(x)
imulq   y, x    ; sign_ext(x)*y

所以现在你问（或打算问）：

Why is this formula true?

这是一个很好的问题。我也问过同样的问题 njuffa wrote

@Zboson: It follows directly from two's complement complement representation. E.g. 32-bit integers -n and -m are represented as unsigned numbers x=2**32-n, y=2**32-m. If you multiply those you have x*y = 2**64 - 2**32*n - 2**32*m + n*m. The middle terms indicate the necessary corrections to the upper half of the product. Working through a simple example using -1*-1 should prove very instructive.

这个 128 位整数乘法在汇编 (x86-64) 中如何工作？

How does this 128 bit integer multiplication work in assembly (x86-64)?

c

assembly

x86-64

128-bit