What is a retpoline and how does it work?

Question

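In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel¹ will be compiled with a new option, -mindirect-branch=thunk-extern, introduced to gcc to perform indirect calls through a so-called retpoline.

This appears to be a newly invented term, as a Google search turns up only very recent use (generally all in 2018).

What is a retpoline and how does it prevent the recent kernel information disclosure attacks?

¹ It's not Linux specific, however: similar or identical constructs seem to be used as part of the mitigation strategies on other OSes.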

Answer 1

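The article mentioned by sgbj in the comments, written by Google's Paul Turner, explains the following in much more detail, but I'll give it a shot:

As far as I can piece this together from the limited information available at the moment, a retpoline is a return trampoline that uses an infinite loop that is never executed to prevent the CPU from speculating on the target of an indirect jump.

The basic approach can be seen in Andi Kleen's kernel branch addressing this issue:

It introduces the new __x86.indirect_thunk call that loads the call target whose memory address (which I'll call ADDR) is stored on top of the stack and executes the jump using the RET instruction. The thunk itself is then called using the NOSPEC_JMP/CALL macro, which was used to replace many (if not all) indirect calls and jumps. The macro simply places the call target on the stack and sets the return address correctly, if necessary (note the non-linear control flow):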

.macro NOSPEC_CALL target
    jmp     1221f            /* jumps to the end of the macro */
1222:
    push    \target          /* pushes ADDR to the stack */
    jmp __x86.indirect_thunk /* executes the indirect jump */
1221:
    call    1222b            /* pushes the return address to the stack */
.endm

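Placing the call at the end is necessary so that, when the indirect call is finished, control flow continues after the use of the NOSPEC_CALL macro, so it can be used in place of a regular call.

The thunk itself looks as follows: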

    call retpoline_call_target
2:
    lfence /* stop speculation */
    jmp 2b
retpoline_call_target:
    lea 8(%rsp), %rsp 
    ret

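The control flow can get a bit confusing here, so let me clarify:

call pushes the current instruction pointer (label 2) to the stack.
lea adds 8 to the stack pointer, effectively discarding the most recently pushed quadword, which is the last return address (to label 2). After this, the top of the stack points at the real return address ADDR again.
ret jumps to *ADDR and resets the stack pointer to the beginning of the call stack.

In the end, this whole behaviour is practically equivalent to jumping directly to *ADDR. The one benefit we get is that the branch predictor used for return statements (the Return Stack Buffer, RSB), when executing the call instruction, assumes that the corresponding ret statement will jump to label 2.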
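To make the split between the predicted and the actual return target concrete, here is a minimal Python sketch of the thunk's three steps. It is only a toy model under my own assumptions: the names `run_thunk` and `LABEL_2` are made up for illustration, and a real RSB is a hardware structure that does not behave like a Python list.

```python
# Toy model of the retpoline thunk: an architectural stack plus a
# return stack buffer (RSB). All names here are illustrative only.

LABEL_2 = "label_2 (lfence/jmp loop)"


def run_thunk(target):
    """Simulate the thunk with the real call target on top of the stack.

    Returns (predicted_target, actual_target) for the final `ret`.
    """
    stack = [target]  # the macro pushed the real call target
    rsb = []          # the predictor's private record of return addresses

    # call retpoline_call_target: pushes the address of label 2 onto the
    # stack, and the predictor records the same address in the RSB.
    stack.append(LABEL_2)
    rsb.append(LABEL_2)

    # lea 8(%rsp), %rsp: drops the top quadword from the stack. The RSB
    # is untouched, because no `ret` has been executed yet.
    stack.pop()

    # ret: the predictor guesses from the RSB, while the architectural
    # target is read from the (modified) stack.
    predicted = rsb.pop()
    actual = stack.pop()
    return predicted, actual


predicted, actual = run_thunk("real_target")
print("speculated:", predicted)
print("executed:  ", actual)
```

In this model the speculated path always lands in the harmless loop at label 2, while the executed path goes to the real target, which is the property the thunk relies on.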

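The part after label 2 actually never gets executed; it's simply an infinite loop that would in theory fill the instruction pipeline with JMP instructions. Using LFENCE, PAUSE, or more generally any instruction that stalls the instruction pipeline stops the CPU from wasting power and time on this speculative execution. This is because, if the call to retpoline_call_target were to return normally, the LFENCE would be the next instruction to be executed. This is also what the branch predictor will predict based on the original return address (label 2).

To quote from Intel's architecture manual: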

Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.

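Note, however, that the specification never mentions that LFENCE and PAUSE cause the pipeline to stall, so I'm reading a bit between the lines here.

Now back to your original question: the kernel memory information disclosure is possible because of the combination of two ideas: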

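Even though speculative execution should be side-effect free when the speculation was wrong, speculative execution still affects the cache hierarchy. This means that when a memory load is executed speculatively, it may still have caused a cache line to be evicted. This change in the cache hierarchy can be identified by carefully measuring the access time to memory that is mapped onto the same cache set. You can even leak some bits of arbitrary memory when the source address of the memory read was itself read from kernel memory.
The indirect branch predictor of Intel CPUs only uses the lowermost 12 bits of the source instruction, so it is easy to poison all 2^12 possible prediction histories with user-controlled memory addresses. These can then, when the indirect jump is predicted within the kernel, be speculatively executed with kernel privileges. Using the cache-timing side channel, you can thus leak arbitrary kernel memory.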
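The aliasing described in the second point can be sketched with a toy branch target buffer in Python. This models only the claim above (indexing by the lowest 12 bits of the source address); the class `ToyBTB` and all addresses are hypothetical, and real predictors also use branch history and other state.

```python
INDEX_BITS = 12  # per the text: only the lowest 12 bits of the source are used


class ToyBTB:
    """Illustrative branch target buffer indexed by truncated source address."""

    def __init__(self):
        self.entries = {}

    def _index(self, source_addr):
        # Discard everything except the low INDEX_BITS bits.
        return source_addr & ((1 << INDEX_BITS) - 1)

    def train(self, source_addr, target_addr):
        self.entries[self._index(source_addr)] = target_addr

    def predict(self, source_addr):
        return self.entries.get(self._index(source_addr))


btb = ToyBTB()
KERNEL_BRANCH = 0xFFFFFFFF81234ABC  # hypothetical kernel indirect branch
USER_BRANCH = 0x0000000000400ABC    # attacker branch sharing the low 12 bits
GADGET = 0xFFFFFFFF81999000         # hypothetical gadget address

# The attacker trains the predictor from user space...
btb.train(USER_BRANCH, GADGET)

# ...and the kernel branch now receives the attacker's prediction.
print(hex(btb.predict(KERNEL_BRANCH)))
```

Because the two source addresses collide in the index, training done entirely in user space changes what the kernel branch is predicted to do.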

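UPDATE: On the kernel mailing list, there is an ongoing discussion that leads me to believe retpolines don't fully mitigate the branch prediction issues, because when the Return Stack Buffer (RSB) runs empty, more recent Intel architectures (Skylake+) fall back to the vulnerable Branch Target Buffer (BTB):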

Retpoline as a mitigation strategy swaps indirect branches for returns, to avoid using predictions which come from the BTB, as they can be poisoned by an attacker. The problem with Skylake+ is that an RSB underflow falls back to using a BTB prediction, which allows the attacker to take control of speculation.
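The underflow case in the quote can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of any real CPU: the depth of 16 entries, the class `ToyReturnPredictor`, and the string `attacker_gadget` are all chosen purely for illustration.

```python
from collections import deque

RSB_DEPTH = 16  # assumed depth, for illustration only


class ToyReturnPredictor:
    """Return predictor that falls back to a BTB guess when the RSB is empty."""

    def __init__(self, btb_guess):
        # A bounded deque silently drops the oldest entry when full.
        self.rsb = deque(maxlen=RSB_DEPTH)
        self.btb_guess = btb_guess

    def on_call(self, return_addr):
        self.rsb.append(return_addr)

    def on_ret(self):
        if self.rsb:
            return ("RSB", self.rsb.pop())
        # Underflow: the modelled Skylake+ behaviour uses the BTB instead.
        return ("BTB", self.btb_guess)


pred = ToyReturnPredictor(btb_guess="attacker_gadget")

# Nest two calls deeper than the RSB can track...
for depth in range(RSB_DEPTH + 2):
    pred.on_call(f"ret_site_{depth}")

# ...then unwind: the last two returns have no RSB entry left.
sources = [pred.on_ret()[0] for _ in range(RSB_DEPTH + 2)]
print(sources)
```

In this model the outermost returns are predicted from the (poisonable) BTB, which is the gap the mailing list discussion points at.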

Answer 2

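A retpoline is designed to protect against the branch target injection (CVE-2017-5715) exploit. This is an attack where an indirect branch instruction in the kernel is used to force the speculative execution of an arbitrary chunk of code. The code chosen is a "gadget" that is somehow useful to the attacker. For example, code can be chosen so that it will leak kernel data through how it affects the cache. The retpoline prevents this exploit by simply replacing all indirect branch instructions with a return instruction.

I think what's key about the retpoline is just the "ret" part: it replaces the indirect branch with a return instruction so that the CPU uses the return stack predictor instead of the exploitable branch predictor. If a simple push and a return instruction were used instead, then the code that would be speculatively executed would be the code the function will eventually return to anyway, not some gadget useful to the attacker. The main benefit of the trampoline part seems to be to maintain the return stack, so that when the function actually does return to its caller, this is predicted correctly.

The basic idea behind branch target injection is simple. It takes advantage of the fact that the CPU doesn't record the full address of the source and destination of branches in its branch target buffers. So the attacker can fill the buffer using jumps in its own address space that will result in prediction hits when a particular indirect jump is executed in the kernel address space.

Note that retpoline doesn't prevent kernel information disclosure directly; it only prevents indirect branch instructions from being used to speculatively execute a gadget that would disclose information. If the attacker can find some other means to speculatively execute the gadget, then the retpoline doesn't prevent the attack.

The paper Spectre Attacks: Exploiting Speculative Execution by Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom gives the following overview of how indirect branches can be exploited: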

Exploiting Indirect Branches. Drawing from return oriented programming (ROP), in this method the attacker chooses a gadget from the address space of the victim and influences the victim to execute the gadget speculatively. Unlike ROP, the attacker does not rely on a vulnerability in the victim code. Instead, the attacker trains the Branch Target Buffer (BTB) to mispredict a branch from an indirect branch instruction to the address of the gadget, resulting in a speculative execution of the gadget. While the speculatively executed instructions are abandoned, their effects on the cache are not reverted. These effects can be used by the gadget to leak sensitive information. We show how, with a careful selection of a gadget, this method can be used to read arbitrary memory from the victim.

To mistrain the BTB, the attacker finds the virtual address of the gadget in the victim’s address space, then performs indirect branches to this address. This training is done from the attacker’s address space, and it does not matter what resides at the gadget address in the attacker’s address space; all that is required is that the branch used for training branches to use the same destination virtual address. (In fact, as long as the attacker handles exceptions, the attack can work even if there is no code mapped at the virtual address of the gadget in the attacker’s address space.) There is also no need for a complete match of the source address of the branch used for training and the address of the targetted branch. Thus, the attacker has significant flexibility in setting up the training.

A blog entry by the Project Zero team at Google titled Reading privileged memory with a side-channel provides another example of how branch target injection can be used to create a working exploit.

Answer 3

This question was asked a while ago, and deserves a newer answer.

Executive Summary:

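"Retpoline" sequences are a software construct that allows indirect branches to be isolated from speculative execution. This may be applied to protect sensitive binaries (such as operating system or hypervisor implementations) from branch target injection attacks against their indirect branches.

The word "retpoline" is a portmanteau of the words "return" and "trampoline", much like the improvement "relpoline" was coined from "relative call" and "trampoline". It is a trampoline construct built using return operations, which also figuratively ensures that any associated speculative execution will "bounce" endlessly.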

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel [1] will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called retpoline.

[1] It's not Linux specific, however - similar or identical construct seems to be used as part of the mitigation strategies on other OSes.

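The use of this compiler option only guards against Spectre V2 in affected processors that have the microcode update required for CVE-2017-5715. It will "work" on any code (not just a kernel), but only code containing "secrets" is worth attacking.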

This appears to be a newly invented term as a Google search turns up only very recent use (generally all in 2018).

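The LLVM compiler has had a -mretpoline switch since before Jan 4 2018. That date is when the vulnerability was first publicly reported. GCC made their patches available on Jan 7 2018.

The CVE date suggests that the vulnerability was "discovered" in 2017, but it affects some of the processors manufactured in the last two decades (so it was likely discovered long ago).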

What is a retpoline and how does it prevent the recent kernel information disclosure attacks?

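First, a few definitions:

Trampoline - Sometimes referred to as indirect jump vectors, trampolines are memory locations holding addresses pointing to interrupt service routines, I/O routines, etc. Execution jumps into the trampoline and then immediately jumps out, or bounces, hence the term trampoline. GCC has traditionally supported nested functions by creating an executable trampoline at run time when the address of a nested function is taken. This is a small piece of code which normally resides on the stack, in the stack frame of the containing function. The trampoline loads the static chain register and then jumps to the real address of the nested function.
Thunk - A thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.
Memoization - A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters.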
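As a quick illustration of the last definition, here is a minimal Python sketch of memoization using the standard library's `functools.lru_cache`. The function `slow_square` and the counter `call_count` are made up for this example; the analogy to a branch predictor (which "remembers" where a branch went last time) is loose and only meant to convey the idea.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Pretend this is expensive; the cache remembers results per input."""
    global call_count
    call_count += 1
    return n * n


slow_square(12)  # computed: the body runs
slow_square(12)  # remembered: the body does not run again
print(call_count)
```

Only the first call with a given argument pays the cost of the computation; later calls are answered from the remembered result.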

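Very roughly, a retpoline is a trampoline with a return as a thunk, used to "spoil" memoization in the indirect branch predictor.

Source: The retpoline includes a PAUSE instruction for Intel, but an LFENCE instruction is necessary for AMD, since on that processor the PAUSE instruction is not a serializing instruction, so the pause/jmp loop will use excess power as it is speculated over while waiting for the return to mispredict to the correct target.

Arstechnica has a simple explanation of the problem: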

"Each processor has an architectural behavior (the documented behavior that describes how the instructions work and that programmers depend on to write their programs) and a microarchitectural behavior (the way an actual implementation of the architecture behaves). These can diverge in subtle ways. For example, architecturally, a program that loads a value from a particular address in memory will wait until the address is known before trying to perform the load. Microarchitecturally, however, the processor might try to speculatively guess at the address so that it can start loading the value from memory (which is slow) even before it's absolutely certain of which address it should use.

If the processor guesses wrong, it will ignore the guessed-at value and perform the load again, this time with the correct address. The architecturally defined behavior is thus preserved. But that faulty guess will disturb other parts of the processor—in particular the contents of the cache. These microarchitectural disturbances can be detected and measured by timing how long it takes to access data that should (or shouldn't) be in the cache, allowing a malicious program to make inferences about the values stored in memory.".

From Intel's paper "Retpoline: A Branch Target Injection Mitigation" (.PDF):

"A retpoline sequence prevents the processor’s speculative execution from using the "indirect branch predictor" (one way of predicting program flow) to speculate to an address controlled by an exploit (satisfying element 4 of the five elements of branch target injection (Spectre variant 2) exploit composition listed above).".

Note, element 4 is: "The exploit must successfully influence this indirect branch to speculatively mispredict and execute a gadget. This gadget, chosen by the exploit, leaks the secret data via a side channel, typically by cache-timing.".

security

x86

assembly

cpu-architecture

spectre