How did Pentium III CPUs handle multiple instruction prefixes from the same group?

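The Intel x86 specification states that using multiple instruction prefixes from the same group results in undefined behavior. In practice, how did the Pentium III Coppermine CPU react in that case? Unfortunately, I don't have a chip to test on.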

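You already know this, but I'll state it first for clarity. An x86 instruction can have up to 4 prefixes (each from a different group) that change how the processor interprets the instruction. From the Intel IA-32 Architecture Manual, Volume 2A, Section 2.1: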

2.1 INSTRUCTION FORMAT FOR PROTECTED MODE, REAL-ADDRESS MODE, AND VIRTUAL-8086 MODE

The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown in Figure 2-1. Instructions consist of optional instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if required).


Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format

2.1.1 Instruction Prefixes

Instruction prefixes are divided into four groups, each with a set of allowable prefix codes. For each instruction, it is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 may be placed in any order relative to each other.

  • Group 1
    • Lock and repeat prefixes:
      • LOCK prefix is encoded using F0H.
      • REPNE/REPNZ prefix is encoded using F2H. Repeat-Not-Zero prefix applies only to string and input/output instructions. (F2H is also used as a mandatory prefix for some instructions.)
      • REP or REPE/REPZ is encoded using F3H. The repeat prefix applies only to string and input/output instructions. F3H is also used as a mandatory prefix for POPCNT, LZCNT and ADOX instructions.
    • Bound prefix is encoded using F2H if the following conditions are true:
      • CPUID.(EAX=07H, ECX=0):EBX.MPX[bit 14] is set.
      • BNDCFGU.EN and/or IA32_BNDCFGS.EN is set.
    • When the F2 prefix precedes a near CALL, a near RET, a near JMP, or a near Jcc instruction (see Chapter 17, “Intel® MPX,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).
  • Group 2
    • Segment override prefixes:
      • 2EH—CS segment override (use with any branch instruction is reserved).
      • 36H—SS segment override prefix (use with any branch instruction is reserved).
      • 3EH—DS segment override prefix (use with any branch instruction is reserved).
      • 26H—ES segment override prefix (use with any branch instruction is reserved).
      • 64H—FS segment override prefix (use with any branch instruction is reserved).
      • 65H—GS segment override prefix (use with any branch instruction is reserved).
    • Branch hints (no longer used; reserved):
      • 2EH—Branch not taken (used only with Jcc instructions).
      • 3EH—Branch taken (used only with Jcc instructions).
  • Group 3
    • Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some instructions).
  • Group 4
    • 67H—Address-size override prefix.

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor environment. See “LOCK—Assert LOCK# Signal Prefix” in Chapter 3, “Instruction Set Reference, A-L,” for a description of this prefix.

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes only with string and I/O instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

Some instructions may use F2H,F3H as a mandatory prefix to express distinct functionality.

Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can be the default; use of the prefix selects the non-default size.

Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte sequence of primary opcode bytes may use 66H as a mandatory prefix to express distinct functionality.

Other use of the 66H prefix is reserved; such use may cause unpredictable behavior.

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size. Using this prefix and/or other undefined opcodes when operands for the instruction do not reside in memory is reserved; such use may cause unpredictable behavior.

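Note that it does not actually say that multiple instruction prefixes from the same group cause "undefined behavior". It says only that it is "only useful" to include up to one prefix from each group. That leaves the behavior unspecified rather than explicitly forbidden.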

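As I read it, the only formal guarantees you get from the specification are that certain specific combinations of instructions and prefixes may result in "unpredictable behavior" or an exception, and that any single instruction longer than 15 bytes raises an exception (a general-protection fault, #GP, according to Intel's exception reference).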
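To make the grouping concrete, here is a minimal sketch in Python that scans the leading prefix bytes of an instruction and reports which groups appear more than once. The names `PREFIX_GROUP`, `MAX_INSTRUCTION_LENGTH`, and `scan_prefixes` are my own, chosen for illustration. The table follows the groups quoted above and assumes 16/32-bit modes only (no REX/VEX handling).

```python
# Legacy prefix bytes mapped to their group number, following the
# grouping in the quoted section 2.1.1. Assumption: 16/32-bit modes
# only, so REX/VEX bytes are not treated as prefixes here.
PREFIX_GROUP = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                        # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2,                        # CS, SS, DS overrides
    0x26: 2, 0x64: 2, 0x65: 2,                        # ES, FS, GS overrides
    0x66: 3,                                          # operand-size override
    0x67: 4,                                          # address-size override
}

# Architectural limit on the total length of one instruction, in bytes.
MAX_INSTRUCTION_LENGTH = 15


def scan_prefixes(code: bytes):
    """Return (prefix_bytes, repeated_groups) for the start of `code`.

    prefix_bytes is the run of legacy prefix bytes before the opcode;
    repeated_groups lists the groups that occur more than once in it.
    """
    prefixes = []
    for byte in code:
        if byte not in PREFIX_GROUP:
            break  # first non-prefix byte starts the opcode
        prefixes.append(byte)

    counts = {}
    for byte in prefixes:
        group = PREFIX_GROUP[byte]
        counts[group] = counts.get(group, 0) + 1

    repeated_groups = sorted(g for g, n in counts.items() if n > 1)
    return prefixes, repeated_groups


def prefixes_leave_room_for_opcode(prefixes) -> bool:
    """True if at least one byte remains for the opcode within the limit."""
    return len(prefixes) < MAX_INSTRUCTION_LENGTH
```

For example, `scan_prefixes(bytes([0xF3, 0xF2, 0xA6]))` reports group 1 as repeated, which is exactly the situation the manual leaves unspecified.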

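That leaves us to test empirically what happens with multiple prefixes from each group, on instructions where those prefixes are otherwise supported. To that end, as requested, I ran the following tests on a Pentium III Coppermine¹:
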
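  1. Group 1: Multiple REPE (F3) and REPNE (F2) prefixes on a CMPSB instruction (A6).

    Only the last prefix encountered has any effect; the other prefixes from the same group are ignored.

    In fact, this appears to be standard behavior across all x86 processors, and it is consistent with how Microsoft's disassembler displays the code. The leading (ignored) prefixes are not shown as part of the instruction.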

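  2. Group 2: Multiple segment-override prefixes on a load (MOV) instruction.

    Again, the last prefix is the only one that matters. All the others are ignored. And, again, this appears to be standard across all x86 processors.

    (I didn't bother testing the branch-hint prefixes, either alone or in combination with segment-override prefixes, since these branch hints are ignored on all processors except the Pentium 4.)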

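  3. Group 3: Multiple operand-size override prefixes (66h).

    Duplicate prefixes are ignored, so multiple 66h prefixes have exactly the same effect as a single 66h prefix. They don't cancel each other out or anything like that.

    Various online sources confirm that this is standard behavior across all x86 processors.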

  4. Group 4: Multiple address-size override prefixes (67h).

    Same as Group 3: duplicate prefixes are ignored.

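In summary: in practice, all but the last prefix from a particular group are ignored. The last prefix encountered in the instruction is the one that takes effect, and all preceding superfluous or meaningless prefixes are ignored. This appears to be true for all x86 processors, which means that emulation code doesn't need to special-case this behavior for any particular generation or microarchitecture. However, a prefix that has no effect in one case may be repurposed to mean something on future processors, so that is something to be aware of.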
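For emulation code, the "last prefix in each group wins" rule can be sketched roughly as follows. This is only an illustration of the observed behavior, not a full decoder: `PREFIX_GROUP` and `effective_prefixes` are hypothetical names, the sketch assumes 16/32-bit modes, it does not handle mandatory prefixes (66H/F2H/F3H used as part of an opcode), and it applies the rule uniformly to Group 1 even though the tests above only checked REPE against REPNE.

```python
# Legacy prefix bytes mapped to their group number (16/32-bit modes).
PREFIX_GROUP = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                        # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2,                        # CS, SS, DS overrides
    0x26: 2, 0x64: 2, 0x65: 2,                        # ES, FS, GS overrides
    0x66: 3,                                          # operand-size override
    0x67: 4,                                          # address-size override
}


def effective_prefixes(code: bytes):
    """Resolve redundant prefixes with a last-one-wins rule per group.

    Returns (effective, opcode_offset), where `effective` maps a group
    number to the prefix byte that takes effect, and `opcode_offset` is
    the index of the first non-prefix byte in `code`.
    """
    effective = {}
    offset = 0
    while offset < len(code) and code[offset] in PREFIX_GROUP:
        byte = code[offset]
        # A later prefix from the same group overwrites an earlier one.
        # Assumption: this is applied to every group alike, including
        # LOCK versus REP in Group 1, which the tests did not cover.
        effective[PREFIX_GROUP[byte]] = byte
        offset += 1
    return effective, offset
```

With the Group 1 test case, `effective_prefixes(bytes([0xF3, 0xF2, 0xA6]))` returns `({1: 0xF2}, 2)`: REPNE takes effect, and the CMPSB opcode starts at offset 2. Repeated 66H or 67H prefixes simply collapse to a single entry, matching the Group 3 and Group 4 results.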

If at all possible, to make your life easier, you might consider offloading this interpretation work to an existing decoder. Specifically, one written by Intel: the Intel XED library (repository here on GitHub). You just give it anywhere from 1 to 15 bytes, and it returns the decoded opcode (including prefixes) and operands. Decoding is the hard part of x86, so this should save you a lot of headaches. It implements the same algorithm as described here; see, e.g., these notes and this code.

__
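¹ Specifically, an Intel Pentium III EB @ 866 MHz (Family 6, Model 8, Stepping 6, Revision cC0). This is a Socket 370 FC-PGA chip, running in a Compaq Deskpro EN system with an Intel 815-based motherboard (133 MHz FSB). If it matters (and it obviously shouldn't), the operating environment was Windows 2000 SP4. I used MASM and Visual Studio's debugger for the tests.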