Why does eax contain the number of vector parameters?

Why does al contain the number of vector parameters in assembly?

Why are vector parameters any different from normal parameters for the callee?

The value is used for optimization, as noted in the ABI document:

The prologue should use %al to avoid unnecessarily saving XMM registers. This is especially important for integer only programs to prevent the initialization of the XMM unit.

3.5.7 Variable Argument Lists - The Register Save Area. System V Application Binary Interface version 1.0

A function that calls va_start saves all the parameters passed in registers to the register save area:

To start, any function that is known to use va_start is required to, at the start of the function, save all registers that may have been used to pass arguments onto the stack, into the “register save area”, for future access by va_start and va_arg. This is an obvious step, and I believe pretty standard on any platform with a register calling convention. The registers are saved as integer registers followed by floating point registers...

https://blog.nelhage.com/2010/10/amd64-and-va_arg/
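As a rough illustration of why the save area is needed, here is a minimal C sketch of a variadic function that reads both integer and floating-point arguments. The function name `sum_pairs` and its argument layout are made up for this example; the register assignments in the comments assume the System V x86-64 calling convention.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: sums `count` (int, double) pairs passed as
   variadic arguments. */
double sum_pairs(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    va_start(ap, count);   /* points ap at the register save area and stack */

    double total = 0.0;
    for (int i = 0; i < count; i++) {
        /* int arguments arrive in general-purpose registers first,
           so va_arg reads them from the integer part of the save area. */
        total += va_arg(ap, int);
        /* double arguments arrive in xmm registers first,
           so va_arg reads them from the vector part of the save area. */
        total += va_arg(ap, double);
    }

    va_end(ap);
    return total;
}
```

For a call such as `sum_pairs(2, 1, 0.5, 2, 0.25)`, a System V compiler would typically pass `count`, `1` and `2` in edi, esi and edx, pass the two doubles in xmm0 and xmm1, and set al to 2 (or any upper bound up to 8) before the call.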

But saving all 8 vector registers can be slow, so the compiler may choose to optimize the save using the value passed in al:

... As an optimization, during a function call, %rax is required to hold the number of SSE registers used to hold arguments, to allow a varargs caller to avoid touching the FPU at all if there are no floating point arguments.

https://blog.nelhage.com/2010/10/amd64-and-va_arg/

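Since the callee only has to save at least the registers that were actually used, the value can be larger than the number of registers used. That's why the ABI contains this line: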

The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of vector registers used and is in the range 0–8 inclusive.

You can see the effect in the prolog generated by ICC:

    sub       rsp, 216                                      #5.1
    mov       QWORD PTR [8+rsp], rsi                        #5.1
    mov       QWORD PTR [16+rsp], rdx                       #5.1
    mov       QWORD PTR [24+rsp], rcx                       #5.1
    mov       QWORD PTR [32+rsp], r8                        #5.1
    mov       QWORD PTR [40+rsp], r9                        #5.1
    movzx     r11d, al                                      #5.1
    lea       rax, QWORD PTR [r11*4]                        #5.1
    lea       r11, QWORD PTR ..___tag_value_varstrings(int, ...).6[rip] #5.1
    sub       r11, rax                                      #5.1
    lea       rax, QWORD PTR [175+rsp]                      #5.1
    jmp       r11                                           #5.1
    movaps    XMMWORD PTR [-15+rax], xmm7                   #5.1
    movaps    XMMWORD PTR [-31+rax], xmm6                   #5.1
    movaps    XMMWORD PTR [-47+rax], xmm5                   #5.1
    movaps    XMMWORD PTR [-63+rax], xmm4                   #5.1
    movaps    XMMWORD PTR [-79+rax], xmm3                   #5.1
    movaps    XMMWORD PTR [-95+rax], xmm2                   #5.1
    movaps    XMMWORD PTR [-111+rax], xmm1                  #5.1
    movaps    XMMWORD PTR [-127+rax], xmm0                  #5.1
..___tag_value_varstrings(int, ...).6: 

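It's essentially a Duff's device. The r11 register is loaded with the address just after the xmm saving instructions, and al*4 is then subtracted from it (since each movaps XMMWORD PTR [rax-X], xmmX instruction is 4 bytes long), so that the jump lands on the first movaps instruction that should actually run.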

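As far as I can see, other compilers either save all the vector registers or don't save any of them, so they don't care about the exact value in al and only check whether it is zero.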
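The two prologue strategies can be modelled in C. This is only a sketch under stated assumptions, not what any compiler actually emits: `xmm_model`, `save_computed_jump`, `save_all_or_nothing` and `count_stores` are hypothetical names, and arrays stand in for the real registers and the register save area.

```c
#include <assert.h>
#include <string.h>

#define NUM_XMM 8

/* Hypothetical stand-in for one 16-byte vector register. */
typedef struct { unsigned char bytes[16]; } xmm_model;

/* ICC-style model: enter the fall-through chain at the case matching al,
   so only xmm0..xmm(al-1) are stored. This mirrors `jmp r11` landing in
   the middle of the movaps sequence. */
int save_computed_jump(unsigned al, const xmm_model *regs, xmm_model *area)
{
    assert(al <= NUM_XMM);
    int stored = 0;
    switch (al) {
    case 8: area[7] = regs[7]; stored++; /* fall through */
    case 7: area[6] = regs[6]; stored++; /* fall through */
    case 6: area[5] = regs[5]; stored++; /* fall through */
    case 5: area[4] = regs[4]; stored++; /* fall through */
    case 4: area[3] = regs[3]; stored++; /* fall through */
    case 3: area[2] = regs[2]; stored++; /* fall through */
    case 2: area[1] = regs[1]; stored++; /* fall through */
    case 1: area[0] = regs[0]; stored++; /* fall through */
    default: break;
    }
    return stored;
}

/* All-or-nothing model: only test whether al is zero. */
int save_all_or_nothing(unsigned al, const xmm_model *regs, xmm_model *area)
{
    assert(al <= NUM_XMM);
    if (al == 0)
        return 0;                               /* skip every vector store */
    memcpy(area, regs, NUM_XMM * sizeof *regs); /* store all eight */
    return NUM_XMM;
}

/* Demo helper: runs one strategy on dummy registers and returns the
   number of registers it stored. */
int count_stores(unsigned al, int use_computed_jump)
{
    xmm_model regs[NUM_XMM], area[NUM_XMM];
    memset(regs, 0xAB, sizeof regs);
    memset(area, 0, sizeof area);
    return use_computed_jump ? save_computed_jump(al, regs, area)
                             : save_all_or_nothing(al, regs, area);
}
```

Both models store every register the caller used as long as al is an upper bound on that count, which is why the ABI does not require an exact value. They differ only in how many extra stores they perform.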

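The general-purpose registers are always saved, probably because it's cheaper to just move 6 registers to memory than to spend time on a condition check, an address calculation and a jump. As a result, there is no need for a parameter saying how many integers were passed in registers.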

Here is a similar question to yours. You can find more information in the following links