Why does eax contain the number of vector parameters?
Why does al contain the number of vector parameters in assembly?
Why are vector parameters treated any differently from normal parameters by the callee?
The value is used for an optimization, as stated in the ABI document:
The prologue should use %al to avoid unnecessarily saving XMM registers. This is especially important for integer only programs to prevent the initialization of the XMM unit.
3.5.7 Variable Argument Lists - The Register Save Area. System V Application Binary Interface version 1.0
When a function uses va_start, its prologue saves all the arguments passed in registers to the register save area:
To start, any function that is known to use va_start is required to, at the start of the function, save all registers that may have been used to pass arguments onto the stack, into the “register save area”, for future access by va_start and va_arg. This is an obvious step, and I believe pretty standard on any platform with a register calling convention. The registers are saved as integer registers followed by floating point registers...
But saving all 8 vector registers can be slow, so the compiler may choose to optimize the save using the value passed in al:
... As an optimization, during a function call, %rax is required to hold the number of SSE registers used to hold arguments, to allow a varargs caller to avoid touching the FPU at all if there are no floating point arguments.
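To make the caller's side concrete, here is a minimal C sketch. The helpers sum_doubles and sum_ints are hypothetical names introduced only for illustration. Under the System V x86-64 ABI, a call such as sum_doubles(2, 1.5, 2.5) passes the two doubles in xmm0 and xmm1, so the caller has to put a value of at least 2 in al; a call such as sum_ints(3, 1, 2, 3) uses no vector registers, so al can be 0.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic double arguments.
 * The doubles travel in vector registers on System V x86-64,
 * so callers of this function pass a nonzero al. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Hypothetical helper: sums `count` variadic int arguments.
 * Only general-purpose registers (and the stack) are involved,
 * so callers can pass al = 0. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Compiling a caller of these two functions to assembly should show the different values loaded into eax/al right before each call, though the exact instructions depend on the compiler and its options.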
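Because the callee only has to save at least the registers that were actually used, the value may be larger than the real number of registers used. That is why the ABI contains this line:
The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of vector registers used and is in the range 0–8 inclusive.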
You can see the effect in the following compiler output:
sub rsp, 216 #5.1
mov QWORD PTR [8+rsp], rsi #5.1
mov QWORD PTR [16+rsp], rdx #5.1
mov QWORD PTR [24+rsp], rcx #5.1
mov QWORD PTR [32+rsp], r8 #5.1
mov QWORD PTR [40+rsp], r9 #5.1
movzx r11d, al #5.1
lea rax, QWORD PTR [r11*4] #5.1
lea r11, QWORD PTR ..___tag_value_varstrings(int, ...).6[rip] #5.1
sub r11, rax #5.1
lea rax, QWORD PTR [175+rsp] #5.1
jmp r11 #5.1
movaps XMMWORD PTR [-15+rax], xmm7 #5.1
movaps XMMWORD PTR [-31+rax], xmm6 #5.1
movaps XMMWORD PTR [-47+rax], xmm5 #5.1
movaps XMMWORD PTR [-63+rax], xmm4 #5.1
movaps XMMWORD PTR [-79+rax], xmm3 #5.1
movaps XMMWORD PTR [-95+rax], xmm2 #5.1
movaps XMMWORD PTR [-111+rax], xmm1 #5.1
movaps XMMWORD PTR [-127+rax], xmm0 #5.1
..___tag_value_varstrings(int, ...).6:
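It is essentially a Duff's device. The r11 register is loaded with the address just after the xmm save instructions, and then al*4 is subtracted from it (because each movaps XMMWORD PTR [rax-X], xmmX instruction is 4 bytes long) to jump to the first movaps instruction that should run.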
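The computed jump can be modelled in C with switch fall-through, the same trick Duff's device relies on. This is only an illustrative sketch: saved_xmm_mask is a hypothetical function that returns a bitmask of which xmm registers the prologue above would store for a given al (bit i set means xmm i is saved). It does not touch real registers.

```c
#include <assert.h>

/* Hypothetical model of the prologue's computed jump.
 * Entering the switch at `case al` corresponds to jumping to
 * (end label - 4 * al), i.e. skipping the first 8 - al stores. */
unsigned saved_xmm_mask(unsigned al)
{
    unsigned mask = 0;

    assert(al <= 8);            /* the ABI restricts al to 0..8 */
    switch (al) {
    case 8: mask |= 1u << 7;    /* movaps [...], xmm7 */
            /* fall through */
    case 7: mask |= 1u << 6;    /* movaps [...], xmm6 */
            /* fall through */
    case 6: mask |= 1u << 5;    /* movaps [...], xmm5 */
            /* fall through */
    case 5: mask |= 1u << 4;    /* movaps [...], xmm4 */
            /* fall through */
    case 4: mask |= 1u << 3;    /* movaps [...], xmm3 */
            /* fall through */
    case 3: mask |= 1u << 2;    /* movaps [...], xmm2 */
            /* fall through */
    case 2: mask |= 1u << 1;    /* movaps [...], xmm1 */
            /* fall through */
    case 1: mask |= 1u << 0;    /* movaps [...], xmm0 */
            /* fall through */
    case 0: break;              /* end label: nothing left to store */
    }
    return mask;
}
```

For example, saved_xmm_mask(2) yields 0x3 (only xmm0 and xmm1 are stored), while saved_xmm_mask(0) yields 0 because the jump lands directly on the end label.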
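As far as I can see, other compilers either always save all the vector registers or save none of them, so they do not care about the exact value of al and only check whether it is zero.
General-purpose registers are always saved, probably because moving 6 registers to memory is cheaper than spending time on a condition check, an address calculation and a jump. As a result, no parameter is needed to say how many integers were passed in registers.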
Here is a similar question to yours. You can find more information in the links below:
- How do vararg functions find out the number of arguments in machine code?
- Why is %eax zeroed before a call to printf?
- Identifying variable args function