In C++, why do some compilers refuse to put objects consisting of only a double into a register?
In Item 20 of Scott Meyers' Effective C++, he states:
some compilers refuse to put objects consisting of only a double into a register
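When passing built-in types by value, compilers will happily place the data in registers and quickly send ints/doubles/floats/etc. along. However, not all compilers will treat small objects with the same grace. I can easily understand why compilers would treat objects differently: between the vtable and all the constructors, passing an object by value can be a lot more work than just copying data members.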
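But still. This seems like an easy problem for modern compilers to solve: "This class is small, maybe I can treat it differently". Meyers' statement seems to imply that compilers would make this optimization for objects consisting of only an int (or char or short).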
Can someone give further insight into why this optimization sometimes doesn't take place?
Here is an example showing that LLVM clang, at optimization level O3, treats a class with a single double data member as if it were a plain double:
$ cat main.cpp
#include <stdio.h>

class MyDouble {
public:
    double d;
    MyDouble(double _d) : d(_d) {}
};

void foo(MyDouble d)
{
    printf("%lg\n", d.d);
}

int main(int argc, char **argv)
{
    if (argc > 5)
    {
        double x = (double)argc;
        MyDouble d(x);
        foo(d);
    }
    return 0;
}
When I compile it and look at the generated bitcode file, I see that foo behaves as if it operates on an input parameter of type double:
$ clang++ -O3 -c -emit-llvm main.cpp
$ llvm-dis main.bc
Here is the relevant part:
; Function Attrs: nounwind uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i64 0, i64 0), double %d.coerce)
  ret void
}
See how foo declares its input parameter as a double and passes it along to be printed as is. Now let's compile the exact same code with O0:
$ clang++ -O0 -c -emit-llvm main.cpp
$ llvm-dis main.bc
When we look at the relevant part, we see that clang uses getelementptr instructions to access the first (and only) data member, d:
; Function Attrs: uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %d = alloca %class.MyDouble, align 8
  %coerce.dive = getelementptr %class.MyDouble* %d, i32 0, i32 0
  store double %d.coerce, double* %coerce.dive, align 1
  %d1 = getelementptr inbounds %class.MyDouble* %d, i32 0, i32 0
  %0 = load double* %d1, align 8
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %0)
  ret void
}
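Note that in both listings the parameter already arrives as a bare double (%d.coerce); the O0 version just spills it into a stack slot before reading it back. As a rough sketch of why that coercion is possible at all, the properties below can be checked at compile time. I am assuming a mainstream implementation where the class has no padding, and the helpers as_raw_double and check_round_trip are made-up names for illustration, not anything clang itself uses:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Same class as in main.cpp above.
class MyDouble {
public:
    double d;
    MyDouble(double _d) : d(_d) {}
};

// The object is exactly one double wide: no vptr, no padding.
// (Assumption: true on mainstream compilers, not promised by the standard.)
static_assert(sizeof(MyDouble) == sizeof(double), "expected a single double");

// The implicit copy constructor and destructor are trivial, so copying
// the object is just copying its bytes.
static_assert(std::is_trivially_copyable<MyDouble>::value, "plain byte copy");
static_assert(std::is_trivially_destructible<MyDouble>::value, "no cleanup");

// Hypothetical helper: read the object's bytes as a bare double, which is
// roughly what the %d.coerce parameter carries.
double as_raw_double(const MyDouble& m) {
    double raw;
    std::memcpy(&raw, &m, sizeof raw);
    return raw;
}

// Small self-check that can be called from main().
void check_round_trip() {
    assert(as_raw_double(MyDouble(3.5)) == 3.5);
}
```

The user-declared MyDouble(double) constructor does not get in the way here, because it is not a copy constructor; the copy operations stay implicit and trivial.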
I found this document online, "Calling conventions for different C++ compilers and operating systems" (updated 2018-04-25), which has a table describing "Methods for passing structure, class and union objects".
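From the table you can see that if an object contains a long double, a copy of the entire object is transferred on the stack for all of the compilers shown there.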
Also from the same resource (emphasis added):
There are several different methods to transfer a parameter to a function if the parameter is a structure, class or union object. A copy of the object is always made, and this copy is transferred to the called function either in registers, on the stack, or by a pointer, as specified in table 6. The symbols in the table specify which method to use. S takes precedence over I and R. PI and PS take precedence over all other passing methods.
As table 6 tells, an object cannot be transferred in registers if it is too big or too complex. For example, an object that has a copy constructor cannot be transferred in registers because the copy constructor needs an address of the object. The copy constructor is called by the caller, not the callee.
Objects passed on the stack are aligned by the stack word size, even if higher alignment would be desired. Objects passed by pointers are not aligned by any of the compilers studied, even if alignment is explicitly requested. The 64bit Windows ABI requires that objects passed by pointers be aligned by 16.
An array is not treated as an object but as a pointer, and no copy of the array is made, except if the array is wrapped into a structure, class or union.
The 64 bit compilers for Linux differ from the ABI (version 0.97) in the following respects: Objects with inheritance, member functions, or constructors can be passed in registers. Objects with copy constructor, destructor or virtual are passed by pointers rather than on the stack.
The Intel compilers for Windows are compatible with Microsoft. Intel compilers for Linux are compatible with Gnu.
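To make the copy-constructor point concrete, here is a minimal sketch. The types Plain and Traced and the two helper functions are names I made up for illustration. It shows that a user-provided copy constructor makes the type non-trivially copy constructible, and that the constructor sees the address of the very object that becomes the by-value parameter, which is why that object needs a memory location rather than just a register:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type: only a double, implicit (trivial) copy constructor.
struct Plain {
    double d;
};

// Hypothetical type: same data, but with a user-provided copy constructor
// that records the address of the object it is constructing.
struct Traced {
    double d;
    static int copies;            // copy-constructor calls so far
    static const void* last_this; // address seen by the last copy constructor
    explicit Traced(double v) : d(v) {}
    Traced(const Traced& other) : d(other.d) {
        ++copies;
        last_this = this;
    }
};
int Traced::copies = 0;
const void* Traced::last_this = nullptr;

static_assert(std::is_trivially_copy_constructible<Plain>::value,
              "Plain can be copied as raw bytes");
static_assert(!std::is_trivially_copy_constructible<Traced>::value,
              "Traced needs its copy constructor to run");

// By-value parameter: true if the parameter lives at the address that the
// copy constructor saw when the argument was copied.
bool param_is_the_copy(Traced t) {
    return static_cast<const void*>(&t) == Traced::last_this;
}

// Returns how many copy-constructor calls a single by-value call triggers
// when the argument is an lvalue.
int copies_for_one_call() {
    Traced original(1.5);
    const int before = Traced::copies;
    param_is_the_copy(original);
    return Traced::copies - before;
}
```

Passing an lvalue such as `Traced x(2.0); param_is_the_copy(x);` runs the copy constructor once and the comparison holds. Passing a temporary directly may skip the copy constructor entirely because of copy elision, so the sketch deliberately uses a named object.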