Is the trunc function very slow?

Question

tl;dr: double b=a-(size_t)(a) is faster than double b=a-trunc(a)

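I am implementing a rotation function for images, and I noticed that the trunc function seems to be very slow.

Here is the loop over the image. The code that actually touches the pixels is commented out for the performance test, so I don't even access the pixels.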

double sina(sin(angle)), cosa(cos(angle));
int h = (int) (_in->h*cosa + _in->w*sina);
int w = (int) (_in->w*cosa + _in->h*sina);
int offsetx = (int)(_in->h*sina);

SDL_Surface* out = SDL_CreateARGBSurface(w, h); //wrapper over SDL_CreateRGBSurface

SDL_FillRect(out, NULL, 0x0);//transparent black
for (int y = 0; y < _in->h; y++)
    for (int x = 0; x < _in->w; x++){
            //calculate the new position
            const double destY = y*cosa + x*sina;
            const double destX = x*cosa - y*sina + offsetx;

So this is the code that uses trunc:

size_t tDestX = (size_t) trunc(destX);
size_t tDestY = (size_t) trunc(destY);
double left = destX - trunc(destX);
double top = destY - trunc(destY);

And this is the faster equivalent:

size_t tDestX = (size_t)(destX);
size_t tDestY = (size_t)(destY);
double left = destX - tDestX;
double top = destY - tDestY;

An answer suggested not using trunc when converting back to an integer, so I also tried this case:

size_t tDestX = (size_t) (destX);
size_t tDestY = (size_t) (destY);
double left = destX - trunc(destX);
double top = destY - trunc(destY);

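The fast version takes about 30 ms on average to go through the full image (2048x1200), while the slow version using trunc takes about 135 ms for the same image. The version with only two calls to trunc is still much slower than the version with none (about 100 ms).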
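For anyone who wants to reproduce the comparison outside the rotation code, here is a minimal timing sketch. The helper names (sumFractionsWithTrunc, sumFractionsWithCast, elapsedMs) are hypothetical and not from the original post, and the sketch assumes an angle between 0 and 90 degrees so that every coordinate is non-negative. The sums are returned so that the optimizer cannot discard the work being timed.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>

// Sums the fractional parts of the rotated Y coordinates using std::trunc.
double sumFractionsWithTrunc(int w, int h, double sina, double cosa) {
    assert(sina >= 0.0 && cosa >= 0.0);  // assumption: keeps destY non-negative
    double sum = 0.0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            const double destY = y * cosa + x * sina;
            sum += destY - std::trunc(destY);
        }
    return sum;
}

// Same computation, but the integer part comes from a cast to size_t.
double sumFractionsWithCast(int w, int h, double sina, double cosa) {
    assert(sina >= 0.0 && cosa >= 0.0);  // assumption: keeps destY non-negative
    double sum = 0.0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            const double destY = y * cosa + x * sina;
            sum += destY - static_cast<std::size_t>(destY);
        }
    return sum;
}

// Runs f once, stores its return value in result, and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F f, double& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling elapsedMs with a lambda that wraps each helper for a 2048x1200 image gives two timings to compare. The absolute numbers will depend on the compiler, its flags, and the CPU, so they may not match the figures above.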

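As far as I understand the C++ rules, both expressions should always return the same thing. Am I missing something here? destX and destY are declared const, so the trunc function should only be called once for each of them, and even then that alone would not explain a slowdown of more than a factor of three.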
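To make the equivalence claim concrete, here is a small sketch with two hypothetical helpers (fractionWithTrunc and fractionWithCast are illustrative names, not from the post). They agree only under a precondition: the value must be non-negative and small enough to fit in size_t. Converting a double whose truncated value cannot be represented in the destination type (for example -1.5 to size_t) is undefined behavior in C++, so the two expressions are not interchangeable for arbitrary input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Fractional part via the library call, as in the slow version.
double fractionWithTrunc(double v) {
    return v - std::trunc(v);
}

// Fractional part via an integer cast, as in the fast version.
// Precondition (an assumption of this sketch): 0 <= v < 2^32, a conservative
// bound that fits in size_t on both 32-bit and 64-bit targets.
double fractionWithCast(double v) {
    assert(v >= 0.0 && v < 4294967296.0);
    return v - static_cast<std::size_t>(v);
}
```

In the rotation loop the precondition holds as long as the rotated coordinates stay inside the output surface, which is presumably why the cast version gives the same picture.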

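I am compiling with Visual Studio 2013 with optimizations enabled (/O2). Is there any reason to use the trunc function at all? Even getting the fractional part by going through an integer seems to be faster.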

Answer 1

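The way you are using it, there is no reason for you to use the trunc function at all. It converts a double to a double, which you then cast to an integer and throw away. The fact that the alternative is faster is not surprising.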
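A short sketch of that point, using hypothetical helper names: std::trunc(double) returns a double, and the conversion to an integer type already truncates toward zero, so the extra call does not change the resulting index for non-negative in-range values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>

// std::trunc(double) yields a double, not an integer type.
static_assert(std::is_same<decltype(std::trunc(1.5)), double>::value,
              "std::trunc(double) returns double");

// Mirrors the question's first variant: truncate, then cast.
std::size_t indexViaTrunc(double v) {
    assert(v >= 0.0);  // assumption of this sketch: non-negative input
    return static_cast<std::size_t>(std::trunc(v));
}

// Mirrors the fast variant: the cast alone discards the fractional part.
std::size_t indexViaCast(double v) {
    assert(v >= 0.0);  // assumption of this sketch: non-negative input
    return static_cast<std::size_t>(v);
}
```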

Answer 2

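On modern x86 CPUs, int <-> float conversion is pretty fast: typically inline SSE code is generated for the conversion, and the cost is on the order of a few instruction cycles.¹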

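trunc, however, requires a function call, and the function call overhead alone is almost certainly greater than the cost of an inline float -> int conversion. In addition, the trunc function itself can be relatively costly: it has to be fully IEEE-754 compliant, so it must handle the whole range of floating-point values correctly, including edge cases such as NaN, INF, denorms, out-of-range values, etc. So overall I would expect the cost of trunc to be on the order of tens of instruction cycles, i.e. an order of magnitude or so greater than the cost of an inline float -> int conversion.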
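The difference in edge-case handling can be sketched as follows. tryToIndex is a hypothetical helper, not part of any library: it relies on std::trunc being well defined for every double (NaN stays NaN, infinities stay infinite), and it performs the cast only when the truncated value is known to fit, since a bare cast is undefined for values that do not.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Truncates v and converts it to an index only when the conversion is well
// defined. Returns false for NaN, infinities, values of -1 or below, and
// values of 2^32 or above (a conservative bound that fits in size_t on both
// 32-bit and 64-bit targets).
bool tryToIndex(double v, std::size_t& out) {
    const double t = std::trunc(v);         // defined for every double
    if (!(t >= 0.0) || t >= 4294967296.0)   // NaN fails the first comparison
        return false;
    out = static_cast<std::size_t>(t);
    assert(static_cast<double>(out) == t);  // exact for values below 2^32
    return true;
}
```

A check like this costs extra branches on every call, which is consistent with the estimate above that a fully general routine is noticeably more expensive than a single inline conversion.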

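1. Note that float <-> int conversions are not always cheap: other CPU families, and even older x86 CPUs, may not have ISA support for such conversions, in which case a library function will typically be used, with a cost similar to that of trunc. Modern x86 CPUs are a special case in this regard.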

c++

performance

truncation

visual-studio-2013