Accuracy of integer sqrt using double

Question

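I want to compute the integer part of the square root of a uint64_t. For a 32-bit uint32_t, the usual advice is to convert it to double, call sqrt, and cast the result back to uint32_t.

Given that a double can represent integers exactly only up to 2^53, does this also work for uint64_t? That is, will the following always give the correct answer: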

#include <math.h>
#include <stdint.h>
uint64_t x = ...;
uint64_t result = (uint64_t)sqrt((double)x);

Or even:

#include <math.h>
#include <stdint.h>
uint64_t x = ...;
uint32_t result = (uint32_t)sqrt((double)x);

Answer 1

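Empirically, the answer is no. For the input 4503599761588224, the result is incorrectly computed as 67108865 rather than 67108864.

The code below finds this case.¹ You can of course remove the break; to observe other cases.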

#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // y wraps around to 0 after UINT32_MAX, which ends the loop
    for (uint32_t y = 1; y != 0; y++) {
        // *Just* smaller than a perfect square
        uint64_t x = ((uint64_t)y * (uint64_t)y) - 1;

        // We expect the floor of the result
        uint32_t expected = y - 1;

        uint32_t result = (uint32_t)sqrt((double)x);

        if (result != expected) {
            // PRIu64 / PRIu32 are the portable format specifiers for these types
            printf("Incorrect: x = %" PRIu64 ", result = %" PRIu32 "\n", x, result);
            break;
        }
    }
    return 0;
}

What is special about the value 4503599761588224? It is exactly (2²⁶ + 1)² - 1, that is, 2⁵² + 2²⁷. This can be represented exactly as a double, so the error is not caused by the uint64_t -> double conversion.
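The representability claim can be checked directly. The following is a minimal sketch; the helper names `round_trips_through_double` and `failing_input` are made up for illustration and do not come from the original answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x survives uint64_t -> double -> uint64_t
 * unchanged, i.e. x is exactly representable as a double. */
static bool round_trips_through_double(uint64_t x) {
    double d = (double)x;
    /* Guard: (double)x can round up to 2^64, which does not fit in a
     * uint64_t, so converting it back would be undefined behaviour. */
    if (d >= 18446744073709551616.0) {
        return false;
    }
    return (uint64_t)d == x;
}

/* The failing input from the answer, written as 2^52 + 2^27. */
static uint64_t failing_input(void) {
    return (UINT64_C(1) << 52) + (UINT64_C(1) << 27);
}
```

Here `round_trips_through_double(failing_input())` should hold, whereas a value such as 2⁵³ + 1 does not round-trip.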

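Instead, the error arises inside sqrt itself. The delta here (relative to the perfect square) lowers the square root by approximately 2^-27, which is roughly 2⁵³ times smaller than the square root itself. Doubles of this magnitude are spaced 2^-26 apart, so the exact root lies about half a step below 2²⁶ + 1 and rounds up to it. This is at the limit of what double precision can handle, so off-by-one errors are to be expected at this point.²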

_{1. Live demo.}

_{2. Thanks to @EricPostpischil for identifying the root cause in the comments below :)}
