std::fmod 糟糕的双精度

Question

fmod(1001.0, 0.0001) 给出 0.00009999999995，这似乎是一个非常低的精度（10^-5），给定 [=15= 的预期结果].

根据 cppreference，fmod() 可以使用 remainder() 实现，但是 remainder(1001.0, 0.0001) 给出 -4.796965775988316e-14（离 double 精度还很远，但比 10^-5).

好多了

为什么 fmod 精度如此依赖于输入参数？正常吗？

MCVE:

#include <cmath>
#include <iomanip>
#include <iostream>
using namespace std;

int main() {
    double a = 1001.0, b = 0.0001;
    cout << setprecision(16);
    cout << "fmod:      " << fmod(a, b) << endl;
    cout << "remainder: " << remainder(a, b) << endl;
    cout << "actual:    " << a-floor(a/b)*b << endl;
    cout << "a/b:       " << a / b << endl;
}

输出：

fmod:      9.999999995203035e-05
remainder: -4.796965775988316e-14
actual:    0
a/b:       10010000

（与 GCC、Clang、MSVC 相同的结果，有和没有优化）

Live demo

Answer 1

fmod 产生准确的结果，没有错误。

鉴于 C++ 源代码 fmod(1001.0, 0.0001) 在使用 IEEE-754 binary64（double 最常用的格式）的实现中，源文本 0.0001 被转换为 double 值 0.000100000000000000004792173602385929598312941379845142364501953125.

Then 1001 = 10009999• 0.000100000000000000004792173602385929598312941379845142364501953125 + 0.000099999999952030347032290447106817055100691504776477813720703125, so fmod(1001, 0.0001) is exactly 0.000099999999952030347032290447106817055100691504776477813720703125.

唯一的错误是将源文本中的十进制数字转换为binary-based double格式。 fmod操作没有错误。

Answer 2

如果我们将您的程序修改为：

#include <cmath>
#include <iomanip>
#include <iostream>

int main() {
    double a = 1001.0, b = 0.0001;
    std::cout << std::setprecision(32) << std::left;
    std::cout << std::setw(16) << "a:" << a << "\n"; 
    std::cout << std::setw(16) << "b:" << b << "\n"; 
    std::cout << std::setw(16) << "fmod:" << fmod(a, b) << "\n";
    std::cout << std::setw(16) << "remainder:" << remainder(a, b) << "\n";
    std::cout << std::setw(16) << "floor a/b:" << floor(a/b) << "\n";
    std::cout << std::setw(16) << "actual:" << a-floor(a/b)*b << "\n";
    std::cout << std::setw(16) << "a/b:" << a / b << "\n";
    std::cout << std::setw(16) << "floor 10009999:" << floor(10009999.99999999952) << "\n";
}

它输出：

a:              1001
b:              0.00010000000000000000479217360238593
fmod:           9.9999999952030347032290447106817e-05
remainder:      -4.796965775988315527911254321225e-14
floor a/b:      10010000
actual:         0
a/b:            10010000
floor 10009999: 10010000

我们可以看到 0.0001 不能表示为 double 所以 b 实际上设置为 0.00010000000000000000479217360238593.

这导致 a/b 成为 10009999.9999999995203034224，因此意味着 fmod 应该 return 1001 - 10009999*0.00010000000000000000479217360238593 即 9.99999999520303470323e-5.

（在 speedcrunch 中计算的数字可能与 IEEE 双精度值不完全匹配）

您的“实际”值不同的原因是 floor(a/b) returns 10010000 不是 fmod 使用的确切值 10009999 ，这本身是由于 10009999.99999999952 不能表示为双精度数，因此在传递到 floor 之前四舍五入为 10010000。

Answer 3

这里的基本问题（0.0001 的 IEEE-754 表示）已经 well-established，但为了好玩，我使用 [=14 复制了 fmod 的实现=] 来自 https://en.cppreference.com/w/cpp/numeric/math/fmod 并将其与 std::fmod.

进行比较

#include <iostream>
#include <iomanip>
#include <cmath>

// Possible implementation of std::fmod according to cppreference.com
double fmod2(double x, double y)
{
#pragma STDC FENV_ACCESS ON
    double result = std::remainder(std::fabs(x), (y = std::fabs(y)));
    if (std::signbit(result)) result += y;
    return std::copysign(result, x);
}

int main() {
    // your code goes here
    double b = 0.0001;
    std::cout << std::setprecision(25);
    std::cout << "              b:" << std::setw(35) << b << "\n"; 
    
    double m = 10010000.0;
    double c = m * b;
    double d = 1001.0 - m * b;
    std::cout << std::setprecision(32);
    std::cout << "     10010000*b:" << std::setw(6) << c << "\n"; 
    std::cout << std::setprecision(25);
    std::cout << "1001-10010000*b:" << std::setw(6) << d << "\n";
    
    long double m2 = 10010000.0;
    long double c2 = m2 * b;
    long double d2 = 1001.0 - m2 * b;
    std::cout << std::setprecision(32);
    std::cout << "     10010000*b:" << std::setw(35) << c2 << "\n"; 
    std::cout << std::setprecision(25);
    std::cout << "1001-10010000*b:" << std::setw(35) << d2 << "\n";
    
    std::cout << "      remainder:" << std::setw(35) << std::remainder(1001.0, b) << "\n"; 
    std::cout << "           fmod:" << std::setw(35) << std::fmod(1001.0, b) << "\n"; 
    std::cout << "          fmod2:" << std::setw(35) << fmod2(1001.0, b) << "\n"; 
    std::cout << " fmod-remainder:" << std::setw(35) <<
                 std::fmod(1001.0, b) - std::remainder(1001.0, b) << "\n"; 
    return 0;
}

结果是：

              b:     0.0001000000000000000047921736
     10010000*b:  1001
1001-10010000*b:     0
     10010000*b:  1001.0000000000000479616346638068
1001-10010000*b:    -4.796163466380676254630089e-14
      remainder:    -4.796965775988315527911254e-14
           fmod:     9.999999995203034703229045e-05
          fmod2:     9.999999995203034703229045e-05
 fmod-remainder:     0.0001000000000000000047921736

如最后两行输出所示，实际 std::fmod（至少在此实现中）与 cppreference 页面上建议的实现相匹配，至少对于此示例。

我们还看到 64 位 IEEE-754 的精度不足以表明 10010000 * 0.0001 不同于整数。但是如果我们达到 128 位，小数部分就会清楚地表示出来，当我们从 1001.0 中减去它时，我们发现余数与 std::remainder 的 return 值大致相同。（差异可能是由于 std::remainder 的计算少于 128 位；它可能使用 80 位算术。）

最后，请注意 std::fmod(1001.0, b) - std::remainder(1001.0, b) 结果等于 0.0001 的 64 位 IEEE-754 值。也就是说，这两个函数都是 returning 与相同值模 0.0001000000000000000047921736 一致的结果，但是 std::fmod 选择最小的正值，而 std::remainder 选择最接近零的值。

std::fmod 糟糕的双精度

std::fmod abysmal double precision

c++

floating-point

precision

fmod