将浮点除法分解为整数和小数部分

Question

我正在尝试使用双精度进行整数除法 + 模运算（用于基于样条的插值），但在使用 std::floor 和 [=13= 时遇到了一些与浮点精度相关的问题].

我一直在使用下面的 div1 的等价物，但在 50 时它产生了不正确的结果（也就是说，整数部分是 3，但模数部分是除数减去 epsilon）。 div2 有效，但相当复杂。 div3 至少是一致的，但不是 return 我想要的结果类型（余数可能是负数，所以在我使用它之前需要进一步的操作）。

#include <iostream>
#include <cmath>

std::pair<double, double> div1(int num, double denom){
    double whole = std::floor(num / denom);
    double remain = std::fmod(num, denom);
    return {whole, remain};
}
std::pair<double, double> div2(int num, double denom){
    double floatdiv = num / denom;
    double whole;
    double remain = std::modf(floatdiv, &whole);
    return {whole, remain * denom};
}
std::pair<double, double> div3(int num, double denom){
    double whole = std::round(num / denom);
    double remain = std::remainder(num, denom);
    return {whole, remain};
}
int main() {
    double denom = 100.0 / 6;
    int divtype = 0;
    for(auto div: {div1, div2, div3}){
        std::cerr << "== Using div" << ++divtype << " for this run ==\n";
        for(int i = 40; i <= 60; ++i){
            auto res = div(i, denom);
            std::cerr << i << ": " << res.first << ", " << res.second << " = " << res.first * denom + res.second << "\n";
        }
        auto oldprec = std::cerr.precision(64);
        auto res = div(50, denom);
        std::cerr << 50 << ": " << res.first << ", " << res.second << " = " << res.first << " * " << denom << " + " << res.second << " = " << std::floor(res.first) * denom + res.second << "\n";
        std::cerr.precision(oldprec);
        std::cerr << "\n";
    }
    return 0;
}

https://ideone.com/9UbHcE

对于50的情况，产生以下结果：

 - div1: 3, 16.6666... 
 - div2: 3, 0
 - div3: 3, -3e-15

是我做错了什么，还是std::floor(num / denom)+std::fmod(num, denom)不靠谱？如果是这样，什么是好的替代品？ div2 是最佳选择吗？

包含最多答案的代码示例版本： https://ideone.com/l2wGRj

Answer 1

问题不在于您的 fmod，而在于您对 floor 的输入。由于 floating-point 精度模糊，fmod 到 return 接近分母的值很好。问题是您必须小心使用与余数相同的规则来处理商，以便结果出来（使用模糊相等）：

x/y == (quot, rem) == quot * y + rem

为了说明，我添加了 div4 和 div5:

std::pair<double, double> div4( int num,  double denom){
    int quo;
    auto rem = std::remquo(num, denom, &quo );
    return {quo, rem};
}

std::pair<double, double> div5( int num,  double denom){
    auto whole = std::floor(num / static_cast<long double>( denom ) );
    auto remain = std::fmod(num, denom);
    return {whole, remain};
}

这里是 slimmed-down version of your code 只关注失败案例。输出为：

div1: 50 / 16.6666666666666678509 = (whole, remain) = (3, 16.6666666666666642982) = 66.6666666666666571928
...
div4: 50 / 16.6666666666666678509 = (whole, remain) = (3, -3.55271367880050092936e-15) = 50
div5: 50 / 16.6666666666666678509 = (whole, remain) = (2, 16.6666666666666642982) = 50

对于 div1，您得到了 3 的整数和（几乎）一个除数的余数。错误是由于 floating-point 的模糊性，发送到 floor 的值正好在线上，因此向上增加到 3，而实际上应该是 2。

如果你使用我的 div5，它使用 std::remquo 同时计算余数和商，你会得到相似的对 (2, ~divisor) 然后所有乘法正确回到 50。（请注意，商以整数形式返回，而不是来自该标准函数的 floating-point 数字。）[更新：如评论中所述，这是只对商的3位精度有效，也就是说对需要检测象限或八分圆的周期函数有用，但对一般商无效。]

或者，如果您使用我的 div4，我使用您的 div1 逻辑，但在除法运算之前将输入从 floor 升级到 long double 精度，这给了它足够的数字来正确评估地板。结果是 (3, ~0)，它显示了余数而不是商的模糊性。

long double 方法最终只是以更高的精度将罐子踢到同一问题的道路上。对于周期函数的有限情况，使用 std::remquo 在数值上更可靠。选择哪个版本取决于你更关心的是：数值计算还是美观显示。

更新：您还可以尝试使用 FP 异常来检测何时出现问题：

void printError()
{
    if( std::fetestexcept(FE_DIVBYZERO)  ) std::cout << "pole error occurred in an earlier floating-point operation\n";
    if( std::fetestexcept(FE_INEXACT)    ) std::cout << "inexact result: rounding was necessary to store the result of an earlier floating-point operation\n";
    if( std::fetestexcept(FE_INVALID)    ) std::cout << "domain error occurred in an earlier floating-point operation\n";
    if( std::fetestexcept(FE_OVERFLOW)   ) std::cout << "the result of the earlier floating-point operation was too large to be representable\n";
    if( std::fetestexcept(FE_UNDERFLOW)  ) std::cout << "the result of the earlier floating-point operation was subnormal with a loss of precision\n";
}

// ...
// Calling code
std::feclearexcept(FE_ALL_EXCEPT);
const auto res = div(i, denom);
printError();
// ...

此报告 inexact result: rounding was necessary to store the result of an earlier floating-point operation 函数 1、2、3 和 5。在 Coliru.

上实时查看

Answer 2

您的核心问题是 denom = 100.0/6 与数学上的精确值 denomMath = 100/6 = 50/3 不同，因为它不能表示为 2 的幂之和。我们可以写 denom = denomMath + eps（带有一个小的正或负 epsilon）。赋值后，denom与最接近的浮点数无异！如果您现在尝试将某个值 denomMath * k = denom * k + eps * k 除以 denom，对于足够大的 k，您将得到错误的结果数学（即准确算术）已经 - 在这种情况下你没有希望。这种情况发生的时间取决于所涉及的幅度（如果值 < 1，那么你的所有 div 将产生零的全部部分并且是准确的，而对于大于 2^54 的值，你甚至不能表示奇数).

但即使在那之前，也不能保证 denomMath 除以 denom 的（数学）倍数会产生可以 floored 或 fmod 的结果编到正确的整数。四舍五入可能会让你暂时安全，但如上所示，前提是误差不会太大。

所以：

div1 遇到此处描述的问题：https://en.cppreference.com/w/cpp/numeric/math/fmod

The expression x - trunc(x/y)*y may not equal fmod(x,y) when the rounding of x/y to initialize the argument of trunc loses too much precision (example: x = 30.508474576271183309, y = 6.1016949152542370172)

在您的情况下，50 / denom 产生的数字与确切结果（3 - some epsilon 相比略大（3），因为 denom 略大于denomMath)

您不能依赖 std::floor(num / denom) + std::fmod(num, denom) 等于 num。
div2 存在上述问题：在您的情况下它有效，但如果您尝试更多情况，您会发现 num / denom 稍微太小而不是太大，它也会失败。
div3 有上述承诺。它实际上为您提供了您所希望的最精确的结果。

Answer 3

这感觉它可能是 fmod() 实现中的错误。根据 cppreference.com 上 std::fmod 的定义：

The floating-point remainder of the division operation x/y calculated by this function is exactly the value x - n*y, where n is x/y with its fractional part truncated

因此我补充说：

std::pair<double, double> div4(int num, double denom){
    double whole = std::floor(num / denom);
    int n = trunc(num / denom) ;
    double remain = num - n * denom ;
    return {whole, remain};
}

并查看 div1 和 div4 从 49 到 51 的输出，我得到 ¹:

== Using div1 for this run ==
49: 2, 15.6667 = 49
50: 3, 16.6667 = 66.6667
51: 3, 1 = 51
50: 3, 16.66666666666666429819088079966604709625244140625 = 3 * 16.666666666666667850904559600166976451873779296875 + 16.66666666666666429819088079966604709625244140625 = 66.666666666666657192763523198664188385009765625

== Using div4 for this run ==
49: 2, 15.6667 = 49
50: 3, 0 = 50
51: 3, 1 = 51
50: 3, 0 = 3 * 16.666666666666667850904559600166976451873779296875 + 0 = 50

这确实给出了预期的结果。

¹ 因为这是我手上所有的东西，上面的输出是由运行ning 原始文件生成的通过 Emscripten 编码，它使用 clang 将代码转换为 JavaScript，然后是运行和 node.js。由于这与原始代码产生了相同的 "problem"，如果编译为本机代码，我 expect/hope 我的 div4 也会做同样的事情。

Answer 4

对于正的分子和分母，只要商不超过floating-point格式的有效位数（2⁵³ 典型 double):

std::pair<double, double> divPerfect(int num, double denom)
{
    double whole = std::floor(num / denom);
    double remain = std::fma(whole, -denom, num);
    if (remain < 0)
    {
        --whole;
        remain = std::fma(whole, -denom, num);
    }
    return {whole, remain};
}

推理：

如果确切的num / denom是一个整数，它是可表示的，num / denom必须产生它，std::floor(num / denom)具有相同的值。否则，num / denom 可能会稍微向上或向下舍入，这不能减少 std::floor(num / denom) 但可能会增加 1。因此，double whole = std::floor(num / denom) 为我们提供了适当的商数或更多。
如果 whole 是正确的，那么 std::fma(whole, -denom, num) 就是精确的，因为精确的数学答案的量级小于 denom 或等于 whole（如果 whole < denom)，并且它的最低有效位至少与 denom 或 whole 中的可用最低有效位位置一样大，因此它的所有位都适合floating-point 格式，因此它是可表示的，因此必须作为结果生成。此外，这个余数是 non-negative.
如果 whole 太高，则 std::fma(whole, -denom, num) 为负数（但仍然准确，如上所述）。然后我们更正 whole 并重复 std::fma 以获得准确的结果。

我希望可以避免第二个 std::fma:

std::pair<double, double> divPerfect(int num, double denom)
{
    double whole = std::floor(num / denom);
    double remain = std::fma(whole, -denom, num);
    return 0 <= remain ? std::pair(whole, remain) : std::pair(whole - 1, remain + denom);
}

但是，我想再考虑一下才能确定。

将浮点除法分解为整数和小数部分

Decomposing floating-point division into integer and fractional part

c++

division

modulo

floating-accuracy