IEEE-754 双精度和拆分方法

Question

当你计算初等函数时，你应用常数修改。特别是在 exp(x) 的实现中。在所有这些实现中，对 ln(2) 的任何校正都分两步完成。 ln(2) 分为两个数字：

static const double ln2p1   = 0.693145751953125;
static const double ln2p2   = 1.42860682030941723212E-6;
// then ln(2) = ln2p1 + ln2p2

然后用 ln(2) 进行的任何计算都是通过以下方式完成的：

 blablabla -= ln2p1
 blablabla -= ln2p2

我知道是为了避免四舍五入的影响。可是为什么这两个号特地在里面？你们中有些人知道如何得到这两个数字？

谢谢！

在第一条评论之后，我用更多 material 和一个非常奇怪的问题完成了这个 post。我和我的团队一起工作，我们同意通过将数字 ln(2) 分成两个数字来将精度加倍。为此，应用了两个转换，第一个：

1) c_h = floor(2^k ln(2))/2^k
2) c_l = ln(2) - c_h

k 表示精度，在 Cephes 库（~1980）中，float k 固定为 9，double 固定为 16，long long double 固定为 16（我不知道为什么）。所以对于 double c_h 有 16 位的精度，但是对于 c_l.

有 52 位的精度

据此，我编写了以下程序，并以 52 位精度确定 c_h。

 #include <iostream>
 #include <math.h>
 #include <iomanip>

 enum precision { nine = 9, sixteen = 16, fiftytwo = 52 };

 int64_t k_helper(double x){
     return floor(x/log(2));
 }

 template<class C>
 double z_helper(double x, int64_t k){
     x -= k*C::c_h;
     x -= k*C::c_l;
     return x;
 }

 template<precision p>
 struct coeff{};

 template<>
 struct coeff<nine>{
     constexpr const static double c_h = 0.693359375;
     constexpr const static double c_l = -2.12194440e-4;
 };

 template<>
 struct coeff<sixteen>{
     constexpr const static double c_h = 6.93145751953125E-1;
     constexpr const static double c_l = 1.42860682030941723212E-6;
 };

 template<>
 struct coeff<fiftytwo>{
     constexpr const static double c_h = 0.6931471805599453972490664455108344554901123046875;
     constexpr const static double c_l = -8.78318343240526578874146121703272447458793199905066E-17;
 };


 int main(int argc, const char * argv[]) {

    double x = atof(argv[1]);
    int64_t k = k_helper(x);

    double z_9  = z_helper<coeff<nine> >(x,k);
    double z_16 = z_helper<coeff<sixteen> >(x,k);
    double z_52 = z_helper<coeff<fiftytwo> >(x,k);


    std::cout << std::setprecision(16) << " 9  bits precisions " << z_9 << "\n"
                                       << " 16 bits precisions " << z_16 << "\n"
                                       << " 52 bits precisions " << z_52 << "\n";



    return 0;

}

如果我现在计算一组不同的值，我得到：

bash-3.2$ g++ -std=c++11 main.cpp  
bash-3.2$ ./a.out 1
9  bits precisions 0.30685281944
16 bits precisions 0.3068528194400547
52 bits precisions 0.3068528194400547
bash-3.2$ ./a.out 2
9  bits precisions 0.61370563888
16 bits precisions 0.6137056388801094
52 bits precisions 0.6137056388801094
bash-3.2$ ./a.out 100
9  bits precisions 0.18680599936
16 bits precisions 0.1868059993678755
52 bits precisions 0.1868059993678755
bash-3.2$ ./a.out 200
9  bits precisions 0.37361199872
16 bits precisions 0.3736119987357509
52 bits precisions 0.3736119987357509
bash-3.2$ ./a.out 300
9  bits precisions 0.56041799808
16 bits precisions 0.5604179981036264
52 bits precisions 0.5604179981036548
bash-3.2$ ./a.out 400
9  bits precisions 0.05407681688
16 bits precisions 0.05407681691155647
52 bits precisions 0.05407681691155469
bash-3.2$ ./a.out 500
9  bits precisions 0.24088281624
16 bits precisions 0.2408828162794319
52 bits precisions 0.2408828162794586
bash-3.2$ ./a.out 600
9  bits precisions 0.4276888156
16 bits precisions 0.4276888156473074
52 bits precisions 0.4276888156473056
bash-3.2$ ./a.out 700
9  bits precisions 0.61449481496
16 bits precisions 0.6144948150151828
52 bits precisions 0.6144948150151526

就像当 x 变得大于 300 时会出现差异。我看了一下 gnulibc

的实现

http://osxr.org:8080/glibc/source/sysdeps/ieee754/ldbl-128/s_expm1l.c

目前它正在使用 c_h（第 84 行）

的 16 位预置

好吧，我可能遗漏了一些符合 IEEE 标准的东西，而且我无法想象 glibc 中的精度错误。你怎么看？

最佳，

Answer 1

ln2p1 正好是 45426/65536。这可以通过round(65536 * ln(2))获得。 ln2p2 就是余数。那么这两个数的特别之处在于分母65536(2¹⁶).

据我发现，大多数使用该常量的算法都可以追溯到 cephes 库，该库于 1984 年首次发布，当时 16 位计算仍占主导地位，这可能解释了为什么 2¹⁶被选中。

IEEE-754 双精度和拆分方法

IEEE-754 double precision and splitting method

precision

inline-assembly

ieee-754

exp

elementary-functions