How can the significand width, in bits, of float and double be determined? Is there a standard definition?

Question

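In C or C++, is there a standard way to determine the width of the significand (mantissa) of a double? I know that the IEEE-754 double format has a 53-bit significand (52 bits stored explicitly plus an implicit leading bit), but I'd like to avoid "magic numbers" in my code.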

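On Linux, the file /usr/include/ieee754.h exists, but it describes the format using bit-fields in a structure, and I can't determine their sizes (at compile time).

A Linux-only solution is acceptable.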

Answer 1

Use FLT_MANT_DIG and DBL_MANT_DIG, defined in <float.h>:

#include <float.h>
#include <stdio.h>


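/* The *_MANT_DIG macros count digits in base FLT_RADIX, so they equal
   a number of bits only when the radix is 2. */
#if FLT_RADIX != 2
    #error "Floating-point base is not two."
#endif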


int main(void)
{
    printf("There are %d bits in the significand of a float.\n",
        FLT_MANT_DIG);
    printf("There are %d bits in the significand of a double.\n",
        DBL_MANT_DIG);
}

Answer 2

Is there a standard manner to determine the mantissa of a double?

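You're willing to accept a Linux-specific solution, but you say that glibc's ieee754.h header doesn't meet your needs, so I conclude that the problem you're trying to solve is not extracting or transmitting the bits as such, because the header's union ieee754_double would give you a way to do that.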

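I read "the mantissa" as something different from "the number of bits in the mantissa", so I conclude that DBL_MANT_DIG from float.h isn't what you're looking for either.

In terms of the standard floating-point model, the only other thing I can think of is the value of the significand (mantissa):

v = (sign) * significand * radix^exponent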

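The standard C library function frexp() (available since C89/C90) serves exactly this purpose.¹ It splits a double into an exponent (of 2) and a significand, the latter returned as a double. For finite, nonzero input, the absolute value of the significand lies in the half-open interval [0.5, 1).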

Example:

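#include <math.h>
#include <stdio.h>

void print_parts(double d) {
    int exponent;
    /* frexp() returns the significand and stores the power of two in exponent. */
    double significand = frexp(d, &exponent);

    printf("%e = %f * 2^%d\n", d, significand, exponent);
}

int main(void) {
    print_parts(7.2563e16);
    print_parts(1.2e-3);
    print_parts(-0.0);
    return 0;
}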

Sample output:

7.256300e+16 = 0.503507 * 2^57
1.200000e-03 = 0.614400 * 2^-9
-0.000000e+00 = -0.000000 * 2^0

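Note that although the example function does not print enough decimal digits to convey the significand exactly, frexp() itself is exact and is not subject to any rounding error.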

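¹ Technically, frexp() serves this purpose only if FLT_RADIX expands to 2. It is well-defined in any case, but if your double representation uses a different radix, then the result of frexp(), while well-defined, may not be what you're looking for.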

Answer 3

In C++, you can use std::numeric_limits<double>::digits and std::numeric_limits<float>::digits:

#include <limits>
#include <iostream>

int main()
{
    std::cout << std::numeric_limits<float>::digits << "\n";
    std::cout << std::numeric_limits<double>::digits << "\n";
}

This prints

24
53

respectively, on platforms where float and double are IEEE-754 single and double precision.


Tags: c, c++, linux, floating-point, ieee-754