Float underflow in C explanation

Question

I am working through one of the C Primer Plus exercises that deals with floating-point underflow. The task is to simulate it. This is what I did:

#include<stdio.h>
#include<float.h>

int main(void)
{
    // print min value for a positive float retaining full precision
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision:",FLT_MIN);

    // print min value for a positive float retaining full precision divided by two
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by two:",FLT_MIN/2.0);

    // print min value for a positive float retaining full precision divided by four
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by four:",FLT_MIN/4.0);

    return 0;
}

The result is

Minimum positive float value retaining full precision:                 0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625000000000000000000000000
Minimum positive float value retaining full precision divided by two:  0.000000000000000000000000000000000000005877471754111437539843682686111228389093327783860437607543758531392086297273635864257812500000000000000000000000
Minimum positive float value retaining full precision divided by four: 0.000000000000000000000000000000000000002938735877055718769921841343055614194546663891930218803771879265696043148636817932128906250000000000000000000000

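I expected the minimum float value divided by 2 and by 4 to have lower precision, but the precision seems fine and there is no underflow. How is this possible? What am I missing?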

Thanks a lot

Answer 1

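The method of assessing precision is flawed, because the code simply divides FLT_MIN (certainly a power of 2) by 2. A power of 2 has only one significant bit, so halving it discards nothing as long as the result is still representable.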

Instead, start with a number just above a power of 2, so that its binary significand is something like 1.000...(maybe total of 24 binary digits)...0001. Ensure the value printed is originally a float. (FLT_MIN/2.0 is a double.)

Note that precision is lost when the number becomes smaller than FLT_MIN, the minimum normalized positive float.

Also consider FLT_TRUE_MIN, the minimum positive float. See binary32.

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
  const char *format = "%.10e %a\n";  // same value in decimal and in hex float notation
  printf(format, FLT_MIN, FLT_MIN);
  printf(format, FLT_TRUE_MIN, FLT_TRUE_MIN);

  float f = nextafterf(1.0f, 2.0f);
  do {
    f /= 2;
    printf(format, f, f);  // print in decimal and hex for detail
  } while (f);
  return 0;
}

Output

1.1754943508e-38 0x1p-126
1.4012984643e-45 0x1p-149

5.0000005960e-01 0x1.000002p-1
2.5000002980e-01 0x1.000002p-2
1.2500001490e-01 0x1.000002p-3
...
2.3509889819e-38 0x1.000002p-125
1.1754944910e-38 0x1.000002p-126
5.8774717541e-39 0x1p-127  // lost least significant bit of precision
2.9387358771e-39 0x1p-128
...
2.8025969286e-45 0x1p-148
1.4012984643e-45 0x1p-149
0.0000000000e+00 0x0p+0
