C解释中的浮动下溢
Float underflow in C explanation
我正在解决处理浮点下溢的 C Primer Plus 练习之一。任务是模拟它。我是这样做的:
#include<stdio.h>
#include<float.h>
int main(void)
{
// print min value for a positive float retaining full precision
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision:",FLT_MIN);
// print min value for a positive float retaining full precision divided by two
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by two:",FLT_MIN/2.0);
// print min value for a positive float retaining full precision divided by four
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by four:",FLT_MIN/4.0);
return 0;
}
结果是
Minimum positive float value retaining full precision: 0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625000000000000000000000000
Minimum positive float value retaining full precision divided by two: 0.000000000000000000000000000000000000005877471754111437539843682686111228389093327783860437607543758531392086297273635864257812500000000000000000000000
Minimum positive float value retaining full precision divided by four: 0.000000000000000000000000000000000000002938735877055718769921841343055614194546663891930218803771879265696043148636817932128906250000000000000000000000
我预计最小浮点值除以 2 和 4 的精度较低,但似乎精度还可以,并且没有下溢情况。这怎么可能?我错过了什么?
非常感谢
评估精度的方法不正确,因为代码简单地将 FLT_MIN
(当然是 2 的幂)除以 2。
而是从一个刚好高于 2 的幂的数字开始,因此它的 二进制 significand 类似于 1.000...(maybe total of 24 binary digits)...0001
。确保打印的值最初是 float
。 (FLT_MIN/2.0
是一个 double
。)
请注意,当数字小于 FLT_MIN
时,精度会丢失:最小 标准化 正浮点数。
同时考虑FLT_TRUE_MIN
:最小正浮点数。参见 binary32
#include <float.h>
#include <math.h>
#include <stdio.h>
int main(void) {
char *format = "%.10e %a\n";
printf(format, FLT_MIN, FLT_MIN);
printf(format, FLT_TRUE_MIN, FLT_TRUE_MIN);
float f = nextafterf(1.0f, 2.0f);
do {
f /= 2;
printf(format, f, f); // print in decimal and hex for detail
} while (f);
return 0;
}
输出
1.1754943508e-38 0x1p-126
1.4012984643e-45 0x1p-149
5.0000005960e-01 0x1.000002p-1
2.5000002980e-01 0x1.000002p-2
1.2500001490e-01 0x1.000002p-3
...
2.3509889819e-38 0x1.000002p-125
1.1754944910e-38 0x1.000002p-126
5.8774717541e-39 0x1p-127 // lost least significant bit of precision
2.9387358771e-39 0x1p-128
...
2.8025969286e-45 0x1p-148
1.4012984643e-45 0x1p-149
0.0000000000e+00 0x0p+0
我正在解决处理浮点下溢的 C Primer Plus 练习之一。任务是模拟它。我是这样做的:
#include<stdio.h>
#include<float.h>
int main(void)
{
// print min value for a positive float retaining full precision
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision:",FLT_MIN);
// print min value for a positive float retaining full precision divided by two
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by two:",FLT_MIN/2.0);
// print min value for a positive float retaining full precision divided by four
printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by four:",FLT_MIN/4.0);
return 0;
}
结果是
Minimum positive float value retaining full precision: 0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625000000000000000000000000
Minimum positive float value retaining full precision divided by two: 0.000000000000000000000000000000000000005877471754111437539843682686111228389093327783860437607543758531392086297273635864257812500000000000000000000000
Minimum positive float value retaining full precision divided by four: 0.000000000000000000000000000000000000002938735877055718769921841343055614194546663891930218803771879265696043148636817932128906250000000000000000000000
我预计最小浮点值除以 2 和 4 的精度较低,但似乎精度还可以,并且没有下溢情况。这怎么可能?我错过了什么?
非常感谢
评估精度的方法不正确,因为代码简单地将 FLT_MIN
(当然是 2 的幂)除以 2。
而是从一个刚好高于 2 的幂的数字开始,因此它的 二进制 significand 类似于 1.000...(maybe total of 24 binary digits)...0001
。确保打印的值最初是 float
。 (FLT_MIN/2.0
是一个 double
。)
请注意,当数字小于 FLT_MIN
时,精度会丢失:最小 标准化 正浮点数。
同时考虑FLT_TRUE_MIN
:最小正浮点数。参见 binary32
#include <float.h>
#include <math.h>
#include <stdio.h>
int main(void) {
char *format = "%.10e %a\n";
printf(format, FLT_MIN, FLT_MIN);
printf(format, FLT_TRUE_MIN, FLT_TRUE_MIN);
float f = nextafterf(1.0f, 2.0f);
do {
f /= 2;
printf(format, f, f); // print in decimal and hex for detail
} while (f);
return 0;
}
输出
1.1754943508e-38 0x1p-126
1.4012984643e-45 0x1p-149
5.0000005960e-01 0x1.000002p-1
2.5000002980e-01 0x1.000002p-2
1.2500001490e-01 0x1.000002p-3
...
2.3509889819e-38 0x1.000002p-125
1.1754944910e-38 0x1.000002p-126
5.8774717541e-39 0x1p-127 // lost least significant bit of precision
2.9387358771e-39 0x1p-128
...
2.8025969286e-45 0x1p-148
1.4012984643e-45 0x1p-149
0.0000000000e+00 0x0p+0