Do floats, doubles, and long doubles have a guaranteed minimum precision?

In response to my previous question, I was told:

C provides DBL_DIG, DBL_DECIMAL_DIG, and their float and long double counterparts. DBL_DIG indicates the minimum relative decimal precision. DBL_DECIMAL_DIG can be thought of as the maximum relative decimal precision.

I looked up these macros. They are defined in the header <cfloat>. The cplusplus reference page lists the macros for float, double, and long double.

These are the macros for the minimum precision values:

FLT_DIG 6 or greater

DBL_DIG 10 or greater

LDBL_DIG 10 or greater

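If I take these macros at face value, I would assume that a float has a minimum decimal precision of 6, while a double and a long double have a minimum decimal precision of 10. However, being a big boy, I know that some things may be too good to be true.

So I would like to know: do floats, doubles, and long doubles have a guaranteed minimum decimal precision, and is this minimum decimal precision the values of the macros given above?

If not, why?

Note: Assume we are using the programming language C++.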

Do floats, doubles, and long doubles have guaranteed minimum decimal precision, and is this minimum decimal precision the values of the macros given above?

I could not find anywhere in the standard that guarantees any minimum value for decimal precision.

The following quote from http://en.cppreference.com/w/cpp/types/numeric_limits/digits10 may be useful:

Example

An 8-bit binary type can represent any two-digit decimal number exactly, but 3-digit decimal numbers 256..999 cannot be represented. The value of digits10 for an 8-bit type is 2 (8 * std::log10(2) is 2.41)

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.

However, the C standard specifies the minimum values that must be supported. From the C standard:

5.2.4.2.2 Characteristics of floating types

...

9 The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign

...

-- number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,

...

FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10

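If std::numeric_limits<F>::is_iec559 is true, then the guarantees of the IEEE 754 standard apply to floating-point type F.

Otherwise (and anyway), the minimum permitted values of symbols such as DBL_DIG are specified by the C standard, which, indisputably for the library, “is incorporated into [the C++] International Standard by reference”, as quoted from C++11 §17.5.1.5/1.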

Edit: As TC notes in a comment here,

“<climits> and <cfloat> are normatively incorporated by §18.3.3 [c.limits]; the minimum values are specified in turn in §5.2.4.2.2 of the C standard”

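Unfortunately for the formal view, first, the C++11 quote is from section 17.5, which is only informative, not normative. Second, the wording in the C standard that specifies the minimum values is also in an informative rather than normative section (Annex E of the C99 standard). So while it can be regarded as an in-practice guarantee, it is not a formal guarantee.

~~One strong indication that the in-practice minimum precision for float is 6 decimal digits, and that no implementation will give fewer:~~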

Output operations default to a precision of 6, and this is normative text.

~~Disclaimer: There may be additional wording that provides guarantees that I did not notice. Not very likely, but possible.~~

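The C++ standard says nothing specific about the limits of floating-point types. You may interpret the incorporation of the C standard "by reference" as you like, but if you take the limits as specified there (N1570), section 5.2.4.2.2, paragraph 15: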

EXAMPLE 1 The following describes an artificial floating-point representation that meets the minimum requirements of this International Standard, and the appropriate values in a header for type float:
FLT_RADIX 16
FLT_MANT_DIG 6
FLT_EPSILON 9.53674316E-07F
FLT_DECIMAL_DIG 9
FLT_DIG 6
FLT_MIN_EXP -31
FLT_MIN 2.93873588E-39F
FLT_MIN_10_EXP -38
FLT_MAX_EXP +32
FLT_MAX 3.40282347E+38F
FLT_MAX_10_EXP +38

By this section, float, double, and long double have at least these properties*.

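To be more specific: since my compiler uses the IEEE 754 standard, my decimal precision is guaranteed to be 6 to 9 significant decimal digits for float and 15 to 17 significant decimal digits for double. Also, since a long double on my compiler is the same size as a double, it too has 15 to 17 significant decimal digits.

These ranges can be verified from IEEE 754 single-precision binary floating-point format: binary32 and IEEE 754 double-precision binary floating-point format: binary64, respectively.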

