Why do compilers fix the number of digits of a floating-point number at 6?
According to The C++ Programming Language, 4th edition, section 6.2.5:
There are three floating-point types: float (single-precision), double (double-precision), and long double (extended-precision).
Reference: http://en.wikipedia.org/wiki/Single-precision_floating-point_format
The true significand includes 23 fraction bits to the right of the binary point and an implicit leading bit (to the left of the binary point) with value 1 unless the exponent is stored with all zeros. Thus only 23 fraction bits of the significand appear in the memory format but the total precision is 24 bits (equivalent to log10(2^24) ≈ 7.225 decimal digits).
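→ The maximum number of digits of a floating-point number in the binary32 interchange format (a computer number format that occupies 4 bytes, i.e. 32 bits, in computer memory) is 7.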
When I tested with different compilers (such as GCC and the VC compiler)
→ it always outputs 6 as the value.
Looking at each compiler's float.h
→ I found that 6 is hard-coded.
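Questions:
- Do you know why there is a difference here (between the theoretical value, 7, and the actual value, 6)?
"7" sounds more reasonable, because when I test with the code below, a 7-digit value still works, while an 8-digit one does not.
- Why don't compilers check the interchange format to decide how many digits a float can represent (instead of using a fixed value)?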
Code:
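#include <iostream>
#include <limits>
using namespace std;

int main()
{
    // digits10: decimal digits guaranteed to survive a round trip through float
    cout << numeric_limits<float>::digits10 << endl;

    // A 7-digit integer, printed with up to 10 significant digits
    float f = -9999999;
    cout.precision(10);
    cout << f << endl;
}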
You didn't read the documentation.
std::numeric_limits<float>::digits10 is 6:
The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow. For base-radix types, it is the value of digits (digits-1 for floating-point types) multiplied by log10(radix) and rounded down.

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.
std::numeric_limits<float>::max_digits10 is 9:
The value of std::numeric_limits<T>::max_digits10 is the number of base-10 digits that are necessary to uniquely represent all distinct values of the type T, such as necessary for serialization/deserialization to text. This constant is meaningful for all floating-point types.

Unlike most mathematical operations, the conversion of a floating-point value to text and back is exact as long as at least max_digits10 were used (9 for float, 17 for double): it is guaranteed to produce the same floating-point value, even though the intermediate text representation is not exact. It may take over a hundred decimal digits to represent the precise value of a float in decimal notation.
std::numeric_limits<float>::digits10 equates to FLT_DIG, which is defined by the C standard:
number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits,
$$
q = \begin{cases}
p \log_{10} b & \text{if } b \text{ is a power of } 10 \\
\lfloor (p - 1) \log_{10} b \rfloor & \text{otherwise}
\end{cases}
$$
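FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10
(These are the minimum values the standard allows; an implementation's values must be at least this large, and with IEEE 754 double, DBL_DIG is 15.)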
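The reason for the value 6 (rather than 7) is rounding error: not every floating-point value with 7 decimal digits can be represented losslessly by a 32-bit float. The rounding error is limited to 1 bit, so FLT_DIG is computed from 23 bits rather than the full 24:
23 * log10(2) = 6.92
which is rounded down to 6.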