How is the precision loss from integer to float defined in C++?
I have a question about the following code snippet:
long l=9223372036854775807L;
float f=static_cast<float>(l);
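The long value cannot be represented exactly as an IEEE 754 float.
My question is how the lossy conversion is handled:
- Is the nearest floating-point representation taken?
- Is the next smaller/bigger representation taken?
- Or is some other approach used?
I am aware of the question "what happens at background when convert int to float", but it does not answer my question.
See here: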
A prvalue of integer or unscoped enumeration type can be converted to
a prvalue of any floating-point type. If the value cannot be
represented correctly, it is implementation defined whether the
closest higher or the closest lower representable value will be
selected, although if IEEE arithmetic is supported, rounding defaults
to nearest. If the value cannot fit into the destination type, the
behavior is undefined. If the source type is bool, the value false is
converted to zero, and the value true is converted to one.
As for the rounding rules of IEEE 754, there seem to be five of them. I couldn't find any information on which ones are used in which situation, though. It looks like it's up to the implementation; however, you can set the rounding mode in a C++ program as described here.
C++ defines the conversion like this (quoting the latest standard draft):
[conv.fpint]
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type.
The result is exact if possible.
If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.
[ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type.
— end note
]
If the value being converted is outside the range of values that can be represented, the behavior is undefined.
If the source type is bool, the value false is converted to zero and the value true is converted to one.
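To make "the result is exact if possible" concrete, here is a minimal sketch of a check for whether a given 64-bit integer converts to float without rounding. The helper name fits_exactly_in_float is hypothetical, and the sketch assumes a binary floating-point format whose exponent range covers every 64-bit integer, which holds for IEEE 754 binary32.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the standard): true if v converts to float exactly.
// Assumption: float is a radix-2 format whose exponent range covers all 64-bit integers.
inline bool fits_exactly_in_float(std::int64_t v) {
    static_assert(std::numeric_limits<float>::radix == 2, "binary float expected");
    // Take the magnitude in unsigned arithmetic; this is well defined even for INT64_MIN.
    std::uint64_t m = v < 0 ? 0 - static_cast<std::uint64_t>(v)
                            : static_cast<std::uint64_t>(v);
    if (m == 0) return true;
    while ((m & 1u) == 0) m >>= 1;      // trailing zero bits are absorbed by the exponent
    int bits = 0;
    while (m != 0) { ++bits; m >>= 1; } // count the remaining significant bits
    return bits <= std::numeric_limits<float>::digits; // 24 for IEEE 754 binary32
}
```

With a 24-bit significand, 16777216 (2^24) passes this check while 16777217 (2^24 + 1) and 9223372036854775807 (2^63 - 1) do not, so the latter two fall under the implementation-defined rounding clause.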
The IEEE 754 standard defines the conversion like this:
5.4.1 Arithmetic operations
It shall be possible to convert from all supported signed and unsigned integer formats to all supported arithmetic formats. Integral values are converted exactly from integer formats to floating-point formats whenever the value is representable in both formats. If the converted value is not exactly representable in the destination format, the result is determined according to the applicable rounding-direction attribute, and an inexact or floating-point overflow exception arises as specified in Clause 7, just as with arithmetic operations. The signs of integer zeros are preserved. Integer zeros without signs are converted to +0. The preferred exponent is 0.
The rounding modes are specified as:
4.3.1 Rounding-direction attributes to nearest
roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered.
roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.
4.3.2 Directed rounding attributes
roundTowardPositive, the result shall be the format’s floating-point number (possibly +∞) closest to and no less than the infinitely precise result
roundTowardNegative, the result shall be the format’s floating-point number (possibly −∞) closest to and no greater than the infinitely precise result
roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.
4.3.3 Rounding attribute requirements
The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats.
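So by default, on an implementation that follows IEEE 754, your first option applies (the nearest representable value is taken), but only if no other rounding mode has been selected.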
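As a sketch of what the default mode means for the value in the question: the two floats that bracket 2^63 - 1 are 2^63 - 2^39 and 2^63, and round-to-nearest picks 2^63. The helper names below are hypothetical, and the code assumes a 64-bit long long, IEEE 754 binary32 for float, and that the rounding mode has not been changed.

```cpp
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<long long>::digits == 63, "64-bit long long assumed");

// Hypothetical helper: convert at run time; volatile keeps the compiler
// from folding the conversion into a constant.
inline float to_float(long long v) {
    volatile long long in = v;
    return static_cast<float>(in);
}

// The two representable floats that bracket LLONG_MAX = 2^63 - 1.
inline float upper_neighbor() { return std::ldexp(1.0f, 63); }                    // 2^63
inline float lower_neighbor() { return std::nextafter(upper_neighbor(), 0.0f); }  // 2^63 - 2^39
```

Under these assumptions, to_float(std::numeric_limits<long long>::max()) compares equal to upper_neighbor(). The tie-breaking rule shows up with smaller values: 16777217 lies exactly halfway between 16777216 and 16777218 and goes to 16777216, while 16777219 goes to 16777220, in both cases the neighbor whose significand is even.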
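The C++ standard library inherits <cfenv> from the C standard. This header provides macros, functions, and types for interacting with the floating-point environment, including the rounding mode.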
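A minimal sketch of selecting another mode through <cfenv> follows. The helper to_float_with_mode is hypothetical. It assumes that the FE_* rounding macros are defined on the target and that the implementation applies the dynamic rounding mode to integer-to-float conversions. The C++ standard does not promise the latter, and some compilers need extra options before they fully respect changes to the floating-point environment.

```cpp
#include <cfenv>

// Hypothetical helper: convert v to float under the given rounding mode
// (one of the FE_* macros from <cfenv>), then restore the previous mode.
// Assumption: the implementation honors the dynamic rounding mode for
// integer-to-float conversions; the C++ standard leaves this open.
inline float to_float_with_mode(long long v, int mode) {
    const int saved = std::fegetround();
    if (std::fesetround(mode) != 0) {
        return static_cast<float>(v);   // mode not supported: convert under the current mode
    }
    volatile long long in = v;          // volatile forces a run-time conversion
    volatile float out = static_cast<float>(in);  // stored before the mode is restored
    std::fesetround(saved);
    return out;
}
```

On a typical IEEE 754 platform one would expect to_float_with_mode(9223372036854775807LL, FE_TOWARDZERO) to give 2^63 - 2^39 and the same call with FE_UPWARD to give 2^63. Saving and restoring the mode keeps the change from leaking into unrelated floating-point code.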