Misunderstanding the Go language specification on floating-point rounding
The Go language specification's section on Constant expressions states:
A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
Does the sentence
This rounding may cause a floating-point constant expression to be invalid in an integer context
refer to something like the following?
package main

import "fmt"

func main() {
	a := 853784574674.23846278367
	fmt.Println(int8(a)) // output: 0
}
int8 is a signed integer type whose values range from -128 to 127. That is why you see an unexpected value from the int8(a) conversion.
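The part of the spec you quoted does not apply to your example, because a is not a constant expression but a variable, so int8(a) converts a non-constant expression. That conversion is covered by Spec: Conversions, under "Conversions between numeric types":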
When converting a floating-point number to an integer, the fraction is discarded (truncation towards zero).
[...] In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.
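Since you convert the non-constant expression a, whose value is 853784574674.23846278367, to an integer, the fractional part is discarded. And since the result does not fit in an int8, the result is unspecified: it is implementation-dependent.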
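If a well-defined result is needed, one option is to check the range before converting. The following is only a minimal sketch: toInt8 is a hypothetical helper name, not something from the spec or the standard library.

```go
package main

import (
	"fmt"
	"math"
)

// toInt8 is a hypothetical helper. It truncates f toward zero and reports
// whether the truncated value can be represented by an int8.
func toInt8(f float64) (int8, bool) {
	t := math.Trunc(f)
	if math.IsNaN(t) || t < math.MinInt8 || t > math.MaxInt8 {
		return 0, false // out of range (or NaN): refuse to convert
	}
	return int8(t), true
}

func main() {
	a := 853784574674.23846278367
	if v, ok := toInt8(a); ok {
		fmt.Println("converted:", v)
	} else {
		fmt.Println("value does not fit in int8")
	}

	v, ok := toInt8(-12.9)
	fmt.Println(v, ok) // fraction discarded, value in range

	// By contrast, a constant conversion is checked at compile time.
	// Uncommenting the next two lines makes the program fail to compile:
	// const c = 853784574674.23846278367
	// _ = int8(c)
}
```

The commented-out lines show the other side of the distinction: when the operand is a constant, the compiler rejects the conversion instead of producing an implementation-dependent value.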
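What the quoted part means is this: although constants are represented with much higher precision than the built-in types (e.g. float64 or int64), the precision a compiler has to implement is not infinite (for practical reasons). Even if the floating-point literals themselves can be represented exactly, operations on them may involve intermediate rounding and may not give the mathematically correct result.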
The spec includes the minimum supportable precision:
Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
- Represent integer constants with at least 256 bits.
- Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
- Give an error if unable to represent an integer constant precisely.
- Give an error if unable to represent a floating-point or complex constant due to overflow.
- Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
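The effect of a limited mantissa can be imitated at run time with math/big. This is only an illustration of rounding at a fixed precision, not a description of how any particular compiler evaluates constants; the value 2^300 and the helper name addThenSubtract are assumptions chosen for the sketch.

```go
package main

import (
	"fmt"
	"math/big"
)

// addThenSubtract computes (huge + 1) - huge, where huge = 2^300, rounding
// every intermediate result to prec mantissa bits. It returns the result
// formatted as a decimal string.
func addThenSubtract(prec uint) string {
	// 2^300 needs only one mantissa bit, so it is exact at any precision.
	huge := new(big.Float).SetMantExp(big.NewFloat(1), 300)
	one := big.NewFloat(1)

	// The receiver's precision decides how the sum is rounded.
	sum := new(big.Float).SetPrec(prec).Add(huge, one)
	diff := new(big.Float).SetPrec(prec).Sub(sum, huge)
	return diff.Text('g', -1)
}

func main() {
	// 2^300 + 1 needs 301 mantissa bits, so 256 bits lose the +1.
	fmt.Println(addThenSubtract(256))
	// With 512 bits the sum is exact and the +1 survives.
	fmt.Println(addThenSubtract(512))
}
```

With a 256-bit mantissa, the minimum the spec requires, the intermediate sum is rounded back to 2^300 and the difference comes out as 0, even though both operands are exactly representable.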
For example:
package main

import "fmt"

const (
	x = 1e100000 + 1
	y = 1e100000
)

func main() {
	fmt.Println(x - y) // x - y is an untyped floating-point constant expression
}
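This code should output 1, since x is 1 greater than y. Running it on the Go Playground outputs 0, because the constant expression x - y is evaluated with rounding and the +1 is lost. Both x and y are integers (they have no fractional part), so in an integer context the result should be 1. But representing 1e100000 + 1 exactly takes roughly 332,000 bits, which is not something a compiler is required to support (per the spec, a 256-bit mantissa is sufficient).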
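The order of magnitude of these bit counts can be checked with math/big: each decimal digit costs about log2(10) ≈ 3.32 bits. The helper name bitsForPowerOfTen below is made up for this sketch.

```go
package main

import (
	"fmt"
	"math/big"
)

// bitsForPowerOfTen returns the number of bits needed to write 10^n as a
// binary integer. Adding 1 to 10^n does not change the bit length, but it
// sets the lowest bit, so all of those bits are needed to hold 10^n + 1.
func bitsForPowerOfTen(n int64) int {
	p := new(big.Int).Exp(big.NewInt(10), big.NewInt(n), nil)
	return p.BitLen()
}

func main() {
	fmt.Println(bitsForPowerOfTen(1000))   // 3322
	fmt.Println(bitsForPowerOfTen(100000)) // 332193
}
```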
If we lower the constants, we get the correct result:
package main

import "fmt"

const (
	x = 1e1000 + 1
	y = 1e1000
)

func main() {
	fmt.Println(x - y)
}
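This outputs the mathematically correct result, 1. Try it on the Go Playground. Representing 1e1000 + 1 exactly takes about 3,322 bits, which appears to be supported (and is well above the minimum 256-bit requirement).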
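To connect this back to the "integer context" wording in the spec, the same constants can be used where an integer constant is required. This is a sketch that relies on the compiler carrying enough precision for 1e1000, which the spec does not guarantee; the name n is an assumption introduced for the example.

```go
package main

import "fmt"

const (
	x = 1e1000 + 1
	y = 1e1000
)

// n is declared with type int, so the untyped floating-point constant
// expression x - y is used in an integer context. The declaration is only
// valid if the compiler evaluates x - y to a value without a fractional part.
const n int = x - y

func main() {
	fmt.Println(n)

	// A constant with a genuine fractional part is rejected at compile time.
	// Uncommenting the next line makes the program fail to compile:
	// const bad int = 0.5
}
```

If rounding during evaluation left x - y with a fractional part, the declaration of n would be rejected even though the exact result is integral. That is the situation the quoted sentence describes.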