给定两个双打,如何确定它们的商是否准确?
Given two doubles, how to determine if their quotient is exact?
给定两个 double
值,p
和 q
,我如何确定它们的商:
double result = p / q;
是 精确 结果 p
和 q
?
的二进制值
即result
是否正好等于p
和q
的数学除法。
显然,对于某些值(例如 1.0 / 2.0
)这是正确的,而对于其他值(例如 1.0 / 5.0
)是错误的,因此我正在寻找一种惯用且准确的方法来区分大小写。
似乎浮点模数 p % q == 0
可能 有效,但我不确定!
您可以使用 BigDecimal
查看除法是否准确:
private static boolean canDivideExact(double p, double q) {
double r = p / q;
BigDecimal d = new BigDecimal(r);
return d.multiply(new BigDecimal(q)).compareTo(new BigDecimal(p)) == 0;
}
例如:
System.out.println(canDivideExact(1, 2)); //true
System.out.println(canDivideExact(1, 3)); //false
我认为您的 p % q == 0
解决方案行不通:它检查 p
是否可以 均匀地 除以 q
,即, p / q
是否为整数。例如,1.0 % 2.0 == 1.0
,虽然它可以精确表示为双精度数:1.0 / 2.0 == 0.5
.
有趣的是,IEEE 754 也是 Java 浮点实现的基础,它正是您所需要的。如果浮点运算产生不精确的结果,则会引发异常(在 IEEE 意义上),默认情况下会更新状态字,因此您始终可以通过检查此状态字来检查结果是否精确。不幸的是,Java 选择不让该状态可访问。
如果你想继续使用 Java,你要么必须使用 assylia 的基于 BigDecimal
的解决方案,要么尝试通过 JNI 访问这些错误标志:在 C 中(C99 以后)你可以使用 fetestexcept(FE_INEXACT)
测试结果是否准确。不过,我不知道这是否可行。
这里有两种不涉及使用BigDecimal
的方法。如所写,两者都不适用于次正规,但如果您可以避免次正规、下溢和上溢,则两者都应该会产生良好的结果。我希望这两种方法都可以适用于次正规情况,但我还没有想过如何去做。
对于整数 e
和 m
,任何有限非零 double
x
都可以唯一地写成 x = m 2^e
形式 m
奇数。我们将 m
称为 x
的 奇数部分 。现在给定两个非零有限双精度 x
和 y
,并假设避免了上溢和下溢,当且仅当 x
的奇数部分是一个y
的奇数部分的整数倍。我们可以使用 %
检查整数倍条件,所以剩下的就是找到一种计算奇数部分的方法。在 C 或 Python 中,我会使用 frexp
,舍弃指数,并重复将分数乘以 2 直到它是一个整数,但 frexp
似乎不可用Java。但是Java确实有Math.getExponent
,可以提供frexp
的指数部分,然后Math.scalb
可以得到小数
计算 x / y
并得到(可能四舍五入的)结果 z
后,您可以使用双双运算将 y
乘以 z
(通过 Veltkamp 拆分和 Dekker 乘法),并检查结果是否完全等于 x
。这应该比使用 BigDecimal
的等效方法更有效,因为我们事先知道我们不需要超过通常浮点精度的两倍来包含结果。
恐怕我Java不够流利,无法给出代码,但这里有Python中的代码,适应Java应该很简单。 (请注意,Python 的 float
类型在典型机器上匹配 Java 的 double
;理论上,Python 不需要 IEEE 754,但在实践中,Python float
格式几乎不可避免地会是 IEEE 754 binary64。)
如果有人想窃取此代码,将其转换为Java,并将其放入答案中,我会很乐意投票。
import math
def odd_part(x):
"""
Return an odd integer m (as a double) such that x can be written
in the form m * 2**e for some exponent e. The exponent e is not
returned.
"""
fraction, exponent = math.frexp(x)
# here fraction * 2**53 is guaranteed to be an integer, so we
# don't need to loop more than 53 times.
while fraction % 1.0 != 0.0: # or in Python, use the is_integer method.
fraction *= 2.0
return fraction
# Constant used in Veltkamp splitting.
C = float.fromhex('0x1.0000002000000p+27')
def split(x):
"""
Split a double x into pieces x_hi, x_lo, each
expressible with 26 bits of precision.
Algorithm due to Veltkamp.
Parameters
----------
x : float
Finite float, such that C*x does not overflow. Assumes IEEE 754
representation and arithmetic, with round-ties-to-even rounding
mode.
Returns
-------
l, h : float
l and h are both representable in 26 bits of precision, and
x = l + h.
"""
# Idea of proof: without loss of generality, we can reduce to the case
# where 0.5 < x < 1 (the case where x is a power of 2 is straightforward).
# Write rnd for the floating-point rounding operation, so p = rnd(Cx) and q
# = rnd(x-p).
#
# Now let e and f be the errors induced by the floating-point operations,
# so
# p = Cx + e
# q = x - p + f
#
# Then it's easy to show that:
#
# 2**26 < |Cx| < 2**28, so p is a multiple of 2**-26 and |e| <= 2**-26.
# 2**26 <= p - x <= 2**27, so q is a multiple of 2**-26 and |f| <= 2**-27.
# h = p + q is exactly representable, equal to x + f
# h <= 1, and h is a multiple of 2**-26, so h has precision <= 26.
# l = x - h is exactly representable, equal to f.
# |f| <= 2**-27, and f is a multiple of 2**-53, so f has precision <= 26.
p = C * x
q = x - p
h = p + q
l = x - h
return l, h
def exact_mult(x, y):
"""
Multiply floats x and y exactly, expressing the result as l + h,
where h is the closest float to x * y and l is the error.
Algorithm is due to Dekker.
Assumes that x and y are finite IEEE 754 binary64 floats.
May return inf or nan due to intermediate overflow.
May raise ValueError on underflow or near-underflow.
If both results are finite, then we have equality:
x * y = h + l
"""
# Write |x| = M * 2**e, y = |N| * 2**f, for some M and N with
# M, N <= 2**53 - 1. Then xy = M*N*2**(e+f). If e + f <= -1075
# then xy < (2**53 - 1)**2 * 2**-1075 < 2**-969 (1 - 2**-53),
# which is exactly representable.
# Hence the rounded value of |xy| is also < 2**-969.
# So if |xy| >= 2**-969, and |xy| isn't near overflow, it follows that x*y
# *can* be expressed as the sum of two doubles:
# If |xy| < 2**-969, we can't guarantee it, and we raise ValueError.
h = x * y
if abs(h) < 2**-969 and x != 0.0 and y != 0.0:
raise ValueError("Cannot guarantee exact result.")
xl, xh = split(x)
yl, yh = split(y)
return -h + xh * yh + xh * yl + xl * yh + xl * yl, h
def division_exact_method_1(x, y):
"""
Given nonzero finite not-too-large not-too-small floats x and y,
return True if x / y is exactly representable, else False.
"""
return odd_part(x) % odd_part(y) == 0
def division_exact_method_2(x, y):
"""
Given nonzero finite not-too-large not-too-small floats x and y,
return True if x / y is exactly representable, else False.
"""
z = x / y
low, high = exact_mult(y, z)
return high == x and low == 0
给定两个 double
值,p
和 q
,我如何确定它们的商:
double result = p / q;
是 精确 结果 p
和 q
?
即result
是否正好等于p
和q
的数学除法。
显然,对于某些值(例如 1.0 / 2.0
)这是正确的,而对于其他值(例如 1.0 / 5.0
)是错误的,因此我正在寻找一种惯用且准确的方法来区分大小写。
似乎浮点模数 p % q == 0
可能 有效,但我不确定!
您可以使用 BigDecimal
查看除法是否准确:
private static boolean canDivideExact(double p, double q) {
double r = p / q;
BigDecimal d = new BigDecimal(r);
return d.multiply(new BigDecimal(q)).compareTo(new BigDecimal(p)) == 0;
}
例如:
System.out.println(canDivideExact(1, 2)); //true
System.out.println(canDivideExact(1, 3)); //false
我认为您的 p % q == 0
解决方案行不通:它检查 p
是否可以 均匀地 除以 q
,即, p / q
是否为整数。例如,1.0 % 2.0 == 1.0
,虽然它可以精确表示为双精度数:1.0 / 2.0 == 0.5
.
有趣的是,IEEE 754 也是 Java 浮点实现的基础,它正是您所需要的。如果浮点运算产生不精确的结果,则会引发异常(在 IEEE 意义上),默认情况下会更新状态字,因此您始终可以通过检查此状态字来检查结果是否精确。不幸的是,Java 选择不让该状态可访问。
如果你想继续使用 Java,你要么必须使用 assylia 的基于 BigDecimal
的解决方案,要么尝试通过 JNI 访问这些错误标志:在 C 中(C99 以后)你可以使用 fetestexcept(FE_INEXACT)
测试结果是否准确。不过,我不知道这是否可行。
这里有两种不涉及使用BigDecimal
的方法。如所写,两者都不适用于次正规,但如果您可以避免次正规、下溢和上溢,则两者都应该会产生良好的结果。我希望这两种方法都可以适用于次正规情况,但我还没有想过如何去做。
对于整数
e
和m
,任何有限非零double
x
都可以唯一地写成x = m 2^e
形式m
奇数。我们将m
称为x
的 奇数部分 。现在给定两个非零有限双精度x
和y
,并假设避免了上溢和下溢,当且仅当x
的奇数部分是一个y
的奇数部分的整数倍。我们可以使用%
检查整数倍条件,所以剩下的就是找到一种计算奇数部分的方法。在 C 或 Python 中,我会使用frexp
,舍弃指数,并重复将分数乘以 2 直到它是一个整数,但frexp
似乎不可用Java。但是Java确实有Math.getExponent
,可以提供frexp
的指数部分,然后Math.scalb
可以得到小数计算
x / y
并得到(可能四舍五入的)结果z
后,您可以使用双双运算将y
乘以z
(通过 Veltkamp 拆分和 Dekker 乘法),并检查结果是否完全等于x
。这应该比使用BigDecimal
的等效方法更有效,因为我们事先知道我们不需要超过通常浮点精度的两倍来包含结果。
恐怕我Java不够流利,无法给出代码,但这里有Python中的代码,适应Java应该很简单。 (请注意,Python 的 float
类型在典型机器上匹配 Java 的 double
;理论上,Python 不需要 IEEE 754,但在实践中,Python float
格式几乎不可避免地会是 IEEE 754 binary64。)
如果有人想窃取此代码,将其转换为Java,并将其放入答案中,我会很乐意投票。
import math
def odd_part(x):
"""
Return an odd integer m (as a double) such that x can be written
in the form m * 2**e for some exponent e. The exponent e is not
returned.
"""
fraction, exponent = math.frexp(x)
# here fraction * 2**53 is guaranteed to be an integer, so we
# don't need to loop more than 53 times.
while fraction % 1.0 != 0.0: # or in Python, use the is_integer method.
fraction *= 2.0
return fraction
# Constant used in Veltkamp splitting.
C = float.fromhex('0x1.0000002000000p+27')
def split(x):
"""
Split a double x into pieces x_hi, x_lo, each
expressible with 26 bits of precision.
Algorithm due to Veltkamp.
Parameters
----------
x : float
Finite float, such that C*x does not overflow. Assumes IEEE 754
representation and arithmetic, with round-ties-to-even rounding
mode.
Returns
-------
l, h : float
l and h are both representable in 26 bits of precision, and
x = l + h.
"""
# Idea of proof: without loss of generality, we can reduce to the case
# where 0.5 < x < 1 (the case where x is a power of 2 is straightforward).
# Write rnd for the floating-point rounding operation, so p = rnd(Cx) and q
# = rnd(x-p).
#
# Now let e and f be the errors induced by the floating-point operations,
# so
# p = Cx + e
# q = x - p + f
#
# Then it's easy to show that:
#
# 2**26 < |Cx| < 2**28, so p is a multiple of 2**-26 and |e| <= 2**-26.
# 2**26 <= p - x <= 2**27, so q is a multiple of 2**-26 and |f| <= 2**-27.
# h = p + q is exactly representable, equal to x + f
# h <= 1, and h is a multiple of 2**-26, so h has precision <= 26.
# l = x - h is exactly representable, equal to f.
# |f| <= 2**-27, and f is a multiple of 2**-53, so f has precision <= 26.
p = C * x
q = x - p
h = p + q
l = x - h
return l, h
def exact_mult(x, y):
"""
Multiply floats x and y exactly, expressing the result as l + h,
where h is the closest float to x * y and l is the error.
Algorithm is due to Dekker.
Assumes that x and y are finite IEEE 754 binary64 floats.
May return inf or nan due to intermediate overflow.
May raise ValueError on underflow or near-underflow.
If both results are finite, then we have equality:
x * y = h + l
"""
# Write |x| = M * 2**e, y = |N| * 2**f, for some M and N with
# M, N <= 2**53 - 1. Then xy = M*N*2**(e+f). If e + f <= -1075
# then xy < (2**53 - 1)**2 * 2**-1075 < 2**-969 (1 - 2**-53),
# which is exactly representable.
# Hence the rounded value of |xy| is also < 2**-969.
# So if |xy| >= 2**-969, and |xy| isn't near overflow, it follows that x*y
# *can* be expressed as the sum of two doubles:
# If |xy| < 2**-969, we can't guarantee it, and we raise ValueError.
h = x * y
if abs(h) < 2**-969 and x != 0.0 and y != 0.0:
raise ValueError("Cannot guarantee exact result.")
xl, xh = split(x)
yl, yh = split(y)
return -h + xh * yh + xh * yl + xl * yh + xl * yl, h
def division_exact_method_1(x, y):
"""
Given nonzero finite not-too-large not-too-small floats x and y,
return True if x / y is exactly representable, else False.
"""
return odd_part(x) % odd_part(y) == 0
def division_exact_method_2(x, y):
"""
Given nonzero finite not-too-large not-too-small floats x and y,
return True if x / y is exactly representable, else False.
"""
z = x / y
low, high = exact_mult(y, z)
return high == x and low == 0