给定两个双打，如何确定它们的商是否准确？

Question

给定两个 double 值，p 和 q，我如何确定它们的商：

double result = p / q;

是精确结果 p 和 q?

的二进制值

即result是否正好等于p和q的数学除法。

显然，对于某些值（例如 1.0 / 2.0）这是正确的，而对于其他值（例如 1.0 / 5.0）是错误的，因此我正在寻找一种惯用且准确的方法来区分大小写。

似乎浮点模数 p % q == 0 可能有效，但我不确定！

Answer 1

您可以使用 BigDecimal 查看除法是否准确：

private static boolean canDivideExact(double p, double q) {
  double r = p / q;
  BigDecimal d = new BigDecimal(r);
  return d.multiply(new BigDecimal(q)).compareTo(new BigDecimal(p)) == 0;
}

例如：

System.out.println(canDivideExact(1, 2)); //true
System.out.println(canDivideExact(1, 3)); //false

Answer 2

我认为您的 p % q == 0 解决方案行不通：它检查 p 是否可以 均匀地 除以 q，即, p / q 是否为整数。例如，1.0 % 2.0 == 1.0，虽然它可以精确表示为双精度数：1.0 / 2.0 == 0.5.

有趣的是，IEEE 754 也是 Java 浮点实现的基础，它正是您所需要的。如果浮点运算产生不精确的结果，则会引发异常（在 IEEE 意义上），默认情况下会更新状态字，因此您始终可以通过检查此状态字来检查结果是否精确。不幸的是，Java 选择不让该状态可访问。

如果你想继续使用 Java，你要么必须使用 assylia 的基于 BigDecimal 的解决方案，要么尝试通过 JNI 访问这些错误标志：在 C 中（C99 以后）你可以使用 fetestexcept(FE_INEXACT) 测试结果是否准确。不过，我不知道这是否可行。

Answer 3

这里有两种不涉及使用BigDecimal的方法。如所写，两者都不适用于次正规，但如果您可以避免次正规、下溢和上溢，则两者都应该会产生良好的结果。我希望这两种方法都可以适用于次正规情况，但我还没有想过如何去做。

对于整数 e 和 m，任何有限非零 double x 都可以唯一地写成 x = m 2^e 形式 m 奇数。我们将 m 称为 x 的 奇数部分 。现在给定两个非零有限双精度 x 和 y，并假设避免了上溢和下溢，当且仅当 x 的奇数部分是一个y 的奇数部分的整数倍。我们可以使用 % 检查整数倍条件，所以剩下的就是找到一种计算奇数部分的方法。在 C 或 Python 中，我会使用 frexp，舍弃指数，并重复将分数乘以 2 直到它是一个整数，但 frexp 似乎不可用Java。但是Java确实有Math.getExponent，可以提供frexp的指数部分，然后Math.scalb可以得到小数
计算 x / y 并得到（可能四舍五入的）结果 z 后，您可以使用双双运算将 y 乘以 z （通过 Veltkamp 拆分和 Dekker 乘法），并检查结果是否完全等于 x。这应该比使用 BigDecimal 的等效方法更有效，因为我们事先知道我们不需要超过通常浮点精度的两倍来包含结果。

恐怕我Java不够流利，无法给出代码，但这里有Python中的代码，适应Java应该很简单。（请注意，Python 的 float 类型在典型机器上匹配 Java 的 double；理论上，Python 不需要 IEEE 754，但在实践中，Python float 格式几乎不可避免地会是 IEEE 754 binary64。）

如果有人想窃取此代码，将其转换为Java，并将其放入答案中，我会很乐意投票。

import math

def odd_part(x):
    """
    Return an odd integer m (as a double) such that x can be written
    in the form m * 2**e for some exponent e. The exponent e is not
    returned.
    """
    fraction, exponent = math.frexp(x)

    # here fraction * 2**53 is guaranteed to be an integer, so we
    # don't need to loop more than 53 times.
    while fraction % 1.0 != 0.0:  # or in Python, use the is_integer method.
        fraction *= 2.0
    return fraction


# Constant used in Veltkamp splitting.
C = float.fromhex('0x1.0000002000000p+27')

def split(x):
    """
    Split a double x into pieces x_hi, x_lo, each
    expressible with 26 bits of precision.

    Algorithm due to Veltkamp.

    Parameters
    ----------
    x : float
        Finite float, such that C*x does not overflow. Assumes IEEE 754
        representation and arithmetic, with round-ties-to-even rounding
        mode.

    Returns
    -------
    l, h : float
        l and h are both representable in 26 bits of precision, and
        x = l + h.

    """
    # Idea of proof: without loss of generality, we can reduce to the case
    # where 0.5 < x < 1 (the case where x is a power of 2 is straightforward).
    # Write rnd for the floating-point rounding operation, so p = rnd(Cx) and q
    # = rnd(x-p).
    #
    # Now let e and f be the errors induced by the floating-point operations,
    # so
    #     p = Cx + e
    #     q = x - p + f
    #
    # Then it's easy to show that:
    #
    #  2**26 < |Cx| < 2**28, so p is a multiple of 2**-26 and |e| <= 2**-26.
    #  2**26 <= p - x <= 2**27, so q is a multiple of 2**-26 and |f| <= 2**-27.
    #  h = p + q is exactly representable, equal to x + f
    #  h <= 1, and h is a multiple of 2**-26, so h has precision <= 26.
    #  l = x - h is exactly representable, equal to f.
    #  |f| <= 2**-27, and f is a multiple of 2**-53, so f has precision <= 26.

    p = C * x
    q = x - p
    h = p + q
    l = x - h
    return l, h


def exact_mult(x, y):
    """
    Multiply floats x and y exactly, expressing the result as l + h,
    where h is the closest float to x * y and l is the error.

    Algorithm is due to Dekker.

    Assumes that x and y are finite IEEE 754 binary64 floats.

    May return inf or nan due to intermediate overflow.

    May raise ValueError on underflow or near-underflow.

    If both results are finite, then we have equality:

       x * y = h + l

    """
    # Write |x| = M * 2**e, y = |N| * 2**f, for some M and N with
    # M, N <= 2**53 - 1. Then xy = M*N*2**(e+f). If e + f <= -1075
    # then xy < (2**53 - 1)**2 * 2**-1075 < 2**-969 (1 - 2**-53),
    # which is exactly representable.
    # Hence the rounded value of |xy| is also < 2**-969.

    # So if |xy| >= 2**-969, and |xy| isn't near overflow, it follows that x*y
    # *can* be expressed as the sum of two doubles: 

    # If |xy| < 2**-969, we can't guarantee it, and we raise ValueError.

    h = x * y

    if abs(h) < 2**-969 and x != 0.0 and y != 0.0:
        raise ValueError("Cannot guarantee exact result.")

    xl, xh = split(x)
    yl, yh = split(y)
    return -h + xh * yh + xh * yl + xl * yh + xl * yl, h


def division_exact_method_1(x, y):
    """
    Given nonzero finite not-too-large not-too-small floats x and y,
    return True if x / y is exactly representable, else False.
    """
    return odd_part(x) % odd_part(y) == 0


def division_exact_method_2(x, y):
    """
    Given nonzero finite not-too-large not-too-small floats x and y,
    return True if x / y is exactly representable, else False.
    """
    z = x / y
    low, high = exact_mult(y, z)
    return high == x and low == 0

给定两个双打，如何确定它们的商是否准确？

Given two doubles, how to determine if their quotient is exact?

java

math

floating-point

ieee-754