Difference between ULP (unit in the last place) and quantum (IEEE 754)

From Wikipedia's ULP page:

Another definition, suggested by John Harrison, is slightly different: ULP(x) is the distance between the two closest straddling floating-point numbers a and b (i.e., those with a ≤ x ≤ b and a ≠ b), assuming that the exponent range is not upper-bounded.

From IEEE 754-2008:

2.1.44 quantum: The quantum of a finite floating-point representation is the value of a unit in the last position of its significand. This is equal to the radix raised to the exponent q, which is used when the significand is regarded as an integer.

Question: what is the difference between ULP (John Harrison's definition) and quantum (from IEEE 754)?

Is my understanding correct that the quantum of a double x can be computed as:

double ulp(double x)
{
        int exp;
        frexp( x, &exp );
        return ldexp( 0.5, exp-52 );
}
double quantum(double x)
{
        int exp;
        return ulp(frexp( x, &exp ));
}

[Edit]

OP's quantum() appears incorrect: it returns 1.11022e-16 for every finite x, even when x is subnormal.

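The rest of this answer assumes quantum() is more like quantum_alt() below, which gives the same result throughout each [power-of-2 ... 2*power-of-2) range. Note the [): the lower bound is included and the upper bound is excluded.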

Power of the radix

When x is a power of the radix: "These definitions differ only at signed powers of the radix."

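For binary64, consider x a power of 2. The next larger FP value is x + u and the next smaller value is x - u/2.

John Harrison: "the distance between the two closest straddling floating-point numbers a and b (i.e., those with a ≤ x ≤ b and a ≠ b)" implies that a is the smaller value, x == b, and the ULP is u/2.[1]

Quantum: "the value of a unit in the last position of its significand" implies the quantum is u.

The distance b - a is half of the "quantum" value: a lies in the next smaller exponent sub-range, where a unit in the last place is worth half as much as at x.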


Applicability

The definitions also differ in scope: both apply to floating-point values, but quantum does not apply to real values like 1/7, √2, π.


Both of OP's functions are wrong in select cases.

OP's ulp() is wrong, per John Harrison's definition, when x is a power of 2, zero, or subnormal.
Alternative:

#include <math.h>

// Using the convention ULP(x) == ULP(-x)
// Adjust if you want a signed result.
double ulp_JH(double x) {
  x = fabs(x);
  if (isfinite(x)) {
    double lower = nextafter(x, -1.0); // 1st FP number smaller than x   
    return x - lower;
  }
  return x; // NAN, infinity
}

OP's quantum() is wrong when x is zero or subnormal.

#include <float.h> // DBL_MAX
#include <math.h>

double quantum_alt(double x) {
  x = fabs(x);
  if (x < DBL_MAX) {
    double higher = nextafter(x, DBL_MAX); // 1st FP number larger than x   
    return higher - x;
  }
  if (isfinite(x)) {
    double lower = nextafter(x, 0.0); // Special case for DBL_MAX
    return x - lower;
  }
  return x; // NAN, infinity
}

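[1] Except when x == DBL_TRUE_MIN. In that case, ULP(DBL_TRUE_MIN) is DBL_TRUE_MIN. The same holds for any power of 2 at or below DBL_MIN, since binary64 spacing is uniform from zero through the first normal exponent range.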