比较浮点加法的存储结果与 Fortran 中的临时结果

Question

我注意到 ==-运算符对浮点类型的某些行为对我来说似乎很奇怪。我知道由于浮点表示的限制，我不能期望像 0.1 + 0.2 == 0.3 这样的东西是 .true.，因此，浮点比较通常应该用像 abs(x - y) < tolerance 这样的东西来完成。但是，我仍然希望这个最小程序在任何情况下都能输出 T：

program main
    integer, parameter :: dp = kind(0d0)
    real(kind=dp) :: a, b, c

    a = 4.4090680619790817d+002
    b = 1.0000000000000000d-004
    c = (a + b)

    print *, (c == (a + b))
end program

在 64 位 Manjaro Linux 和

上使用 gfortran 7.3.1 编译此程序时

gfortran -o a.out minimal_example.F90 && a.out

我实际上得到了输出T。但是，当使用

编译和执行 32 位可执行文件时

gfortran -m32 -o a.out minimal_example.F90 && a.out

结果是F。对我来说，存储加法的结果似乎稍微改变了它的值，因为差异 abs(c - (a + b)) 大致是 2.5E-014。我真的不明白为什么，因为所有变量都是同类的，所以临时 a + b 不应该具有相同的精度，因此适合 c 而没有任何转换错误？

在 a 和 b 的区间 [0,1) 中使用几个随机生成的值进行尝试，重复了这一观察。 64 位可执行文件中的比较始终是 .true.，而 25% 的 32 位可执行文件尝试结果是 .false..

这种行为的原因是什么？特别是，为什么 64 位和 32 位可执行文件之间存在差异？

Answer 1

首先，建议不要在实数上使用 ==（或 .eq.，用于怀旧 FORTRAN-programmers）。当您这样做时，编译器往往会打印警告（尝试编译器选项 -Wall for gfortran！）。

当然，无论如何，人们可能仍然想知道计算机内部发生了什么。 FORTRAN 的优势之一是，只要结果符合 FORTRAN 标准，编译器就可以自由地打乱计算、更改它们的顺序、optimize-out 某些变量等。正如@Eric Postpischil 指出的那样：可能发生的事情之一是双精度变量在计算过程中被转换为更高的精度，并且只有在计算完成时才转换回双精度。

在你的情况下，我的猜测是 (a+b) 是以更高的精度计算的，而 c 已经转换为双精度，因此是不一样的。我期望不同的编译器（ifort？PGI-compiler？）和不同的 compiler-options（-fpexact、-O3 等）有不同的行为。

简而言之，我建议使用

这样的函数进行测试

 function same(a,b) result(eq)
 implicit none
     real, intent(in) :: a, b
     logical :: eq
     real, parameter :: very_small = 1e-10 ! or another very small value
     eq = abs(a-b) < very_small * abs(a)
 end function same

暂时还不能解决这个问题，所以我在我的电脑上测试了几个编译器选项 lubuntu.

奇怪的是，-m32 似乎对结果影响不大：

gfortran -m32  compare_reals.f90 && ./a.out
  F
gfortran compare_reals.f90 && ./a.out
  F
gfortran -m32 -ffloat-store compare_reals.f90 && ./a.out
 T
gfortran -m32 -O3 compare_reals.f90 && ./a.out
 TFloating point comparisons
gfortran -ffloat-store compare_reals.f90 && ./a.out
 T
gfortran -O3 compare_reals.f90 && ./a.out
 T

从gfortran的在线文档中，我找到了一些资料我假设解释了观察结果：

-ffloat-store:

Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables

比较浮点加法的存储结果与 Fortran 中的临时结果

Comparing stored result of floating point addition to temporary in Fortran

floating-point

precision

fortran

gfortran