ICC 是否满足复数乘法的 C99 规范？

Question

考虑这个简单的代码：

#include <complex.h>
complex float f(complex float x) {
  return x*x;
}

如果您使用英特尔编译器 -O3 -march=core-avx2 -fp-model strict 编译它，您将得到：

f:
        vmovsldup xmm1, xmm0                                    #3.12
        vmovshdup xmm2, xmm0                                    #3.12
        vshufps   xmm3, xmm0, xmm0, 177                         #3.12
        vmulps    xmm4, xmm1, xmm0                              #3.12
        vmulps    xmm5, xmm2, xmm3                              #3.12
        vaddsubps xmm0, xmm4, xmm5                              #3.12
        ret

这比您从 gcc 和 clang 获得的代码简单得多，也比您在网上找到的复数乘法代码简单得多。例如，它不会显式地处理复杂的 NaN 或无穷大。

Does this assembly meet the specs for C99 complex multiplication?

Answer 1

我从 -O2 -march=core-avx2 -ffast-math 的 clang 3.8 获得了相似但不相同的代码：我对 latter-day x86 浮点功能不是很熟悉，但我认为它在做同样的事情计算但使用不同的指令在寄存器中随机排列值。

f:
        vmovshdup       %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
        vaddss  %xmm0, %xmm0, %xmm2
        vmulss  %xmm2, %xmm1, %xmm2
        vmulss  %xmm1, %xmm1, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vinsertps       , %xmm2, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm2[0],xmm1[2,3]
        retq

GCC 6.3，具有相同的选项，似乎再次进行相同的计算，但以第三种方式改变值：

f:
        vmovq   %xmm0, -8(%rsp)
        vmovss  -4(%rsp), %xmm2
        vmovss  -8(%rsp), %xmm0
        vmulss  %xmm2, %xmm2, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vmulss  %xmm2, %xmm0, %xmm0
        vmovss  %xmm1, -16(%rsp)
        vaddss  %xmm0, %xmm0, %xmm0
        vmovss  %xmm0, -12(%rsp)
        vmovq   -16(%rsp), %xmm0
        ret

如果没有 -ffast-math，两个编译器都会生成完全不同的代码，似乎至少会检查 NaN。

我由此得出结论，即使使用 -fp-model strict，英特尔的编译器 也不会 生成完全 IEEE-compliant 的复数乘法。可能还有其他一些 command-line 开关可以使它生成完整的 IEEE-compliant 代码。

这是否违反了 C99 取决于英特尔的编译器是否被记录为符合附件 F 和 G（其中指定了 C 的实现意味着什么）提供 IEEE-compliant 实数和复数算术），如果是，您必须提供哪些 command-line 选项才能获得一致模式。

Answer 2

密码是non-conforming.

附件 G，第 5.1 节，第 4 段改为

The * and / operators satisfy the following infinity properties for all real, imaginary, and complex operands:

— if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;

所以如果 z = a * ib 是无限的并且 w = c * id是无限的，数z * w一定是无限的。

同一附件第 3 节第 1 段定义了无限复数的含义：

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

所以 z 是无限的，如果 a 或 b 是。
这确实是一个明智的选择，因为它反映了数学框架¹.

然而如果我们让z = ∞ + i∞（无限值）和w = i∞（和无限值）英特尔代码的结果是 z * w = NaN + iNaN 由于∞·0个中间体².

这足以将其标记为 non-conforming。

我们可以通过查看第一个引用的脚注（此处未报告脚注）进一步确认这一点，它提到了 CX_LIMITED_RANGE pragma 指令。

第 7.3.4 节第 1 段改为

The usual mathematical formulas for complex multiply, divide, and absolute value are problematic because of their treatment of infinities and because of undue overflow and underflow. The CX_LIMITED_RANGE pragma can be used to inform the implementation that (where the state is ‘‘on’’) the usual mathematical formulas [that produces NaNs] are acceptable.

标准委员会正在努力减轻复杂乘法（和除法）的巨大工作量。
In fact GCC has a flag to control this behaviour:

-fcx-limited-range
When enabled, this option states that a range reduction step is not needed when performing complex division.

Also, there is no checking whether the result of a complex multiplication or division is NaN + I*NaN, with an attempt to rescue the situation in that case.

The default is -fno-cx-limited-range, but is enabled by -ffast-math.
This option controls the default setting of the ISO C99 CX_LIMITED_RANGE pragma.

只有这个选项 makes GCC generate slow code and additional checks，没有它，它生成的代码与 Intel 的代码有相同的缺陷（我将源代码翻译成 C++）

f(std::complex<float>):
        movq    QWORD PTR [rsp-8], xmm0
        movss   xmm0, DWORD PTR [rsp-8]
        movss   xmm2, DWORD PTR [rsp-4]
        movaps  xmm1, xmm0
        movaps  xmm3, xmm2
        mulss   xmm1, xmm0
        mulss   xmm3, xmm2
        mulss   xmm0, xmm2
        subss   xmm1, xmm3
        addss   xmm0, xmm0
        movss   DWORD PTR [rsp-16], xmm1
        movss   DWORD PTR [rsp-12], xmm0
        movq    xmm0, QWORD PTR [rsp-16]
        ret

没有它代码是

f(std::complex<float>):
        sub     rsp, 40
        movq    QWORD PTR [rsp+24], xmm0
        movss   xmm3, DWORD PTR [rsp+28]
        movss   xmm2, DWORD PTR [rsp+24]
        movaps  xmm1, xmm3
        movaps  xmm0, xmm2
        call    __mulsc3
        movq    QWORD PTR [rsp+16], xmm0
        movss   xmm0, DWORD PTR [rsp+16]
        movss   DWORD PTR [rsp+8], xmm0
        movss   xmm0, DWORD PTR [rsp+20]
        movss   DWORD PTR [rsp+12], xmm0
        movq    xmm0, QWORD PTR [rsp+8]
        add     rsp, 40
        ret

和 __mulsc3 function 实际上与标准 C99 推荐的复数乘法相同。
它包括上述检查。

¹ 其中一个数的模数是从实际情况扩展而来的 |z|到复数 ‖z‖，由于无界限制而保持无限的定义。简单的说，复平面上有一圈无穷大的值，只需要一个"coordinate"无穷大就可以得到无穷大的模。

² 如果我们记住 z = NaN + i∞，情况会变得更糟或 z = ∞ + iNaN 是有效的无限值

ICC 是否满足复数乘法的 C99 规范？

Does ICC satisfy C99 specs for multiplication of complex numbers?

c

assembly

icc

avx

complex-numbers