Why is half-precision complex float arithmetic not supported in Python and CUDA?

NumPy has complex64, which corresponds to two float32 values.

But it also has float16, and yet there is no complex32.
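A quick check of the situation described above, as a minimal sketch that assumes only that NumPy is installed:

```python
import numpy as np

# float16 and complex64 are both ordinary NumPy dtypes.
half = np.dtype(np.float16)
single_complex = np.dtype(np.complex64)
print(half.itemsize, single_complex.itemsize)  # bytes per element: 2 8

# NumPy does not expose a complex32 dtype (a pair of float16 values).
has_complex32 = hasattr(np, "complex32")
print(has_complex32)  # False
```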

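How come? I have a signal processing computation involving FFTs where I think I could get by with complex32, but I don't know how to get there. In particular, I was hoping to use CuPy for acceleration on an NVIDIA GPU.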

However, it seems that float16 is slower on the GPU rather than faster.

Why is half precision unsupported and/or overlooked?

Also related: why don't we have complex integers, as this may also present an opportunity for speedup?

This issue was raised in the CuPy repo a while ago:

https://github.com/cupy/cupy/issues/3370

But there is no concrete work plan yet; most of it is still exploratory.

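One reason this is not trivial to work out is that there is no numpy.complex32 dtype that we can import directly (note that all of CuPy's dtypes are just aliases of NumPy's), so problems arise whenever a device-host transfer is requested. Another is that there are no native math functions written for complex32 on either the CPU or the GPU, so we would have to write them all ourselves for casting, ufuncs, and so on. The linked issue contains a link to a NumPy discussion, and my impression is that it is not currently being considered...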
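Since there is no complex32 dtype to import, one possible workaround (not something proposed in the linked issue, just a sketch) is to store the data as float16 (real, imag) pairs and upcast to complex64 only for the computation. The helper names below are hypothetical, and the example uses NumPy so that it runs without a GPU; the same array calls should be available under the cupy namespace, but that is an assumption to verify on your setup.

```python
import numpy as np

def pack_complex_as_half(z):
    """Store a complex array as float16 (real, imag) pairs on a trailing axis."""
    z = np.asarray(z, dtype=np.complex64)
    return np.stack([z.real, z.imag], axis=-1).astype(np.float16)

def unpack_half_as_complex(h):
    """Upcast float16 (real, imag) pairs back to complex64 for computation."""
    h32 = h.astype(np.float32)
    return (h32[..., 0] + 1j * h32[..., 1]).astype(np.complex64)

def fft_via_complex64(h):
    """Hypothetical helper: FFT along the last data axis, computed in complex64."""
    z = unpack_half_as_complex(h)
    # The arithmetic is done at single precision or higher; only storage is half.
    spectrum = np.fft.fft(z).astype(np.complex64)
    return pack_complex_as_half(spectrum)

signal = np.array([1 + 2j, -0.5 + 0.25j], dtype=np.complex64)
packed = pack_complex_as_half(signal)   # shape (2, 2), dtype float16
spectrum_packed = fft_via_complex64(packed)
```

This only halves the storage and transfer size; the arithmetic itself still runs at complex64 or higher. Note also that float16 tops out at about 65504, so large spectra may need scaling before being packed back down.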

numpy

fft

cupy

half-precision-float