Cuda float precision

Question

Do floating-point numbers lose precision when a value is small enough (GPU kernel, RTX 2060)?

For example:

#include <stdio.h>

__global__ void calc()
{
    float m1 = 0.1490116119;
    float m2 = -0.000000007450580;
    float res = m1 + m2;

    printf("M1: %.15f  M2: %.15f Res: %.15f\n", m1, m2, res);

}

int main()
{
    calc<<<1,1>>>();
    cudaDeviceSynchronize();

    return 0;
}

The result is:

M1: 0.149011611938477  M2: -0.000000007450580 Res: 0.149011611938477

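M2 does not matter in this case, which confuses me.

I know that floating-point numbers may not be as accurate as expected, but I wonder whether something is wrong here, since a float on the order of 1e-9 is ignored.

Suppose I want to gradually add a very small number to a given variable (call it M). Does this benchmark imply that M will eventually stop changing?

Thanks a lot!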

Answer 1

Your observation is correct, and it has nothing to do with CUDA:

$ cat t1981.cu
#include <stdio.h>
#ifndef DONT_USE_CUDA
__global__ void calc()
#else
int main()
#endif
{
    float m1 = 0.1490116119;
    float m2 = -0.000000007450580;
    float res = m1 + m2;

    printf("M1: %.15f  M2: %.15f Res: %.15f\n", m1, m2, res);

}
#ifndef DONT_USE_CUDA
int main()
{
    calc<<<1,1>>>();
    cudaDeviceSynchronize();

    return 0;
}
#endif
$ nvcc -o t1981 t1981.cu
$ ./t1981
M1: 0.149011611938477  M2: -0.000000007450580 Res: 0.149011611938477
$ cp t1981.cu t1981.cpp
$ g++ -DDONT_USE_CUDA t1981.cpp -o t1981
$ ./t1981
M1: 0.149011611938477  M2: -0.000000007450580 Res: 0.149011611938477
$

> M2 does not matter in this case, which confuses me.

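A float quantity corresponds to an IEEE 754 32-bit floating-point number. This representation has ~23 bits for the mantissa (24 if you count the implicit leading bit). 23 bits correspond to roughly 6 or 7 decimal digits. The way I like to think about it is that there can be about 6 or 7 decimal digits between the most significant digit you want to represent and the least significant digit you want to represent (inclusive).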

> I know that float number may not be as accurate as expected, but I wonder if there is something wrong that float in the order of 1e-9 is ignored.

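I wouldn't say that a number on the order of 1e-9 is ignored. Rather, it is better to conclude that if the digits you care about span 9 significant decimal places, float may be an insufficient representation. You can easily handle numbers in the 1e-9 range; it's just that combining them with numbers in the 1e-1 range may not give good results, and that is exactly what you are trying to do.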

> Assume that if I want to add a very small number to a given variable (called M) gradually, this benchmark would imply that M will not change any more finally?

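Yes. If you want to handle calculations like this, one possible solution is to switch from float to double. A double can typically handle 12-15 or more significant decimal digits (it has a 53-bit significand).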

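There are other alternatives as well, such as Kahan summation, that can address this kind of problem. You can also sort the set of numbers to be added. Whether done serially or block-wise in CUDA, adding from smallest to largest may give better results than an unsorted, naive summation.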

Also note that a typical parallel reduction effectively performs the pairwise summation approach, and so may give better results than a naive serial running-sum method.

c++

floating-point

precision

double

cuda