Why use memset when using CUDA?
In CUDA code samples I have seen memset used to initialize to all zeros the vectors that will store the sum of two other vectors. For example:
hostRef = (float *)malloc(nBytes);
gpuRef = (float *)malloc(nBytes);
memset(hostRef, 0, nBytes);
memset(gpuRef, 0, nBytes);
What is the point of this if nothing else is done with these vectors?
You can see the code here:
https://books.google.com/books?id=Jgx_BAAAQBAJ&pg=PA42#v=onepage&q&f=false
I am not sure how long the link will keep working.
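When you obtain memory with malloc, its contents are not guaranteed to be zero; only calloc zeroes the memory for you. For sanity and for debugging, it is advisable to initialize your memory.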
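To illustrate the malloc/calloc difference described above, here is a minimal sketch in C. The helper names (alloc_zeroed_malloc, alloc_zeroed_calloc, all_bytes_zero) are hypothetical and not taken from the book's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n floats with malloc, then zero them.
   Until memset runs, the contents of the buffer are indeterminate. */
float *alloc_zeroed_malloc(size_t n) {
    float *p = (float *)malloc(n * sizeof(float));
    if (p != NULL) {
        memset(p, 0, n * sizeof(float));
    }
    return p;
}

/* Hypothetical helper: calloc returns memory that is already zero-filled. */
float *alloc_zeroed_calloc(size_t n) {
    return (float *)calloc(n, sizeof(float));
}

/* Return 1 if every byte of the n-float buffer is zero, 0 otherwise. */
int all_bytes_zero(const float *p, size_t n) {
    const unsigned char *bytes = (const unsigned char *)p;
    for (size_t i = 0; i < n * sizeof(float); i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Both helpers hand back a buffer for which all_bytes_zero returns 1. On platforms that use IEEE 754 floats, an all-zero bit pattern reads back as 0.0f.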
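It would be useless if nothing else were done with these vectors, but that is not the case.
The code runs a CUDA vector sum and then copies the result into *gpuRef. It then performs the same sum on the host CPU and puts the result in *hostRef. Finally, it compares the two results.
Of course, it does nothing with either array before new data is copied into it, so initializing them to zero still serves no purpose.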
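The final comparison step can be sketched on the host side as follows. This is only an illustration under assumptions: the function names and the epsilon parameter are hypothetical, and the GPU result is assumed to have been copied back into gpu_ref already.

```c
#include <stddef.h>

/* Host-side reference sum: c[i] = a[i] + b[i]. */
void sum_arrays_on_host(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Return 1 if every element of gpu_ref is within epsilon of host_ref,
   0 otherwise. The test is written as !(diff <= epsilon) so that a NaN
   in either array counts as a mismatch. */
int results_match(const float *host_ref, const float *gpu_ref,
                  size_t n, double epsilon) {
    for (size_t i = 0; i < n; i++) {
        double diff = (double)host_ref[i] - (double)gpu_ref[i];
        if (diff < 0.0) {
            diff = -diff;
        }
        if (!(diff <= epsilon)) {
            return 0;
        }
    }
    return 1;
}
```

Both arrays are fully overwritten before results_match reads them, which is why the zero fill does not affect the outcome of a successful run.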
This is the answer njuffa gave in the comments:
...The content of GPU memory doesn't change between invocations of
the application. In case of a program failure, we would want to avoid
picking up good data from a previous run, which may lead (erroneously)
to a belief that the program executed fine. I have seen such cases in
real-life, and it was very confusing to the affected programmers. Thus
it is better to initialize result data to a known value, although I
would have chosen 0xff instead of 0 as this corresponds to a NaN
pattern for floating-point data.
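The 0xff suggestion can be tried out with a small C sketch. It assumes IEEE 754 single-precision floats, where a word with all bits set decodes as NaN; the helper names are hypothetical.

```c
#include <math.h>
#include <string.h>

/* Fill a result buffer with 0xFF bytes. Under IEEE 754, each float then
   holds the bit pattern 0xFFFFFFFF, which is a NaN. */
void fill_with_nan_pattern(float *buf, size_t n) {
    memset(buf, 0xFF, n * sizeof(float));
}

/* Count the elements that are still NaN, i.e. were presumably never
   overwritten with a real result. */
size_t count_unwritten(const float *buf, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(buf[i])) {
            count++;
        }
    }
    return count;
}
```

If the copy back from the device silently fails, count_unwritten stays at n rather than reporting plausible-looking values. One caveat: a computation that legitimately produces NaN would be counted as unwritten too.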