Why does cuda-memcheck racecheck report errors with cufft?
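The racecheck tool reports memory races in my application. I have isolated them to the CUFFT exec functions.
Am I doing something wrong? If not, how can I make racecheck ignore them?
Here is a minimal example that, when run under cuda-memcheck --tool racecheck,
produces a bunch of 'hazards' like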
========= Race reported between Write access at 0x00000a30 in void spVector0128C::kernelTex<unsigned int, float, fftDirection_t=-1, unsigned int=8, unsigned int=8, LUT, ALL, WRITEBACK>(kernel_parameters_t<fft_tex_t, unsigned int, float>)
========= and Read access at 0x00000a70 in void spVector0128C::kernelTex<unsigned int, float, fftDirection_t=-1, unsigned int=8, unsigned int=8, LUT, ALL, WRITEBACK>(kernel_parameters_t<fft_tex_t, unsigned int, float>) [4 hazards]
Example
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdlib>
#include <iostream>

// Abort with the source line number if a CUDA or cuFFT call returns a nonzero status.
#define ck(cmd) do { if (cmd) { std::cerr << "error at line " << __LINE__ << std::endl; exit(1); } } while (0)

int main(int argc, char** argv)
{
    int nfft = 128;
    cufftComplex* ibuf;
    cufftComplex* obuf;
    ck( cudaMalloc((void**)&ibuf, sizeof(cufftComplex) * nfft) );
    ck( cudaMalloc((void**)&obuf, sizeof(cufftComplex) * nfft) );
    ck( cudaMemset(ibuf, 0, sizeof(cufftComplex) * nfft) );

    // Single 1-D complex-to-complex transform of length nfft.
    cufftHandle fft;
    ck( cufftPlanMany(&fft, 1, &nfft,
                      NULL, 1, nfft,
                      NULL, 1, nfft,
                      CUFFT_C2C, 1) );
    ck( cufftExecC2C(fft, ibuf, obuf, CUFFT_FORWARD) );
    ck( cudaDeviceSynchronize() );

    ck( cufftDestroy(fft) );
    ck( cudaFree(ibuf) );
    ck( cudaFree(obuf) );
    return 0;
}
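You are not doing anything wrong. I don't think racecheck can be switched off for a region of code the way nvprof profiling can with cudaProfilerStart/cudaProfilerStop.
Note the subtle difference between the descriptions of __syncthreads and the BAR.SYNC instruction: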
__syncthreads - http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions
waits until all threads in the thread block have reached this point
BAR.SYNC - http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-instructions
Barriers are executed on a per-warp basis as if all the threads in a warp are active.
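This is not exactly the same behavior. cuda-memcheck racecheck probably follows the __syncthreads definition, while the cuFFT kernels rely on the BAR.SYNC one.
This will most likely be fixed in the next release.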
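Until then, one possible workaround is to post-process the racecheck report and drop the hazards attributed to cuFFT's internal kernels. The sketch below is not a cuda-memcheck feature. It assumes the report looks like the two lines quoted in the question (each hazard line contains "access at" followed by the kernel name), and the helper names isIgnoredHazard and filterReport are hypothetical.

```cpp
#include <string>
#include <vector>

// Returns true if `line` looks like a racecheck hazard line (it mentions
// "access at") and names a kernel containing `kernelSubstring`.
// Assumption: the report format matches the two lines quoted in the question.
bool isIgnoredHazard(const std::string& line, const std::string& kernelSubstring)
{
    const bool isHazardLine = line.find("access at") != std::string::npos;
    const bool namesKernel = line.find(kernelSubstring) != std::string::npos;
    return isHazardLine && namesKernel;
}

// Copies every report line except hazards attributed to the given kernel.
std::vector<std::string> filterReport(const std::vector<std::string>& lines,
                                      const std::string& kernelSubstring)
{
    std::vector<std::string> kept;
    for (const std::string& line : lines) {
        if (!isIgnoredHazard(line, kernelSubstring)) {
            kept.push_back(line);
        }
    }
    return kept;
}
```

To use it, read the saved report line by line with std::getline, pass the lines to filterReport with a substring such as "spVector" (taken from the kernel name in the output above), and print what remains. Any hazard counts in the report are left as they are, and a substring that is too broad would also hide hazards in your own kernels, so it should be as specific as the library kernel names allow.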