CUDA - big number and memory allocation
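I have a very strange bug in my program. I have spent a lot of time on it, but I still haven't found a solution. I wrote a simple program that reproduces the problem. Maybe someone can help me. I tried cuda-memcheck and the approach from What is the canonical way to check for errors using the CUDA runtime API?, but I don't get any errors.
Details:
nvcc version - V6.0.1
gcc version - 4.8.1
Full code: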
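#include <stdio.h>

__constant__ unsigned long long int bigNumber = 83934243334343ULL;
__device__ bool isFound = false;

__global__ void kernel(int *dev_number) {
    unsigned long long int id = threadIdx.x + (blockIdx.x * blockDim.x);
    // Loop until this thread's id passes bigNumber or some thread sets isFound.
    while (id < bigNumber && isFound == false) {
        if (id == 10) {
            *dev_number = 4;
            isFound = true;
        }
        id++;
    }
}

int main(int argc, char *argv[]) {
    int number = 0;
    int *dev_number;
    printf("Number: %d\n", number);
    cudaMalloc((void **)&dev_number, sizeof(int));
    kernel<<<2, 64>>>(dev_number);  // grid/block sizes are illustrative
    cudaMemcpy(&number, dev_number, sizeof(int), cudaMemcpyDeviceToHost);
    printf("Number: %d\n", number);
    cudaFree(dev_number);
    return 0;
}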
Compile and run:
nvcc -o myprogram myprogram.cu
./myprogram
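When I run this program, it never returns. But when bigNumber has a smaller value, or when I don't use cudaMalloc and cudaMemcpy, it works (meaning return 0 is reached). What does allocating memory for another variable have to do with the constant bigNumber? What is the problem?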
Now that you have changed the code into something more sensible, I get the result immediately with the following change:
__device__ volatile bool isFound = false;
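The volatile qualifier forces the compiler to skip any optimization that would stop each thread from reading the global-memory copy of the variable. From the CUDA C Programming Guide: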
The compiler is free to optimize reads and writes to global or shared memory (for example, by caching global reads into registers or L1 cache) as long as it respects the memory ordering semantics of memory fence functions (Memory Fence Functions) and memory visibility semantics of synchronization functions (Synchronization Functions).
These optimizations can be disabled using the volatile keyword: If a variable located in global or shared memory is declared as volatile, the compiler assumes that its value can be changed or used at any time by another thread and therefore any reference to this variable compiles to an actual memory read or write instruction.
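For reference, here is a minimal sketch of the whole program with the volatile fix applied and every runtime call checked, in the spirit of the canonical error-checking question mentioned in the question. The CHECK macro and the resetFlag and runSearch functions are hypothetical names introduced for illustration, and the 2×64 launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

__constant__ unsigned long long int bigNumber = 83934243334343ULL;

// volatile: every read of isFound compiles to a real memory load, so threads
// that did not find the value still observe the write made by the one that did.
__device__ volatile bool isFound = false;

// isFound keeps its value between launches, so clear it before each search.
__global__ void resetFlag() { isFound = false; }

__global__ void kernel(int *dev_number) {
    unsigned long long int id = threadIdx.x + (blockIdx.x * blockDim.x);
    while (id < bigNumber && !isFound) {
        if (id == 10) {
            *dev_number = 4;
            isFound = true;
        }
        id++;
    }
}

// Runs one search and returns the value the kernel wrote to device memory.
int runSearch() {
    int number = 0;
    int *dev_number = NULL;
    CHECK(cudaMalloc(&dev_number, sizeof(int)));
    CHECK(cudaMemset(dev_number, 0, sizeof(int)));

    resetFlag<<<1, 1>>>();
    CHECK(cudaGetLastError());

    kernel<<<2, 64>>>(dev_number);   // arbitrary launch configuration
    CHECK(cudaGetLastError());       // catches launch/configuration errors
    CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    CHECK(cudaMemcpy(&number, dev_number, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev_number));
    return number;
}

int main() {
    std::printf("Number: %d\n", runSearch());
    return 0;
}
```

Every thread starts at an id of at least 0 and counts upward, so some thread always reaches id == 10. With the flag declared volatile, the remaining threads see isFound become true on a later iteration and leave the loop instead of counting toward bigNumber.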
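If you don't use the volatile qualifier, only one thread takes the early-exit condition (isFound). Because the compiler is allowed to keep isFound in a register, all other threads never see the update and must loop for a very long time, until their id value exceeds bigNumber. That is also why the program finishes when bigNumber is small.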
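To put "a very long time" in numbers: a thread that never observes isFound changing keeps iterating until id reaches bigNumber, so it runs roughly bigNumber iterations. Assuming, optimistically and purely for illustration, one loop iteration per nanosecond per thread:

$$
N \approx 83934243334343 \approx 8.39 \times 10^{13}, \qquad
t \approx \frac{8.39 \times 10^{13}}{10^{9}\ \text{s}^{-1}} \approx 8.4 \times 10^{4}\ \text{s} \approx 23\ \text{h}
$$

From the outside, the program simply appears to hang.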