Converting a parallel CUDA program to run sequentially
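I have the following CUDA program, which converts an image from RGBA to grayscale in parallel. I would also like a version that runs sequentially, so that I can compare the two and get metrics such as speedup.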
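For that comparison, speedup is conventionally the sequential running time divided by the parallel running time. Values above 1 mean the GPU version is faster. Whether $T_{\text{parallel}}$ covers only the kernel or also the host-to-device and device-to-host copies is a choice you make yourself, and it is worth stating alongside the number:

$$
S = \frac{T_{\text{sequential}}}{T_{\text{parallel}}}
$$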
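As I understand it, to make this run sequentially I need to change it so that it uses two for loops (one for X and one for Y) to step through the image pixel by pixel. The grayscale conversion should run on each pixel before moving on to the next one.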
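The two-loop version described here also works. It rebuilds the row-major offset `y*width + x` from the two loop counters, which is the same index the kernel's commented-out 2D lines compute. The C sketch below assumes an RGBA input with `CHANNELS = 4`. The function name `colorConvertNested` is hypothetical:

```c
#define CHANNELS 4 /* assumed: RGBA input, 4 bytes per pixel */

/* Hypothetical helper: sequential grayscale conversion with one loop per axis.
   Row-major layout is assumed, so pixel (x, y) is stored at index y*width + x. */
void colorConvertNested(unsigned char *grayImage, const unsigned char *rgbImage,
                        unsigned int width, unsigned int height)
{
    for (unsigned int y = 0; y < height; ++y) {        /* rows */
        for (unsigned int x = 0; x < width; ++x) {     /* columns within a row */
            unsigned int grayOffset = y * width + x;   /* 1D index of pixel (x, y) */
            unsigned int rgbOffset = grayOffset * CHANNELS;
            unsigned char r = rgbImage[rgbOffset];     /* red */
            unsigned char g = rgbImage[rgbOffset + 1]; /* green */
            unsigned char b = rgbImage[rgbOffset + 2]; /* blue; alpha is ignored */
            grayImage[grayOffset] = (unsigned char)(0.21f*r + 0.71f*g + 0.07f*b);
        }
    }
}
```

Because the inner loop walks x from 0 to width-1 for each row, the offsets it visits are exactly 0, 1, ..., width*height-1 in order. Those are the same indices a single flat loop would visit.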
#define CHANNELS 4 // assumed: RGBA input, 4 bytes per pixel (the original definition was not shown)

__global__ void colorConvert(unsigned char * grayImage, unsigned char * rgbImage, unsigned int width, unsigned int height)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    //unsigned int y = threadIdx.y + blockIdx.y * blockDim.y; //this is needed if you use 2D grid and blocks
    //if ((x < width) && (y < height)) {
    // check if out of bounds
    if (x < width*height) {
        // get 1D coordinate for the grayscale image
        unsigned int grayOffset = x; // y*width + x; //this is needed if you use 2D grid and blocks
        // one can think of the RGB image as having
        // CHANNELS times as many columns as the grayscale image
        unsigned int rgbOffset = grayOffset*CHANNELS;
        unsigned char r = rgbImage[rgbOffset];     // red value for pixel
        unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
        unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
        // perform the rescaling and store it
        // We multiply by floating point constants
        grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
    }
}
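You can see why the sequential version needs only a flat loop by emulating the 1D launch on the CPU. The kernel body runs once per (block, thread) pair, with `x = thread + block * threadsPerBlock`, and the bounds check discards the surplus threads in the last block. This is a minimal C sketch. The names `colorConvertEmulated`, `numBlocks` and `threadsPerBlock` are hypothetical stand-ins for the grid and block sizes, and RGBA input (`CHANNELS = 4`) is assumed:

```c
#define CHANNELS 4 /* assumed: RGBA input, 4 bytes per pixel */

/* Hypothetical helper: runs the colorConvert kernel body for every
   (block, thread) pair of a 1D launch, one after another on the CPU.
   numBlocks and threadsPerBlock play the roles of gridDim.x and blockDim.x. */
void colorConvertEmulated(unsigned char *grayImage, const unsigned char *rgbImage,
                          unsigned int width, unsigned int height,
                          unsigned int numBlocks, unsigned int threadsPerBlock)
{
    for (unsigned int block = 0; block < numBlocks; ++block) {
        for (unsigned int thread = 0; thread < threadsPerBlock; ++thread) {
            unsigned int x = thread + block * threadsPerBlock; /* same index formula as the kernel */
            if (x < width * height) {                           /* same bounds check as the kernel */
                unsigned int rgbOffset = x * CHANNELS;
                unsigned char r = rgbImage[rgbOffset];
                unsigned char g = rgbImage[rgbOffset + 1];
                unsigned char b = rgbImage[rgbOffset + 2];
                grayImage[x] = (unsigned char)(0.21f*r + 0.71f*g + 0.07f*b);
            }
        }
    }
}
```

With `numBlocks = (width*height + threadsPerBlock - 1) / threadsPerBlock`, the two nested loops produce every x from 0 to width*height-1 exactly once, in increasing order. That is why they collapse into a single `for (x = 0; x < width*height; ++x)`.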
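I have removed the rest of my code from the question, as there is a lot of it to go through. If I want this kernel to run sequentially, using two for loops to visit every pixel and apply the grayImage[grayOffset] line to each one, how would I go about it?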
You only need one for loop. Your code stores all the image pixels in a one-dimensional array, so a single for is enough.
// host code: grayImage, rgbImage, width, height and CHANNELS as in the kernel above
for (unsigned int x = 0; x < width*height; ++x)
{
    unsigned int grayOffset = x;
    unsigned int rgbOffset = grayOffset*CHANNELS;
    unsigned char r = rgbImage[rgbOffset];     // red value for pixel
    unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
    unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
    // perform the rescaling and store it
    // We multiply by floating point constants
    grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
}
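To get the sequential time for a speedup figure, one option is to wrap the single loop in a function and time it with the standard C `clock()`. The sketch below makes several assumptions: RGBA input (`CHANNELS = 4`), and the hypothetical names `colorConvertSequential` and `timeSequentialMs`. The GPU side would be timed separately, for example with CUDA events (`cudaEventRecord` and `cudaEventElapsedTime`):

```c
#include <time.h>

#define CHANNELS 4 /* assumed: RGBA input, 4 bytes per pixel */

/* Hypothetical helper: the kernel body with x supplied by a single for loop. */
void colorConvertSequential(unsigned char *grayImage, const unsigned char *rgbImage,
                            unsigned int width, unsigned int height)
{
    for (unsigned int x = 0; x < width * height; ++x) {
        unsigned int rgbOffset = x * CHANNELS;
        unsigned char r = rgbImage[rgbOffset];
        unsigned char g = rgbImage[rgbOffset + 1];
        unsigned char b = rgbImage[rgbOffset + 2];
        grayImage[x] = (unsigned char)(0.21f*r + 0.71f*g + 0.07f*b);
    }
}

/* Hypothetical timing helper: processor time in milliseconds for one
   sequential pass, measured with clock() and CLOCKS_PER_SEC from <time.h>. */
double timeSequentialMs(unsigned char *grayImage, const unsigned char *rgbImage,
                        unsigned int width, unsigned int height)
{
    clock_t start = clock();
    colorConvertSequential(grayImage, rgbImage, width, height);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

`clock()` has coarse resolution on some systems, so a single pass over a small image may report 0 ms. A realistic image size, or an average over several repetitions, gives a more meaningful sequential time to divide by the GPU time.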