Casting an integer to a float in OpenCL
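This is my first post on Stack Overflow, so please bear with me.
I'm currently writing an OpenCL kernel and need to use the built-in sqrt function. However, for it to work, the function's argument has to be a float. I currently have an integer value that I need to convert to a float so that I can call sqrt(), and then convert the result back to an integer so that I can store it in the magOutput array.
The line below should give a better idea of what I'm trying to do: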
magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
Here is the full code, in case it helps to understand the application:
__kernel void matrixMultiplication(__global int* input, __global int* xConv, __global int* yConv, __global int* size, __global int* magOutput){
    int workItemNum = get_global_id(0);   // Work item ID
    int workGroupNum = get_group_id(0);   // Work group ID
    int localGroupID = get_local_id(0);   // Work item ID within its work group
    // size refers to the total number of elements of the matrix, so for a 3x3 matrix size = 9
    float dim = *size;
    // Only square matrices are used, so sqrt(size) is both the row length and the column length.
    // OpenCL's single-precision sqrt may be slightly inexact, so +0.5f keeps a result just below
    // the exact root from being truncated to the wrong int.
    int row = sqrt(dim) + 0.5f;
    int col = sqrt(dim) + 0.5f;
    // Row-major layout: the current row is the work item number divided by the row length,
    // and the current column is the work item number modulo the row length
    int current_row = workItemNum / col;
    int current_col = workItemNum % col;
    // printf("dimension: %i \n",localGroupID);
    // Boundary pixels do not have a full 3x3 neighbourhood, so only interior pixels are convolved
    if (current_col > 0 && current_col < col - 1 && current_row > 0 && current_row < row - 1){
        /*===============================================================================================================
         * The xConv array holds the convolution of the input grey scale values with the following kernel:
         *
         *                            [-1  0 +1]
         *   X - Directional Kernel = [-2  0 +2]
         *                            [-1  0 +1]
         *
         * This scans across the X direction of the image and enhances all edges in the X direction
         * ===============================================================================================================
         */
        xConv[workItemNum] = input[(current_row - 1)*col + current_col - 1]*-1
                           + input[(current_row - 1)*col + current_col    ]*0
                           + input[(current_row - 1)*col + current_col + 1]*1
                           + input[(current_row    )*col + current_col - 1]*-2
                           + input[(current_row    )*col + current_col    ]*0
                           + input[(current_row    )*col + current_col + 1]*2
                           + input[(current_row + 1)*col + current_col - 1]*-1
                           + input[(current_row + 1)*col + current_col    ]*0
                           + input[(current_row + 1)*col + current_col + 1]*1;
        /*===============================================================================================================
         * The yConv array holds the convolution of the input grey scale values with the following kernel:
         *
         *                            [-1 -2 -1]
         *   Y - Directional Kernel = [ 0  0  0]
         *                            [+1 +2 +1]
         *
         * This scans across the Y direction of the image and enhances all edges in the Y direction
         * ===============================================================================================================
         */
        yConv[workItemNum] = input[(current_row - 1)*col + current_col - 1]*-1
                           + input[(current_row - 1)*col + current_col    ]*-2
                           + input[(current_row - 1)*col + current_col + 1]*-1
                           + input[(current_row    )*col + current_col - 1]*0
                           + input[(current_row    )*col + current_col    ]*0
                           + input[(current_row    )*col + current_col + 1]*0
                           + input[(current_row + 1)*col + current_col - 1]*1
                           + input[(current_row + 1)*col + current_col    ]*2
                           + input[(current_row + 1)*col + current_col + 1]*1;
    } else {
        // Boundary pixels get a zero gradient so the magnitude below never reads uninitialised values
        xConv[workItemNum] = 0;
        yConv[workItemNum] = 0;
    }
    //===============================================================================================================
    // Combines the X and Y convolution results into the gradient magnitude: square each element of the X and Y arrays,
    // add them and take the square root. This is the basic magnitude formula, computed by each work item for its element.
    //===============================================================================================================
    magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
}
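Any suggestions?
Welcome to Stack Overflow!
So you basically have a 2D vector with int components and want to compute its length. Most of OpenCL C is just standard C99 code/syntax, so the most straightforward way is to use standard C-style casts: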
magOutput[workItemNum] = (int)(sqrt((float)xConv[workItemNum]*(float)xConv[workItemNum] + (float)yConv[workItemNum]*(float)yConv[workItemNum])+0.5f);
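The +0.5f is there for correct rounding: converting a float to an int truncates toward zero, so for a non-negative value such as a vector length it always rounds down; for example, (int)3.9f becomes 3. Adding 0.5f immediately before the cast makes the result round to the nearest integer instead. Note that I cast to float first and only then square; otherwise the integer multiplication could overflow.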
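A possibly faster approach is the following: global memory is slow, so here I load the values xConv[workItemNum] and yConv[workItemNum] from it only once, convert them to float and store them in private memory (registers) as xConvf and yConvf. Then I compute the length and round it, and write the result back to magOutput[workItemNum] in slow global memory.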
const float xConvf = (float)xConv[workItemNum], yConvf = (float)yConv[workItemNum];
magOutput[workItemNum] = (int)(sqrt(xConvf*xConvf + yConvf*yConvf)+0.5f);
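If you really like the built-in math functions of OpenCL C, you can also do this (it should be as fast as the second approach). Note that OpenCL C writes vector literals as (float2)(a, b), not float2(a, b):
magOutput[workItemNum] = (int)(length((float2)((float)xConv[workItemNum], (float)yConv[workItemNum]))+0.5f);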
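So, two key takeaways:
- Type casting in OpenCL works just like in C: (float)x.
- Reading from and writing to global memory (your kernel's argument arrays) is slow. Load each value you need only once into a variable declared inside the kernel (these live in the private memory space, typically registers, not to be confused with __local memory), do the arithmetic on those registers, and write the result back to global memory once.
Although it doesn't affect your OpenCL C code, I recommend this lightweight OpenCL-Wrapper for C++ development. It reduces the OpenCL control logic in your CPU code to about a quarter and makes development much easier.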