How does the clamp function in GLSL and OpenCL work? Does it create branches, and should I avoid using it?
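Both GLSL and OpenCL have a clamp function that clamps a number to the given lower or upper bound when the value falls outside that range. If I tried to implement something similar in C++, it would look like the following code: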
if (i < min) {
    i = min;
} else if (i > max) {
    i = max;
}
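However, this has multiple branch paths, which as far as I know slows GPUs down considerably, because most of them have to execute every branch.
So how does the GLSL/OpenCL clamp work, and if it uses branches, would you recommend avoiding it wherever possible?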
If you look at the documentation for GPU instruction set architectures, for example here and here, you will find that GPUs generally have native support for min and max instructions. Even if they did not, conditionals on NVIDIA GPUs, for example, are based on predicated execution. Any reasonable compiler would turn your example above into conditional assignments rather than a fully-fledged branch (example here). Even on the CPU…
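To make the min/max point concrete, here is a minimal C++ sketch (host code, not shader or kernel code). It puts the branching form from the question next to a form built from min and max, which is how the GLSL and OpenCL specifications define clamp: min(max(x, minVal), maxVal), with the result undefined when minVal > maxVal. The function names clamp_branch and clamp_minmax are hypothetical and used only for illustration. Whether either form compiles to native min/max instructions, predicated moves, or real branches depends on the compiler and the target.

```cpp
#include <algorithm>

// Branching form, as written in the question.
// (Hypothetical helper name, for illustration only.)
inline float clamp_branch(float x, float lo, float hi) {
    if (x < lo) {
        x = lo;
    } else if (x > hi) {
        x = hi;
    }
    return x;
}

// min/max form, mirroring the specification-level definition
// clamp(x, lo, hi) = min(max(x, lo), hi). Assumes lo <= hi.
// (Hypothetical helper name, for illustration only.)
inline float clamp_minmax(float x, float lo, float hi) {
    return std::min(std::max(x, lo), hi);
}
```

For any lo <= hi and a non-NaN x, both functions return the same value, so the choice between them is about what the compiler emits rather than about the result. In GLSL or OpenCL itself, calling the built-in clamp directly is likely the simplest option, since it leaves the instruction selection to the driver's compiler.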