GLSL memoryBarrierShared() usefulness?
I am wondering about the usefulness of memoryBarrierShared.
Indeed, when I look at the documentation for the barrier function, I read:
For any given static instance of barrier in a compute shader, all invocations within a single work group must enter it before any are allowed to continue beyond it. This ensures that values written by one invocation prior to a given static instance of barrier can be safely read by other invocations after their call to the same static instance of barrier. Because invocations may execute in undefined order between these barrier calls, the values of a per-vertex or per-patch output variable, or any shared variable will be undefined in a number of cases.
So, if we can safely read values after using barrier, why do we see in some code something like
memoryBarrierShared();
barrier();
or something wrong like
barrier();
memoryBarrierShared();
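So, my question is: what is the purpose of memoryBarrier{Shared,...} if using barrier is sufficient?
For memoryBarrierBuffer/Image I can understand it when more than one stage is involved, but for shared variables I have no idea...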
Update (2019-12-07):
The GLSL 4.60 explanation below is now wrong. After Revision 5, the GLSL 4.60 spec now reads:
Private GLSL issue #24: Clarify that barrier() by itself is enough to synchronize both control flow and memory accesses to shared variables and tessellation control output variables. For other memory accesses an additional memory barrier is still required.
This is also reflected in the GLSL ES 3.20 documentation:
In order to achieve ordering with respect to reads and writes to shared variables, control flow barriers must be employed using the barrier() function (see “Shader Invocation Control Functions”).
They go on to explain:
A barrier() affects control flow but only synchronizes memory accesses to shared variables and tessellation control output variables. For other memory accesses, it does not ensure that values written by one invocation prior to a given static instance of barrier() can be safely read by other invocations after their call to the same static instance of barrier(). To achieve this requires the use of both barrier() and a memory barrier.
TL;DR: If you are only using barriers for shared variables, barrier() is enough. If you are using them for "other memory accesses", then barrier() is not enough.
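To make the distinction concrete, here is a minimal sketch of a compute shader that touches both kinds of memory. The buffer names (Scratch, Result), the bindings, and the work group size of 64 are assumptions made up for illustration, and it assumes both buffers hold at least one element per invocation. Under the revised wording quoted above, the shared array only needs barrier(), while the buffer writes that other invocations in the same work group read back need a memory barrier as well.

```glsl
#version 450
layout(local_size_x = 64) in;

// Hypothetical buffers, named for illustration only.
// "coherent" is needed so writes can become visible to other invocations.
layout(std430, binding = 0) coherent buffer Scratch { float scratch[]; };
layout(std430, binding = 1) writeonly buffer Result { float result[]; };

shared float tile[64];

void main() {
    uint lid  = gl_LocalInvocationID.x;
    uint gid  = gl_GlobalInvocationID.x;
    uint base = gl_WorkGroupID.x * 64u;
    uint next = (lid + 1u) % 64u;

    // Shared variable: per the revised spec text, barrier() alone is enough.
    tile[lid] = float(gid);
    barrier();
    float fromShared = tile[next];

    // Buffer variable ("other memory accesses"): barrier() alone is NOT enough,
    // so issue the memory barrier first, then synchronize execution.
    scratch[gid] = fromShared * 2.0;
    memoryBarrierBuffer();
    barrier();
    float fromBuffer = scratch[base + next];

    result[gid] = fromShared + fromBuffer;
}
```

Note that this only covers communication inside one work group. barrier() does not synchronize invocations across different work groups.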
GLSL 4.60 clarified this:
In order to achieve ordering with respect to reads and writes to shared variables, a combination of control flow and memory barriers must be employed using the barrier() and memoryBarrier() functions (see “Shader Invocation Control Functions”).
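It is best to treat desktop GLSL as if it had always said this, even though GLSL 4.50 says the following.
GLSL 4.50 is very clear that explicit memory barriers are unnecessary: barrier in a compute shader includes all memory barriers.
However, GLSL ES 3.20 is equally clear that barrier does not include memory barriers of any kind: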
For compute shaders, a barrier only affects control flow and does not by itself synchronize memory accesses. In particular, it does not ensure that values written by one invocation prior to a given static instance of barrier() can be safely read by other invocations after their call to the same static instance of barrier(). To achieve this requires the use of both barrier() and a memory barrier.
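It should be noted that the offline glslang compiler will always use the GLSL ES wording. So if you are generating SPIR-V to feed to Vulkan, you have to follow ES's rules here. Well, until they get that fixed, one way or another.
That being said, the ES wording makes more sense, as a full memory barrier for everything is very expensive, particularly if you only want to synchronize access to shared variables.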
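I would suggest employing memory barriers alongside calls to barrier. That way, your shader will be correct, even if it might be slightly slower on some implementations. However, if you are going to use a memory barrier with a barrier call, then the memory barrier must come first. Performing a memory barrier after synchronizing execution is incorrect.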
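As a rough sketch of that conservative pattern, here is a work group sum reduction that always issues memoryBarrierShared() immediately before barrier(). The buffer names (Input, Output), the bindings, and the work group size of 64 are assumptions for illustration only, and it assumes the input buffer holds at least one element per invocation.

```glsl
#version 450
layout(local_size_x = 64) in;

// Hypothetical buffers, named for illustration only.
layout(std430, binding = 0) readonly  buffer Input  { float values[]; };
layout(std430, binding = 1) writeonly buffer Output { float sums[]; };

shared float partial[64];

void main() {
    uint lid = gl_LocalInvocationID.x;

    // Each invocation stores one value into shared memory.
    partial[lid] = values[gl_GlobalInvocationID.x];

    // Memory barrier first, then the execution barrier.
    memoryBarrierShared();
    barrier();

    // Tree reduction: halve the number of active invocations each step.
    // The loop bounds are uniform, so every invocation reaches each barrier().
    for (uint stride = 32u; stride > 0u; stride >>= 1u) {
        if (lid < stride) {
            partial[lid] += partial[lid + stride];
        }
        memoryBarrierShared();
        barrier();
    }

    // One invocation writes the work group's total.
    if (lid == 0u) {
        sums[gl_WorkGroupID.x] = partial[0];
    }
}
```

On implementations that follow the revised wording, the memoryBarrierShared() calls are redundant, but they keep the shader correct under the older ES reading as well.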