OpenGL barrier() Function and Divergent Flow Control

Question

Is it allowed to use barrier() outside of divergent flow control (for example, after a divergent branch has ended)?

Details

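In an OpenGL 4.00-compatible compute shader, I am doing some work that involves divergent (i.e., not dynamically uniform) branch statements. Later in the same shader, I need to synchronize execution across all invocations in the work group for memory-access purposes. This would also help efficiency, since I want them all to get back in step with one another. However, after consulting the khronos.org wiki and refpages, it is unclear to me whether what I want to do is standard-conforming (regardless of whether it works in practice).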

From this page, we see:

Calls to barrier may not be placed within any control flow.

And from this page, we see:

barrier() can be called from flow-control, but it can only be called from uniform flow control. All expressions that lead to the evaluation of a barrier() must be dynamically uniform.

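First, there is an apparent contradiction about flow control. I assume that only flow control that is divergent (within the work group) is disallowed?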

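Second, there seems to me to be some ambiguity about calls to barrier() that occur after divergent flow control. Importantly, note again: "All expressions that lead to the evaluation of a barrier() must be dynamically uniform." Does this mean "lead to" or ...? Some examples may help illustrate my confusion.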

Example 1, valid:

void main() {
    // ... do some work here ...
    barrier(); // valid use case
    // ... do some more work ...
}

Example 2, valid:

void main() {
    if (IS_BEST_WORK_GROUP) { // dynamically uniform within work group
        // ... do some work here ...
        barrier(); // valid use case
        // ... do some more work ...
    }
}

Example 3, invalid:

void main() {
    if (IS_BEST_INVOCATION) { // divergent within work group
        // ... do some work here ...
        barrier(); // this is illegal
        // ... do some more work ...
    }
}

Example 4, ambiguous:

void main() {
    if (IS_BEST_INVOCATION) { // divergent within work group
        // ... do some work here ...
    }
    barrier(); // is this allowed?
               // it occurs after, but outside of, a divergent branch statement
    // ... do some more work ...
}

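If Example 4 is indeed valid, where can I find this stated unambiguously?
Alternatively, if it is invalid, is there some other mechanism that lets all my invocations get back into lockstep after a divergent branch condition?
If neither of the above, can this be done in Vulkan?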

Answer 1

The only authoritative source is the GLSL Specification.

Section 8.16 states:

For tessellation control shaders, the barrier() function may only be placed inside the function main() of the tessellation control shader and may not be called within any control flow. [...]

For compute shaders, the barrier() function may be placed within flow control, but that flow control must be uniform flow control.
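To make the "uniform flow control" rule concrete, here is a minimal sketch (not taken from the spec) of a per-work-group sum reduction. The for loop counts as uniform flow control because its bounds depend only on gl_WorkGroupSize, so every invocation reaches the barrier() inside it on every iteration; the `if (lid < stride)` test is divergent, so the barrier is kept outside it. The buffer names, bindings, and local size of 256 are hypothetical, and the sketch assumes the input length is exactly (number of work groups) x 256. #version 430 is used because compute shaders became core in OpenGL 4.3.

```glsl
#version 430
// Hypothetical configuration: 256 invocations per work group.
layout(local_size_x = 256) in;

layout(std430, binding = 0) readonly buffer InputBuf   { float inputData[]; };
layout(std430, binding = 1) writeonly buffer OutputBuf { float groupSums[]; };

shared float partial[256];

void main() {
    uint lid = gl_LocalInvocationID.x;

    // Every invocation loads one element into shared memory.
    partial[lid] = inputData[gl_GlobalInvocationID.x];
    memoryBarrierShared();
    barrier(); // not inside any flow control

    // Uniform flow control: the loop bounds depend only on gl_WorkGroupSize,
    // so all invocations run the same number of iterations.
    for (uint stride = gl_WorkGroupSize.x / 2u; stride > 0u; stride >>= 1u) {
        if (lid < stride) { // divergent: only the lower half adds
            partial[lid] += partial[lid + stride];
        }
        memoryBarrierShared();
        barrier(); // outside the divergent if, inside the uniform loop
    }

    // Divergent branch with no barrier inside it: allowed.
    if (lid == 0u) {
        groupSums[gl_WorkGroupID.x] = partial[0];
    }
}
```

The memoryBarrierShared() before each barrier() is the conservative pattern for making shared-memory writes visible to the rest of the work group.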

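Your Example 4 is perfectly fine, because the call to barrier() is not inside any flow control. As long as you make sure that all shader invocations reach the same barrier, it does not matter what they do before it.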
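A fuller version of the Example 4 pattern might look like the sketch below: one invocation does work in a divergent branch and publishes a result through shared memory, and then every invocation meets at a barrier() placed after the branch. The local size of 64, the bindings, and the buffer names are hypothetical; the sketch assumes the input length is exactly (number of work groups) x 64.

```glsl
#version 430
// Hypothetical configuration, chosen only for illustration.
layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly buffer InBuf   { float values[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { float result[]; };

shared float groupMax;

void main() {
    uint lid  = gl_LocalInvocationID.x;
    uint base = gl_WorkGroupID.x * gl_WorkGroupSize.x;

    // Divergent branch (the role of IS_BEST_INVOCATION): only invocation 0
    // scans the work group's slice and stores the maximum in shared memory.
    if (lid == 0u) {
        float m = values[base];
        for (uint i = 1u; i < gl_WorkGroupSize.x; ++i) {
            m = max(m, values[base + i]);
        }
        groupMax = m;
    }

    // Outside the branch: every invocation, including those that skipped
    // the if, reaches this same barrier() exactly once.
    memoryBarrierShared();
    barrier();

    // After the barrier, every invocation can safely read groupMax.
    result[base + lid] = values[base + lid] - groupMax;
}
```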

Answer 2

Some additional NVIDIA-specific information from the NV_compute_program5 extension:

Section 2.X.8.Z, BAR: Execution Barrier

The BAR instruction synchronizes the execution of compute shader invocations within a local work group. When a compute shader invocation executes the BAR instruction, it pauses until the same BAR instruction has been executed by all invocations in the current local work group. Once all invocations have executed the BAR instruction, processing continues with the instruction following the BAR instruction.

There is no compile-time restriction on the locations in a program where BAR is allowed. However, BAR instructions are not allowed in divergent flow control; if any compute shader invocation in the work group executes the BAR instruction, all compute shaders invocations must execute the instruction. Results of executing a BAR instruction are undefined and can result in application hangs and/or program termination if the instruction is issued:

inside any IF/ELSE/ENDIF block where the results of the condition evaluated by the IF instruction are not identical across the work group;

inside any iteration of REP/ENDREP block where at least one invocation in the work group has skipped to the next iteration using the CONT instruction, exited the loop using a BRK or RET instruction, or exited the loop due to having completed the requested number of loop iterations; or

inside any subroutine (including main) where at least one invocation in the work group has exited the subroutine using the RET instruction.
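In GLSL, a common way to hit the last case above is an early `return` for out-of-range invocations placed before a barrier(). The sketch below shows that mistake (commented out) and a version that guards only the work, so barrier() stays reachable by the whole work group; this follows the portable GLSL rule from Answer 1 rather than relying on NVIDIA-specific behavior. The names elementCount, src, dst, and tile, the bindings, and the local size of 128 are hypothetical.

```glsl
#version 430
layout(local_size_x = 128) in;

// Hypothetical inputs: elementCount need not be a multiple of 128.
layout(std430, binding = 0) readonly buffer InBuf   { float src[]; };
layout(std430, binding = 1) writeonly buffer OutBuf { float dst[]; };
uniform uint elementCount;

shared float tile[128];

void main() {
    uint gid = gl_GlobalInvocationID.x;
    uint lid = gl_LocalInvocationID.x;

    // WRONG: invocations past the end would leave main() early, so the
    // barrier() below would not be reached by the whole work group.
    // if (gid >= elementCount) return;

    // Instead, guard only the memory accesses; everyone still hits barrier().
    bool inRange = gid < elementCount;
    tile[lid] = inRange ? src[gid] : 0.0;

    memoryBarrierShared();
    barrier();

    // Each in-range invocation adds its neighbour's value (wrapping within
    // the work group) read from shared memory.
    uint next = (lid + 1u) % gl_WorkGroupSize.x;
    if (inRange) {
        dst[gid] = tile[lid] + tile[next];
    }
}
```

Returning early is harmless once the last barrier() has been passed; the problem is only a return that lets some invocations skip a barrier the others will execute.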
