GLSL memoryBarrier()

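Example 11.19 in the OpenGL Red Book, 8th edition (GL 4.3), places an imageLoad() in a while loop and keeps polling until at least one fragment of the previous primitive has updated the value. The book says: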

Example 11.19 shows a very simple use case for memory barriers. It allows some level of ordering between fragments to be ensured. At the top of functionUsingBarriers(), a simple loop is used to wait for the contents of a memory location to reach our current primitive ID. Because we know that no two fragments from the same primitive can land on the same pixel, we know that when we’re executing the code in the body of the function, at least one fragment from the previous primitive has been processed. We then go about modifying the contents of memory at our fragment’s location using nonatomic operations. We signal to other shader invocations that we are done by writing to the shared memory location originally polled at the top of the function.

To ensure that our modified image contents are written back to memory before other shader invocations start into the body of the function, we use a call to memoryBarrier between updates of the color image and the primitive counter to enforce ordering.

However, the GL 4.3 specification says:

having one invocation poll memory written by another invocation assumes that the other invocation has been launched and can complete its writes

So how can we be sure that a fragment invocation from the previous primitive has been launched and has completed its writes?

Here is the source code from the book:

#version 420 core

layout (rgba32f) uniform coherent image2D my_image;

// Declaration of function
void functionUsingBarriers(coherent uimageBuffer i)
{
    uint val;

    // This loop essentially waits until at least one fragment from
    // an earlier primitive (that is, one with gl_PrimitiveID - 1)
    // has reached the end of this function point. Note that this is
    // not a robust loop as not every primitive will generate
    // fragments.
    do
    {
        val = imageLoad(i, 0).x;
    } while (val != gl_PrimitiveID);

    // At this point, we can load data from another global image
    vec4 frag = imageLoad(my_image, gl_FragCoord.xy);

    // Operate on it...
    frag *= 0.1234;
    frag = pow(frag, 2.2);

    // Write it back to memory
    imageStore(my_image, gl_FragCoord.xy, frag);

    // Now, we’re about to signal that we’re done with processing
    // the pixel. We need to ensure that all stores thus far have
    // been posted to memory. So, we insert a memory barrier.
    memoryBarrier();

    // Now we write back into the original "primitive count" memory
    // to signal that we have reached this point. The stores
    // resulting from processing "my_image" will have reached memory
    // before this store is committed due to the barrier.
    imageStore(i, 0, gl_PrimitiveID + 1);

    // Now issue another barrier to ensure that the results of the
    // image store are committed to memory before this shader
    // invocation ends.
    memoryBarrier();
}

This code (and the text that accompanies it) is erroneous nonsense. Consider this statement:

Because we know that no two fragments from the same primitive can land on the same pixel, we know that when we’re executing the code in the body of the function, at least one fragment from the previous primitive has been processed.

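Even if we assume that the primitives in a mesh do not overlap (which is hardly a reasonable assumption in general), that tells you precisely nothing about how the GPU distributes work across primitives. Knowing that no two fragments of one primitive share a pixel does not imply that any fragment of the previous primitive has even been launched, let alone finished, by the time your invocation starts polling.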

The OpenGL specification makes this explicit:

The relative order of invocations of the same shader type are undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are always written to the framebuffer in primitive order, stores executed by fragment shader invocations are not.

...

The above limitations on shader invocation order also make some forms of synchronization between shader invocations within a single set of primitives unimplementable. For example, having one invocation poll memory written by another invocation assumes that the other invocation has been launched and can complete its writes. The only case where such a guarantee is made is when the inputs of one shader invocation are generated from the outputs of a shader invocation in a previous stage.

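Yes, the OpenGL specification explicitly calls this out as something you cannot do. I have no idea how it made its way into an official OpenGL book, but your instinct is correct: this is completely wrong. This is in fact why ARB_fragment_shader_interlock exists: without it, you cannot do anything like this.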
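For comparison, here is a minimal sketch of how the same read-modify-write might look with ARB_fragment_shader_interlock, assuming the implementation exposes that extension. The image name my_image and the arithmetic are carried over from the book's listing; the binding point, the output variable color, and the choice of pixel_interlock_ordered are my own assumptions for illustration, not something the book or the spec prescribes for this case.

```glsl
#version 430 core
#extension GL_ARB_fragment_shader_interlock : require

// Ask for critical sections of fragments covering the same pixel
// to execute one at a time, in primitive order.
layout(pixel_interlock_ordered) in;

// Binding 0 is an assumption made for this sketch.
layout(binding = 0, rgba32f) uniform coherent image2D my_image;

out vec4 color; // hypothetical output, only here to make the shader complete

void main()
{
    // imageLoad/imageStore on an image2D take integer coordinates.
    ivec2 coord = ivec2(gl_FragCoord.xy);

    // The extension requires these calls to appear in main(),
    // outside of any flow control.
    beginInvocationInterlockARB();

    vec4 frag = imageLoad(my_image, coord);
    frag *= 0.1234;
    frag = pow(frag, vec4(2.2));
    imageStore(my_image, coord, frag);

    endInvocationInterlockARB();

    color = frag;
}
```

Note that there is no polling loop here. The ordering between overlapping fragments comes from the interlock itself, so no invocation ever has to assume that another one has already been launched.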