OpenGL Compute Shader - correct memory barrier usage

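I want to use a compute shader to read from and write to potentially the same elements of an SSBO as part of a fluid simulation, but I'm having trouble with synchronization. I have a test shader that runs 16 times, with three options below that hopefully show what I'm trying to do.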

#version 430 core

layout (std430, binding = 8) coherent buffer Debug
{
  int debug[ ];
};

shared int sharedInt;

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

void main()
{
    ///////     1.     ///////
    sharedInt = debug[0];
    memoryBarrierShared();
    barrier();
    debug[0] = sharedInt + 1;
    memoryBarrierShared();
    barrier();

    // Print debug[0]: 1


    ///////     2.     ///////
    atomicAdd(debug[0], 1);

    // Print debug[0]: 16


    ///////     3.     ///////
    sharedInt = debug[0];
    memoryBarrierShared();
    barrier();
    atomicExchange(debug[0], debug[0]+1);
    memoryBarrierShared();
    barrier();

    // Print debug[0]: 1
}

*To be clear, I only run one of these options at a time.

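The result I'm trying to get for all of them is debug[0] equal to 16, although in my simulation I need something like the first or third option, because I need to read from and write to the SSBO in the same thread.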

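I'm not sure I understand what shared variables do. As far as I know, memoryBarrierShared() should make reads and writes of sharedInt visible to every thread in the work group, yet the result is the same even if I dispatch only a single work group.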
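For reference, a typical case where a shared variable together with memoryBarrierShared() and barrier() does what one expects is broadcasting a value that one invocation read to the rest of the work group. This is a minimal sketch, assuming the question's SSBO layout (binding 8, 16 invocations per work group); the writes to debug[1..16] are only a hypothetical way of using the value and assume the buffer holds at least 17 ints.

```glsl
#version 430 core

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

// Same SSBO as in the question (binding 8).
layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

shared int sharedInt;

void main()
{
    // Only one invocation of the work group reads the SSBO ...
    if (gl_LocalInvocationIndex == 0u)
    {
        sharedInt = debug[0];
    }

    // ... the shared write is made visible and the whole group waits here ...
    memoryBarrierShared();
    barrier();

    // ... so every invocation now reads the same value from sharedInt.
    // Hypothetical use: assumes the buffer has at least 17 ints.
    debug[1 + int(gl_LocalInvocationIndex)] = sharedInt;
}
```

The barrier guarantees that every invocation sees the value stored by invocation 0; it does not turn a later read, add and write into an atomic operation.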

Thanks for your help.

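The point is that the additions in variants 1 and 3 are not part of an atomic operation. First you read from the shared variable/SSBO, then you perform the addition, then you write. If all invocations read the same value, they all get the same result from the addition and write the same value.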
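One general way to make the read, the addition and the write behave as one indivisible step, even for updates that have no dedicated atomic function, is a compare-and-swap retry loop built on atomicCompSwap. A minimal sketch, assuming the question's buffer layout; the helper name atomicIncrementCAS is hypothetical:

```glsl
#version 430 core

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

// Hypothetical helper: atomically replaces debug[i] with debug[i] + 1.
void atomicIncrementCAS(uint i)
{
    int expected = debug[i];   // plain read, may already be out of date
    for (;;)
    {
        // Writes expected + 1 only if debug[i] still equals 'expected',
        // and always returns the value that was actually found there.
        int found = atomicCompSwap(debug[i], expected, expected + 1);
        if (found == expected)
            break;             // our update went in
        expected = found;      // another invocation changed it first; retry
    }
}

void main()
{
    // Adds 16 to debug[0] per 16-invocation work group.
    atomicIncrementCAS(0u);
}
```

Each failed atomicCompSwap means another invocation updated debug[i] in between, so the loop retries with the value it just observed instead of overwriting that update. Replacing `expected + 1` with another function of `expected` gives an atomic version of that update.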

To make the addition part of the atomic operation, you can use atomicAdd, as you did in variant 2.
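atomicAdd also covers the need to read and write in the same thread: it returns the value the element held just before this invocation's addition, so the read and the write happen as one atomic step. A minimal sketch, assuming the question's layout; using the returned value as an index into debug[1..] is only a hypothetical illustration and assumes the buffer has at least 17 ints:

```glsl
#version 430 core

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

void main()
{
    // Read, add and write happen as one indivisible operation; the return
    // value is what debug[0] held *before* this invocation's addition.
    int before = atomicAdd(debug[0], 1);

    // With a single work group and debug[0] starting at 0, every invocation
    // gets a distinct 'before' in 0..15, usable e.g. as a slot index.
    // Hypothetical use: assumes the buffer has at least 17 ints.
    debug[1 + before] = int(gl_LocalInvocationIndex);
}
```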

Here is your code with some explanations:

///////     1.     ///////

// all invocations read debug[0] into the shared variable (presumably 0)
sharedInt = debug[0];

// Syncing. But since all invocations wrote the same value into the shared variable, they would all read the same value back from it anyway,
// since values written in one invocation are always visible to that same invocation.
memoryBarrierShared();
barrier();

// All invocations read the shared variable and add 1 to it (without writing the result back to the shared variable),
// then they all write the result of the addition (1) to the SSBO.
debug[0] = sharedInt + 1;

// Another sync, which does not help if the shader ends here.
memoryBarrierShared();
barrier();

// Since they all write 1, the only possible result in the SSBO is 1.
// Print debug[0]: 1


///////     2.     ///////
// All invocations tell the "atomic memory unit" (whatever that is exactly)
// to atomically add 1 to the SSBO.
// That unit will now, sixteen times, read the value that is in the SSBO,
// add 1, and write it back. And because it does so atomically,
// these additions "just work" and don't use stale values or the like,
// so you end up with 16 in your SSBO.
atomicAdd(debug[0], 1);

// Print debug[0]: 16


///////     3.     ///////

// as above, but this has even less effect since you don't read from sharedInt :)
sharedInt = debug[0];
memoryBarrierShared();
barrier();

// All invocations read from debug[0], reading 0.
// They all add 1 to the read value, so they now have 1 in their registers.
// Now they tell the "atomic memory unit" to exchange whatever is in
// debug[0] with a 1. So you write a 1 sixteen times into debug[0] and end up with 1.
atomicExchange(debug[0], debug[0]+1);
memoryBarrierShared();
barrier();

// Print debug[0]: 1
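If the counting should go through the shared variable, the update of the shared variable itself also has to be atomic; barrier() only separates the phases. A minimal sketch of that pattern, assuming the question's SSBO at binding 8 and 16 invocations per work group (the name groupCount is hypothetical): each work group counts its invocations in shared memory and then performs a single atomicAdd on debug[0].

```glsl
#version 430 core

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

shared int groupCount;   // one copy per work group (hypothetical name)

void main()
{
    // 1. One invocation initialises the shared counter.
    if (gl_LocalInvocationIndex == 0u)
        groupCount = 0;
    memoryBarrierShared();
    barrier();

    // 2. Every invocation adds its contribution atomically in shared memory.
    atomicAdd(groupCount, 1);
    memoryBarrierShared();
    barrier();

    // 3. One invocation publishes the group's total to the SSBO, still with
    //    an atomic, because other work groups may update debug[0] concurrently.
    if (gl_LocalInvocationIndex == 0u)
        atomicAdd(debug[0], groupCount);
}
```

With a single work group and debug[0] initially 0, this should leave 16 in debug[0]; with several work groups, the final atomicAdd keeps the per-group totals from overwriting each other.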