OpenGL Compute Shader - correct memory barrier usage
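I want to be able to read from and write to potentially the same elements of an SSBO from a compute shader, as part of a fluid simulation, but I am having trouble with synchronization. I have a test shader that is run 16 times, with three options below that hopefully show what I am trying to do.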
layout (std430, binding=8) coherent buffer Debug
{
    int debug[];
};
shared int sharedInt;
layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;
void main()
{
    /////// 1. ///////
    sharedInt = debug[0];
    memoryBarrierShared();
    barrier();
    debug[0] = sharedInt + 1;
    memoryBarrierShared();
    barrier();
    // Print debug[0]: 1
    /////// 2. ///////
    atomicAdd(debug[0], 1);
    // Print debug[0]: 16
    /////// 3. ///////
    sharedInt = debug[0];
    memoryBarrierShared();
    barrier();
    atomicExchange(debug[0], debug[0]+1);
    memoryBarrierShared();
    barrier();
    // Print debug[0]: 1
}
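*To be clear, I only run one of these options at a time.
The result I am trying to get from all of them is debug[0] equal to 16. In my simulation I need something like the first or third option, because I need to both read from and write to the SSBO in the same thread.
I am not sure I understand what shared variables do. As far as I know, memoryBarrierShared() should make reads and writes of sharedInt visible to every thread in the work group, yet the result is the same even when I dispatch only a single work group.
Thanks for any help.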
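The point is that the additions in variants 1 and 3 are not part of an atomic operation. First you read from the shared variable or SSBO, then you perform the addition, then you write. If all invocations read the same value, they all compute the same sum and write the same value.
To make the addition part of the atomic operation, use atomicAdd, as you did in variant 2.
Here is your code with some explanations: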
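Because the simulation needs to read and write in the same thread, it may help that atomicAdd also returns the value the memory held just before the addition. The following is only a minimal sketch of that idea, not the asker's actual simulation code. It assumes a single work group of 16 invocations, that debug[0] starts at 0, and that the buffer bound at binding 8 holds at least 17 ints; the use of the returned value as a slot index is purely illustrative.

```glsl
#version 430

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

void main()
{
    // The read, the addition and the write happen as one indivisible step.
    // 'previous' is the value debug[0] held just before this invocation's add,
    // so no two invocations receive the same 'previous'.
    int previous = atomicAdd(debug[0], 1);

    // Illustrative use (an assumption, not from the original post):
    // treat 'previous' as a unique slot index behind the counter.
    // Assumes debug[] has at least 17 elements and debug[0] started at 0.
    debug[1 + previous] = int(gl_GlobalInvocationID.x);
}
```

Each invocation gets its own view of the counter without a separate, racy read of debug[0].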
/////// 1. ///////
// All invocations read debug[0] into the shared variable (presumably 0).
sharedInt = debug[0];
// Syncing. But since all invocations wrote the same value into the shared variable,
// they would all read the same value from it anyway,
// and values written by an invocation are always visible to that same invocation.
memoryBarrierShared();
barrier();
// All invocations read the shared variable and add 1 to that value
// (without writing the result back to the shared variable),
// then they all write the result of the addition (1) to the SSBO.
debug[0] = sharedInt + 1;
// Another sync that does not help if the shader ends here.
memoryBarrierShared();
barrier();
// Since they all write 1, no output other than 1 is possible in the SSBO.
// Print debug[0]: 1
/////// 2. ///////
// All invocations tell the "atomic memory unit" (whatever that is exactly)
// to atomically add 1 to the SSBO.
// That unit will now, sixteen times, read the value that is in the SSBO,
// add 1, and write it back. Because it does so atomically,
// these additions "just work" and don't use stale values or the like,
// so you end up with 16 in your SSBO.
atomicAdd(debug[0], 1);
// Print debug[0]: 16
/////// 3. ///////
// As above, but this has even less effect since you never read from sharedInt :)
sharedInt = debug[0];
memoryBarrierShared();
barrier();
// All invocations read from debug[0], reading 0.
// They all add 1 to the value they read, so they now have 1 in their registers.
// Now they tell the "atomic memory unit" to exchange whatever is in
// debug[0] with 1. So you write 1 into debug[0] sixteen times and end up with 1.
atomicExchange(debug[0], debug[0]+1);
memoryBarrierShared();
barrier();
// Print debug[0]: 1
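On the question of what shared variables are for: one common pattern is to let the work group combine its contributions in shared memory first and then touch the SSBO once. The sketch below is a hedged illustration of that pattern, not part of the original answer. The name groupSum is hypothetical, and it assumes debug[0] is zeroed before the dispatch.

```glsl
#version 430

layout (local_size_x = 16, local_size_y = 1, local_size_z = 1) in;

layout (std430, binding = 8) coherent buffer Debug
{
    int debug[];
};

// Hypothetical per-work-group accumulator; shared variables have no
// defined initial value, so one invocation must initialise it.
shared int groupSum;

void main()
{
    if (gl_LocalInvocationIndex == 0u)
    {
        groupSum = 0;
    }
    // Make the initialisation visible, then wait for the whole work group.
    memoryBarrierShared();
    barrier();

    // Every invocation contributes atomically to the shared accumulator.
    atomicAdd(groupSum, 1);

    // Wait until all contributions have landed before reading the total.
    memoryBarrierShared();
    barrier();

    // A single invocation publishes the group's total to the SSBO.
    // atomicAdd keeps this correct if more than one work group is dispatched.
    if (gl_LocalInvocationIndex == 0u)
    {
        atomicAdd(debug[0], groupSum);
    }
}
```

Under those assumptions and with one work group dispatched, debug[0] should end up at 16. Note that memoryBarrierShared() only orders accesses to shared variables; for plain, non-atomic SSBO writes that other invocations need to see, the corresponding call is memoryBarrierBuffer().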