OpenGL OGLDev SSAO Tutorial Implementation Fragment Shader yields Noise
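Background
I'm trying to implement SSAO after OGLDev Tutorial 45, which is based on a tutorial by John Chapman. The OGLDev tutorial uses a highly simplified method that samples random points in a radius around the fragment position and raises the AO factor depending on how many of the sampled points have a depth greater than the actual surface depth stored at that location (the more positions around the fragment lie in front of it, the greater the occlusion).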
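The counting idea described above can be sketched on the CPU. This is only an illustration under assumptions: `DepthSample` and `ambientFactor` are hypothetical names, and the comparison follows the convention of the shader shown in this post (a sample counts as occluded when its depth is greater than or equal to the stored surface depth), which may need flipping for a different view-space handedness.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One sample: the depth of the random point itself and the depth of the
// surface stored in the position texture at the same screen location.
struct DepthSample {
    float sampleZ;   // view-space depth of the random sample point
    float surfaceZ;  // view-space depth read back from the position texture
};

// Returns the ambient factor in [0, 1]: 1.0 means unoccluded, 0.0 fully occluded.
// A sample counts as occluded when sampleZ >= surfaceZ, like step(surfaceZ, sampleZ).
float ambientFactor(float fragmentZ, const std::vector<DepthSample>& samples, float radius) {
    if (samples.empty()) return 1.0f;
    float occluded = 0.0f;
    for (const DepthSample& s : samples) {
        // Range check: ignore surfaces whose depth is far from the fragment's own depth.
        if (std::fabs(fragmentZ - s.surfaceZ) < radius) {
            occluded += (s.sampleZ >= s.surfaceZ) ? 1.0f : 0.0f;
        }
    }
    // Normalize by the total sample count and invert, so more occlusion is darker.
    return 1.0f - occluded / static_cast<float>(samples.size());
}
```

With four samples of which two pass the range check and lie behind the surface, the function returns 0.5.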
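The 'engine' I use does not have modular deferred shading like OGLDev's, but basically it first renders the whole screen color to a framebuffer with a texture attachment and a depth renderbuffer attachment. To compare depths, the fragment view-space positions are rendered to another framebuffer with a texture attachment.
These textures are then post-processed by the SSAO shader, and the result is drawn to a screen-filling quad.
Both textures draw fine to the quad on their own, and the shader input uniforms seem to be fine as well, which is why I have not included any engine code.
The fragment shader is almost identical to the tutorial's, as you can see below. I have included some comments that reflect my personal understanding.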
#version 330 core

in vec2 texCoord;
layout(location = 0) out vec4 outColor;

const int RANDOM_VECTOR_ARRAY_MAX_SIZE = 128; // reference uses 64
const float SAMPLE_RADIUS = 1.5f; // TODO: play with this value, reference uses 1.5

uniform sampler2D screenColorTexture; // the whole rendered screen
uniform sampler2D viewPosTexture; // interpolated vertex positions in view space
uniform mat4 projMat;

// we use a uniform buffer object for better performance
layout (std140) uniform RandomVectors
{
    vec3 randomVectors[RANDOM_VECTOR_ARRAY_MAX_SIZE];
};

void main()
{
    vec4 screenColor = texture(screenColorTexture, texCoord).rgba;
    vec3 viewPos = texture(viewPosTexture, texCoord).xyz;
    float AO = 0.0;

    // sample random points to compare depths around the view space position.
    // the more sampled points lie in front of the actual depth at the sampled position,
    // the higher the probability of the surface point to be occluded.
    for (int i = 0; i < RANDOM_VECTOR_ARRAY_MAX_SIZE; ++i) {
        // take a random sample point.
        vec3 samplePos = viewPos + randomVectors[i];

        // project sample point onto near clipping plane
        // to find the depth value (i.e. actual surface geometry)
        // at the given view space position for which to compare depth
        vec4 offset = vec4(samplePos, 1.0);
        offset = projMat * offset; // project onto near clipping plane
        offset.xy /= offset.w; // perform perspective divide
        offset.xy = offset.xy * 0.5 + vec2(0.5); // transform to [0,1] range
        float sampleActualSurfaceDepth = texture(viewPosTexture, offset.xy).z;

        // compare depth of random sampled point to actual depth at sampled xy position:
        // the function step(edge, x) returns 1.0 if x >= edge, else 0.0
        // thus if the random sampled point's depth is greater than or equal to (lies behind)
        // the actual surface depth at that point, the probability of occlusion increases.
        // note: if the actual depth at the sampled position is too far off from the depth at the fragment position,
        // i.e. the surface has a sharp ridge/crevice, it doesn't add to the occlusion, to avoid artifacts.
        if (abs(viewPos.z - sampleActualSurfaceDepth) < SAMPLE_RADIUS) {
            AO += step(sampleActualSurfaceDepth, samplePos.z);
        }
    }

    // normalize the ratio of sampled points lying behind the surface to a probability in [0,1]
    // the occlusion factor should make the color darker, not lighter, so we invert it.
    AO = 1.0 - AO / float(RANDOM_VECTOR_ARRAY_MAX_SIZE);

    ///
    outColor = screenColor + mix(vec4(0.2), vec4(pow(AO, 2.0)), 1.0);
    /*/
    outColor = vec4(viewPos, 1); // DEBUG: draw view space positions
    //*/
}
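One detail of the `RandomVectors` block worth keeping in mind: under `std140`, each element of a `vec3` array occupies a 16-byte stride, so a tightly packed array of 12-byte triples on the CPU side would not line up with what the shader reads. The post does not show how the buffer is filled, so the following is only a sketch under that assumption; `PaddedVec3` and `packForStd140` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxRandomVectors = 128; // matches RANDOM_VECTOR_ARRAY_MAX_SIZE

// CPU-side mirror of one std140 array element: xyz plus one padding float.
struct PaddedVec3 {
    float x, y, z;
    float pad; // unused; keeps the 16-byte array stride that std140 requires
};
static_assert(sizeof(PaddedVec3) == 16, "std140 array stride for vec3 is 16 bytes");

// Size in bytes of the whole uniform block as the shader sees it.
constexpr std::size_t kStd140BufferBytes = kMaxRandomVectors * sizeof(PaddedVec3);

// Hypothetical helper: copies tightly packed xyz triples into the padded layout.
std::vector<PaddedVec3> packForStd140(const std::vector<std::array<float, 3>>& vectors) {
    std::vector<PaddedVec3> packed;
    packed.reserve(vectors.size());
    for (const auto& v : vectors) {
        packed.push_back(PaddedVec3{v[0], v[1], v[2], 0.0f});
    }
    return packed;
}
```

The padded vector could then be uploaded as-is, for example with `glBufferData(GL_UNIFORM_BUFFER, packed.size() * sizeof(PaddedVec3), packed.data(), GL_STATIC_DRAW)`.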
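What works?
- The fragment color texture is correct.
- The texture coordinates are those of the screen-filling quad we draw to, transformed to [0, 1]. They yield the same results as
vec2 texCoord = gl_FragCoord.xy / textureSize(screenColorTexture, 0);
- The (perspective) projection matrix is the one the camera uses, and it works for that purpose. In any case, this does not seem to be the issue.
- The random sample vector components are in the range [-1, 1], as intended.
- The fragment view-space position texture seems fine.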
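What is wrong?
When I set the AO blend factor at the bottom of the fragment shader to 0, it runs smoothly at the fps cap (even though the calculations are still performed; at least I guess the compiler won't optimize them away :D). But when the AO is blended in, it takes up to 80 ms per frame to draw (getting slower over time, as if buffers were filling up), and the result is really interesting and confusing.
Obviously the mapping seems far off, and the flickering noise looks very random, as if it corresponded directly to the random sample vectors.
What I find most interesting is that the draw time increases massively only when the AO factor is added, not because of the occlusion calculation itself. Is there an issue with the draw buffers?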
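The issue seems to be linked to the chosen texture types.
The texture with the handle viewPosTexture needs to be explicitly defined as a floating-point texture format, GL_RGB16F or GL_RGBA32F, instead of just GL_RGB. Interestingly, the separate textures draw fine on their own; the issue arises only in combination.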
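A likely reason, sketched here on the CPU: if the unsized GL_RGB internal format resolves to an 8-bit normalized format such as GL_RGB8 (common, but implementation-dependent, so this is an assumption), every stored channel is clamped to [0, 1] and quantized to 256 levels. View-space positions outside that range are then lost, while a texture that is only displayed still looks plausible because display output is clamped anyway. The function names below are illustrative only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Models writing one channel to an 8-bit unsigned normalized texel:
// the value is clamped to [0, 1] and rounded to one of 256 levels.
std::uint8_t encodeUnorm8(float value) {
    float clamped = std::min(std::max(value, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}

// Models the value a shader reads back from that texel.
float decodeUnorm8(std::uint8_t stored) {
    return static_cast<float>(stored) / 255.0f;
}

// What survives of a view-space coordinate after a write and a read.
float roundTripUnorm8(float value) {
    return decodeUnorm8(encodeUnorm8(value));
}
```

In this model a view-space z of -7.25 reads back as 0.0 and 12.5 reads back as 1.0, so depth comparisons against such a texture are close to meaningless. A floating-point format like GL_RGBA32F stores the value without this clamping.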
// generate screen color texture
// note: GL_NEAREST interpolation is ok since there is no subpixel sampling anyway
glGenTextures(1, &screenColorTexture);
glBindTexture(GL_TEXTURE_2D, screenColorTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, windowWidth, windowHeight, 0, GL_BGR, GL_UNSIGNED_BYTE, NULL);
// generate depth renderbuffer. without this, depth testing won't work.
// we use a renderbuffer since we won't have to sample this; OpenGL uses it directly.
glGenRenderbuffers(1, &screenDepthBuffer);
glBindRenderbuffer(GL_RENDERBUFFER, screenDepthBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT, windowWidth, windowHeight);
// generate vertex view space position texture
glGenTextures(1, &viewPosTexture);
glBindTexture(GL_TEXTURE_2D, viewPosTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, windowWidth, windowHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
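The slow drawing may be caused by the GLSL mix function. I will investigate this further.
The flickering was due to regenerating and passing new random vectors in every frame. Passing a sufficient set of random vectors just once solves the issue. Otherwise, blurring the SSAO result might help.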
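A minimal sketch of generating the vectors once and reusing them, assuming they are produced on the CPU with the C++ standard library. `makeSampleKernel` and `sampleKernel` are hypothetical names, and the seed is arbitrary; the count of 128 mirrors RANDOM_VECTOR_ARRAY_MAX_SIZE.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds `count` random offsets with components in [-1, 1).
// A fixed seed makes the result reproducible between runs.
std::vector<std::array<float, 3>> makeSampleKernel(std::size_t count, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<std::array<float, 3>> kernel(count);
    for (auto& v : kernel) {
        const float x = dist(rng);
        const float y = dist(rng);
        const float z = dist(rng);
        v = {x, y, z};
    }
    return kernel;
}

// Built on first use and then reused, so every frame sees the same offsets.
const std::vector<std::array<float, 3>>& sampleKernel() {
    static const std::vector<std::array<float, 3>> kernel = makeSampleKernel(128, 1234u);
    return kernel;
}
```

The kernel would be uploaded to the uniform buffer once at startup rather than inside the render loop.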
Basically, SSAO works now! What remains are more or less apparent bugs.