How are vertices transformed in WebGL in indexed and non-indexed geometries?
I'm trying to digest these two links:
https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview
https://www.khronos.org/opengl/wiki/Vertex_Shader
The pipeline overview states that the vertex shader runs before primitive assembly.
The second one mentions this:
A vertex shader is (usually) invariant with its input. That is, within a single Drawing Command, two vertex shader invocations that get the exact same input attributes will return binary identical results. Because of this, if OpenGL can detect that a vertex shader invocation is being given the same inputs as a previous invocation, it is allowed to reuse the results of the previous invocation, instead of wasting valuable time executing something that it already knows the answer to.
OpenGL implementations generally do not do this by actually comparing the input values (that would take far too long). Instead, this optimization typically only happens when using indexed rendering functions. If a particular index is specified more than once (within the same Instanced Rendering), then this vertex is guaranteed to result in the exact same input data.
Therefore, implementations employ a cache on the results of vertex shaders. If an index/instance pair comes up again, and the result is still in the cache, then the vertex shader is not executed again. Thus, there can be fewer vertex shader invocations than there are vertices specified.
So if I have two quads, each made of two triangles, one indexed and one as a triangle soup:
Indexed:
verts: { 0 1 2 3 }
tris: { 0 1 2 }
{ 1 2 3 }
Soup (non-indexed):
verts: { 0 1 2 3 4 5 }
tris: { 0 1 2 }
{ 3 4 5 }
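Both layouts describe the same two triangles. Only the number of vertices the vertex shader is fed differs. Here is a minimal sketch, assuming positions are plain tuples and using hypothetical helper names (`assemble_triangles`, `deindex`). It groups an index list into triangles the way primitive assembly does for `TRIANGLES`, and it expands the indexed quad into the equivalent soup:

```python
def assemble_triangles(indices):
    """Group indices into triangles as primitive assembly does for TRIANGLES:
    every 3 consecutive indices form one triangle; trailing indices that do
    not make a full triangle are ignored."""
    usable = len(indices) - len(indices) % 3
    return [tuple(indices[i:i + 3]) for i in range(0, usable, 3)]


def deindex(positions, indices):
    """Expand indexed geometry into a triangle soup: one vertex per index."""
    return [positions[i] for i in indices]


# Quad corners (hypothetical coordinates) and the indexed layout from the question.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
indices = [0, 1, 2, 1, 2, 3]

soup = deindex(positions, indices)
print(len(positions), len(soup))                   # 4 6
print(assemble_triangles(indices))                 # [(0, 1, 2), (1, 2, 3)]
print(assemble_triangles(list(range(len(soup)))))  # [(0, 1, 2), (3, 4, 5)]
```

A non-indexed draw behaves like the last line: the "indices" are just 0, 1, 2, ..., so no vertex is ever referenced twice.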
and maybe a vertex shader like this:
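uniform mat4 mvm;
uniform mat4 pm;
attribute vec3 position;
void main() {
  vec4 res;
  // Deliberately redundant loop: recomputes the same transform 256 times
  // to make the vertex shader expensive.
  for (int i = 0; i < 256; i++) {
    res = pm * mvm * vec4(position, 1.0);
  }
  gl_Position = res;
}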
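Should I care that one has 4 vertices and the other has 6? Does it vary from GPU to GPU whether the vertex shader is invoked 4 times or 6 times? How is this affected by the cache: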
If an index/instance pair comes up again, and the result is still in the cache...
How does the raw vertex count relate to performance here? In both cases I have the same number of primitives.
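One way to see how the cache changes the count is a small simulation. This is a minimal sketch, assuming a FIFO post-transform cache of fixed size keyed by vertex index. Real GPUs differ in cache size, replacement policy and batching, and `count_vs_invocations` and `cache_size` are made-up names. It counts how often the vertex shader would run for an index stream. A non-indexed draw is modelled as an index stream in which no index repeats:

```python
from collections import deque


def count_vs_invocations(indices, cache_size=16):
    """Count vertex shader runs for an index stream, assuming a FIFO
    post-transform cache keyed by vertex index (hypothetical model)."""
    fifo = deque()      # indices whose results are cached, oldest first
    cached = set()      # same contents as `fifo`, for fast lookup
    invocations = 0
    for index in indices:
        if index in cached:
            continue                          # hit: reuse the earlier result
        invocations += 1                      # miss: run the vertex shader
        fifo.append(index)
        cached.add(index)
        if len(fifo) > cache_size:
            cached.discard(fifo.popleft())    # evict the oldest entry
    return invocations


indexed_quad = [0, 1, 2, 1, 2, 3]   # 4 unique vertices, 6 indices
soup_quad = [0, 1, 2, 3, 4, 5]      # what a non-indexed draw looks like

print(count_vs_invocations(indexed_quad))                # 4
print(count_vs_invocations(soup_quad))                   # 6
print(count_vs_invocations(indexed_quad, cache_size=1))  # 6
```

The last call shows the "if the result is still in the cache" caveat. With a cache too small to hold the reused vertices, the indexed quad costs as many vertex shader runs as the soup.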
What about the case of a very simple fragment shader but an expensive vertex shader:
precision mediump float;
void main() {
  gl_FragColor = vec4(1.0);  // solid white
}
and a tessellated quad (100x100 segments)? Can I say that the indexed version will run faster, that it may run faster, or nothing at all?
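For the 100x100 tessellated quad the gap is much larger than 4 vs 6. This sketch assumes a simple row-by-row grid layout, and the helper `grid_indices` is hypothetical. It counts the two extremes. A non-indexed draw always runs the vertex shader once per listed vertex. An indexed draw runs it somewhere between once per unique vertex (perfect cache reuse) and once per index (no reuse). Where it lands depends on the GPU's cache and the triangle order.

```python
def grid_indices(segments_x, segments_y):
    """Index list for a grid of segments_x * segments_y quads, two triangles
    per quad, walked row by row. Vertices are numbered row-major on a
    (segments_x + 1) * (segments_y + 1) lattice."""
    stride = segments_x + 1
    indices = []
    for y in range(segments_y):
        for x in range(segments_x):
            a = y * stride + x      # top-left
            b = a + 1               # top-right
            c = a + stride          # bottom-left
            d = c + 1               # bottom-right
            indices += [a, b, c, b, d, c]
    return indices


indices = grid_indices(100, 100)
unique_vertices = len(set(indices))  # lower bound on indexed vertex shader runs
soup_vertices = len(indices)         # runs for the same mesh drawn as a soup

print(unique_vertices)  # 10201
print(soup_vertices)    # 60000
```

With an expensive vertex shader and a trivial fragment shader, the indexed draw lands somewhere between 10,201 and 60,000 invocations, while the soup always needs 60,000. That is why the indexed version is likely to be faster, although the spec itself guarantees nothing.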
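Like everything on GPUs, according to the spec you can say nothing. It's up to the driver and the GPU. In reality, though, in your example 4 vertices will run faster than 6 pretty much everywhere.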
Search for vertex order optimization and lots of articles come up:
Linear-Speed Vertex Cache Optimisation
AMD Triangle Order Optimization Tool
Triangle Order Optimization for Graphics Hardware Computation Culling
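Those articles are about reordering triangles so that recently transformed vertices are still in the cache when they are reused. The sketch below does not implement any of their algorithms. It only illustrates the effect under made-up assumptions: a 32-entry FIFO cache, a 100x100 grid, and hypothetical helper names. It compares plain row-major order with a simple reordering into vertical bands 8 quads wide:

```python
from collections import deque


def fifo_misses(indices, cache_size):
    """Vertex shader runs for an index stream under a FIFO cache model."""
    fifo, cached, misses = deque(), set(), 0
    for i in indices:
        if i not in cached:
            misses += 1
            fifo.append(i)
            cached.add(i)
            if len(fifo) > cache_size:
                cached.discard(fifo.popleft())
    return misses


def quad_indices(x, y, stride):
    """Two triangles for the quad whose top-left lattice vertex is (x, y)."""
    a = y * stride + x
    b, c = a + 1, a + stride
    d = c + 1
    return [a, b, c, b, d, c]


def grid_order(n, band_width):
    """Indices for an n x n grid of quads, walked in vertical bands that are
    band_width quads wide, row by row inside each band.
    band_width == n gives plain row-major order."""
    stride = n + 1
    indices = []
    for x0 in range(0, n, band_width):
        for y in range(n):
            for x in range(x0, min(x0 + band_width, n)):
                indices += quad_indices(x, y, stride)
    return indices


n, cache_size = 100, 32  # arbitrary assumptions, not real hardware values
row_major = fifo_misses(grid_order(n, n), cache_size)
banded = fifo_misses(grid_order(n, 8), cache_size)
print("row-major:", row_major, "banded:", banded)
```

In row-major order a row is far longer than the cache, so vertices shared with the previous row have been evicted by the time they are needed again. The 8-wide bands keep both rows of shared vertices in the cache. Both orders draw exactly the same triangles. Under this model the banded order needs noticeably fewer vertex shader runs.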
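Unrelated, but another example of spec vs. reality: according to the spec, depth testing happens after the fragment shader runs (otherwise you couldn't set gl_FragDepth in a fragment shader). In reality, as long as the results are the same, the driver/GPU can do whatever it wants. So for fragment shaders that don't set gl_FragDepth or use discard, the depth test is done first, and the fragment shader only runs for fragments that pass it.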
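Here is a toy model of why that reordering is safe. It assumes a single pixel, a LESS depth test against a buffer cleared to 1.0, and a fragment shader that neither writes gl_FragDepth nor discards. The function `shade_count` is hypothetical and ignores everything else a GPU does. The same depth ends up in the buffer either way, but testing first skips shading occluded fragments:

```python
def shade_count(fragments, early_z):
    """Count fragment shader runs for the fragments covering one pixel.

    `fragments` is a list of depths in draw order (smaller = closer).
    Assumes the shader never changes depth or discards, so the depth test
    result is known before shading. Returns (shader runs, final depth)."""
    depth_buffer = 1.0
    runs = 0
    for z in fragments:
        passes = z < depth_buffer
        if not early_z or passes:
            runs += 1            # the fragment shader executes
        if passes:
            depth_buffer = z     # depth write
    return runs, depth_buffer


front_to_back = [0.1, 0.5, 0.9]  # closest fragment drawn first
print(shade_count(front_to_back, early_z=False))  # (3, 0.1)
print(shade_count(front_to_back, early_z=True))   # (1, 0.1)
```

If the shader could write gl_FragDepth or discard, `passes` would not be known before shading. That is why those shaders lose the early test.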