在 AMD 上写入非零 FBO 附件时 OpenGL 性能下降
OpenGL drops performance when writing to nonzero FBO attachment on AMD
我注意到我的 3D 引擎 运行 在 AMD 硬件上运行速度非常慢。经过一些调查,缓慢的代码归结为创建具有多个附件的 FBO 并写入任何非零附件。在所有测试中,我将 AMD 性能与相同的 AMD GPU 进行了比较,但写入不受影响 GL_COLOR_ATTACHMENT0
,并且使用 Nvidia 硬件,其性能与我的 AMD 设备的差异是众所周知的。
将片段写入非零附件比预期慢 2-3 倍。
此代码等同于我在测试应用程序中创建帧缓冲区和测量性能的方式:
// Create a framebuffer
static const auto attachmentCount = 6;
GLuint fb, att[attachmentCount];
glGenTextures(attachmentCount, att);
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
for (auto i = 0; i < attachmentCount; ++i) {
glBindTexture(GL_TEXTURE_2D, att[i]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
}
GLuint dbs[] = {
GL_NONE,
GL_COLOR_ATTACHMENT1,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(attachmentCount, dbs);
// Main loop
while (shouldWork) {
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
showFps();
}
有什么问题吗?
可以找到完全可重现的最小测试 here. I tried many other writing patterns or OpenGL states and described some of them in AMD Community。
我想问题出在 AMD 的 OpenGL 驱动程序中,但如果不是,或者您遇到了同样的问题并找到了解决方法(供应商扩展?),请分享。
UPD:将问题详情移至此处。
我准备了一个最小的测试包,其中应用程序创建了一个带有六个 RGBA UNSIGNED_BYTE 附件的 FBO,并每帧渲染 100 个全屏矩形。有四种可执行文件,四种写法:
正在将着色器输出 0 写入附件 0。只有输出 0 使用 glDrawBuffers
路由到帧缓冲区。所有其他输出设置为 GL_NONE
.
与 1 相同,但具有输出和附件 1。
正在将输出 0 写入附件 0,但是所有六个着色器输出都分别路由到附件 0..6,并且除 0 之外的所有绘制缓冲区都用 glColorMaski
.[=32= 屏蔽]
与 3 相同,但用于附件 1。
我 运行 在两台具有几乎相似 CPU 和以下 GPU 的机器上进行所有测试:
AMD Radeon RX550,驱动版本19.30.01.16
Nvidia Geforce GTX 650 Ti,比 RX550
低约 2 倍
并得到了这些结果:
Geforce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS
预构建的测试可执行文件附加到 post 或者可以从 Google drive 下载。
测试源(使用 MSVS 友好的 cmake 构建系统)可在 Github
上获得
所有四个程序都显示黑色 window 和带有 FPS 计数器的控制台。
我们看到,当写入非零附件时,AMD 比不那么强大的 nvidia GPU 和它自己慢得多。绘图缓冲区输出的全局屏蔽也会降低一些 fps。
我还尝试使用渲染缓冲区而不是纹理,使用其他图像格式(而测试中的格式是最兼容的格式),渲染为二次方大小的帧缓冲区。结果是一样的。
明确关闭剪刀、模板和深度测试没有帮助。
如果我通过将顶点坐标乘以小于 1 的值来减少附件数量或减少帧缓冲区覆盖率,测试性能会按比例提高,最终 RX550 优于 GTX 650 Ti。
glClear
调用也受到影响,它们在各种条件下的性能符合上述观察结果。
我的队友在 Radeon HD 3000 上使用 Linux 并使用 Wine 进行了测试。两个测试 运行s 都暴露了 attachment0 和 attachment1 测试之间的巨大差异。我不能确切地说出他的驱动程序版本是什么,但它是由 Ubuntu 19.04 repos.
提供的
另一位队友在 Radeon RX590 上进行了测试,得到了相同的 2 倍差异。
最后,让我在这里复制粘贴两个几乎相同的测试示例。这个效果很快:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 2;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 2;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 5;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 5;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 2;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
return 2;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_COLOR_ATTACHMENT0,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occured." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(vao);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
这个在 Nvidia 和 Intel GPU 上运行速度相当,但比在 AMD GPU 上的第一个例子慢 2-3 倍:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 2;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 2;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 5;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 5;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 2;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
return 2;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_NONE,
GL_COLOR_ATTACHMENT1,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occured." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(vao);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
这些示例之间的唯一区别是使用的颜色附件。
我特意编写了两个几乎相似的复制粘贴程序,以避免帧缓冲区删除和重新创建可能带来的不良影响。
UPD2: 还在我的 Nvidia 和 AMD 测试示例中尝试了 OpenGL 4.6 调试上下文。没有收到性能警告。
UPD3: RX470 结果:
attachment0: 775 FPS
attachment1: 396 FPS
UPD4:我构建了attachment0 and attachment1 tests for webgl via emscripten and ran them on Radeon RX550. Full source is in problem's Github repo,构建命令行是
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
两个测试程序发出一个绘图调用:glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
第一次测试:使用默认配置的 Firefox,即支持 DirectX 的 ANGLE。
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
第二次测试:禁用 ANGLE 的 Firefox,(about:config
-> webgl.disable-angle = true
),使用原生 OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
我们发现 DirectX 不受此问题影响,而 OpenGL 问题在 WebGL 中可重现。这是意料之中的结果,因为游戏玩家和开发者只抱怨 OpenGL 性能。
自(至少)2019 年 12 月驱动程序以来,AMD 已修复该问题。上述测试程序和我们的游戏引擎 FPS 率确认了修复。
另请参阅 this 话题。
尊敬的 AMD OpenGL 驱动程序团队,非常感谢!
我注意到我的 3D 引擎 运行 在 AMD 硬件上运行速度非常慢。经过一些调查,缓慢的代码归结为创建具有多个附件的 FBO 并写入任何非零附件。在所有测试中,我将 AMD 性能与相同的 AMD GPU 进行了比较,但写入不受影响 GL_COLOR_ATTACHMENT0
,并且使用 Nvidia 硬件,其性能与我的 AMD 设备的差异是众所周知的。
将片段写入非零附件比预期慢 2-3 倍。
此代码等同于我在测试应用程序中创建帧缓冲区和测量性能的方式:
// Create a framebuffer
static const auto attachmentCount = 6;
GLuint fb, att[attachmentCount];
glGenTextures(attachmentCount, att);
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
for (auto i = 0; i < attachmentCount; ++i) {
glBindTexture(GL_TEXTURE_2D, att[i]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
}
GLuint dbs[] = {
GL_NONE,
GL_COLOR_ATTACHMENT1,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(attachmentCount, dbs);
// Main loop
while (shouldWork) {
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
showFps();
}
有什么问题吗?
可以找到完全可重现的最小测试 here. I tried many other writing patterns or OpenGL states and described some of them in AMD Community。
我想问题出在 AMD 的 OpenGL 驱动程序中,但如果不是,或者您遇到了同样的问题并找到了解决方法(供应商扩展?),请分享。
UPD:将问题详情移至此处。
我准备了一个最小的测试包,其中应用程序创建了一个带有六个 RGBA UNSIGNED_BYTE 附件的 FBO,并每帧渲染 100 个全屏矩形。有四种可执行文件,四种写法:
正在将着色器输出 0 写入附件 0。只有输出 0 使用
glDrawBuffers
路由到帧缓冲区。所有其他输出设置为GL_NONE
.与 1 相同,但具有输出和附件 1。
正在将输出 0 写入附件 0,但是所有六个着色器输出都分别路由到附件 0..6,并且除 0 之外的所有绘制缓冲区都用
glColorMaski
.[=32= 屏蔽]与 3 相同,但用于附件 1。
我 运行 在两台具有几乎相似 CPU 和以下 GPU 的机器上进行所有测试:
AMD Radeon RX550,驱动版本19.30.01.16
Nvidia Geforce GTX 650 Ti,比 RX550
低约 2 倍并得到了这些结果:
Geforce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS
预构建的测试可执行文件附加到 post 或者可以从 Google drive 下载。
测试源(使用 MSVS 友好的 cmake 构建系统)可在 Github
上获得所有四个程序都显示黑色 window 和带有 FPS 计数器的控制台。
我们看到,当写入非零附件时,AMD 比不那么强大的 nvidia GPU 和它自己慢得多。绘图缓冲区输出的全局屏蔽也会降低一些 fps。
我还尝试使用渲染缓冲区而不是纹理,使用其他图像格式(而测试中的格式是最兼容的格式),渲染为二次方大小的帧缓冲区。结果是一样的。
明确关闭剪刀、模板和深度测试没有帮助。
如果我通过将顶点坐标乘以小于 1 的值来减少附件数量或减少帧缓冲区覆盖率,测试性能会按比例提高,最终 RX550 优于 GTX 650 Ti。
glClear
调用也受到影响,它们在各种条件下的性能符合上述观察结果。
我的队友在 Radeon HD 3000 上使用 Linux 并使用 Wine 进行了测试。两个测试 运行s 都暴露了 attachment0 和 attachment1 测试之间的巨大差异。我不能确切地说出他的驱动程序版本是什么,但它是由 Ubuntu 19.04 repos.
提供的另一位队友在 Radeon RX590 上进行了测试,得到了相同的 2 倍差异。
最后,让我在这里复制粘贴两个几乎相同的测试示例。这个效果很快:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 2;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 2;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 5;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 5;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 2;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
return 2;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_COLOR_ATTACHMENT0,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occured." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(vao);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
这个在 Nvidia 和 Intel GPU 上运行速度相当,但比在 AMD GPU 上的第一个例子慢 2-3 倍:
#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>
#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>
static std::string getErrorDescr(const GLenum errCode)
{
// English descriptions are from
// https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
switch (errCode) {
case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
case GL_INVALID_VALUE: return "A numeric argument is out of range.";
case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
default:;
}
return "No description available.";
}
static std::string getErrorMessage()
{
const GLenum error = glGetError();
if (GL_NO_ERROR == error) return "";
std::stringstream ss;
ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
ss << "Error string: ";
ss << getErrorDescr(error);
ss << std::endl;
return ss.str();
}
[[maybe_unused]] static bool error()
{
const auto message = getErrorMessage();
if (message.length() == 0) return false;
std::cerr << message;
return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
unsigned int linesCount = 0;
for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
const char** sourceLines = new const char*[linesCount];
int* lengths = new int[linesCount];
int idx = 0;
const char* lineStart = source.data();
int lineLength = 1;
const auto len = source.length();
for (unsigned int i = 0; i < len; ++i) {
if (source[i] == '\n') {
sourceLines[idx] = lineStart;
lengths[idx] = lineLength;
lineLength = 1;
lineStart = source.data() + i + 1;
++idx;
}
else ++lineLength;
}
glShaderSource(shader, linesCount, sourceLines, lengths);
glCompileShader(shader);
GLint logLength;
glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetShaderInfoLog(shader, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint compileStatus;
glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
delete[] sourceLines;
delete[] lengths;
return bool(compileStatus);
}
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
const auto vs = glCreateShader(GL_VERTEX_SHADER);
if (vs == 0) {
std::cerr << "Error: vertex shader is 0." << std::endl;
return 2;
}
const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
if (fs == 0) {
std::cerr << "Error: fragment shader is 0." << std::endl;
return 2;
}
// Compile shaders
if (!compileShader(vs, vertSource)) {
std::cerr << "Error: could not compile vertex shader." << std::endl;
return 5;
}
if (!compileShader(fs, fragSource)) {
std::cerr << "Error: could not compile fragment shader." << std::endl;
return 5;
}
// Link program
const auto program = glCreateProgram();
if (program == 0) {
std::cerr << "Error: program is 0." << std::endl;
return 2;
}
glAttachShader(program, vs);
glAttachShader(program, fs);
glLinkProgram(program);
// Get log
GLint logLength = 0;
glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
if (logLength > 0) {
auto* const log = new GLchar[logLength + 1];
glGetProgramInfoLog(program, logLength, nullptr, log);
std::cout << "Log: " << std::endl;
std::cout << log;
delete[] log;
}
GLint linkStatus = 0;
glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
if (!linkStatus) {
std::cerr << "Error: could not link." << std::endl;
return 2;
}
glDeleteShader(vs);
glDeleteShader(fs);
return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
// Init
if (!glfwInit()) {
std::cerr << "Error: glfw init failed." << std::endl;
return 3;
}
static const int width = 800;
static const int height= 600;
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
GLFWwindow* window = nullptr;
window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
if (window == nullptr) {
std::cerr << "Error: window is null." << std::endl;
glfwTerminate();
return 1;
}
glfwMakeContextCurrent(window);
if (glewInit() != GLEW_OK) {
std::cerr << "Error: glew not OK." << std::endl;
glfwTerminate();
return 2;
}
// Shader program
const auto shaderProgram = createProgram(vertSource, fragSource);
glUseProgram(shaderProgram);
// Vertex buffer
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
GLuint buffer;
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
float bufferData[] = {
-1.0f, -1.0f,
1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, -1.0f,
1.0f, 1.0f,
-1.0f, 1.0f
};
glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
// Framebuffer
GLuint fb, att[6];
glGenTextures(6, att);
glGenFramebuffers(1, &fb);
glBindTexture(GL_TEXTURE_2D, att[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[1]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[2]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[3]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[4]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, att[5]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);
GLuint dbs[] = {
GL_NONE,
GL_COLOR_ATTACHMENT1,
GL_NONE,
GL_NONE,
GL_NONE,
GL_NONE};
glDrawBuffers(6, dbs);
if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
std::cerr << "Error: framebuffer is incomplete." << std::endl;
return 1;
}
if (error()) {
std::cerr << "OpenGL error occured." << std::endl;
return 2;
}
// Fpsmeter
static const uint32_t framesMax = 50;
uint32_t framesCount = 0;
auto start = std::chrono::steady_clock::now();
// Main loop
while (!glfwWindowShouldClose(window)) {
if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
glClear(GL_COLOR_BUFFER_BIT);
for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
if (++framesCount == framesMax) {
framesCount = 0;
const auto now = std::chrono::steady_clock::now();
const auto duration = now - start;
start = now;
const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
}
}
// Shutdown
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(vao);
glUseProgram(0);
glDeleteProgram(shaderProgram);
glDeleteBuffers(1, &buffer);
glDeleteVertexArrays(1, &vao);
glDeleteFramebuffers(1, &fb);
glDeleteTextures(6, att);
glfwMakeContextCurrent(nullptr);
glfwDestroyWindow(window);
glfwTerminate();
return 0;
}
这些示例之间的唯一区别是使用的颜色附件。
我特意编写了两个几乎相似的复制粘贴程序,以避免帧缓冲区删除和重新创建可能带来的不良影响。
UPD2: 还在我的 Nvidia 和 AMD 测试示例中尝试了 OpenGL 4.6 调试上下文。没有收到性能警告。
UPD3: RX470 结果:
attachment0: 775 FPS
attachment1: 396 FPS
UPD4:我构建了attachment0 and attachment1 tests for webgl via emscripten and ran them on Radeon RX550. Full source is in problem's Github repo,构建命令行是
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
两个测试程序发出一个绘图调用:glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
第一次测试:使用默认配置的 Firefox,即支持 DirectX 的 ANGLE。
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
第二次测试:禁用 ANGLE 的 Firefox,(about:config
-> webgl.disable-angle = true
),使用原生 OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
我们发现 DirectX 不受此问题影响,而 OpenGL 问题在 WebGL 中可重现。这是意料之中的结果,因为游戏玩家和开发者只抱怨 OpenGL 性能。
自(至少)2019 年 12 月驱动程序以来,AMD 已修复该问题。上述测试程序和我们的游戏引擎 FPS 率确认了修复。 另请参阅 this 话题。
尊敬的 AMD OpenGL 驱动程序团队,非常感谢!