OpenGL drops performance when writing to nonzero FBO attachment on AMD

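I noticed that my 3D engine runs very slowly on AMD hardware. After some investigation, the slow code boiled down to creating an FBO with several attachments and writing to any nonzero attachment. In all tests I compared AMD performance against the same AMD GPU writing to the unaffected GL_COLOR_ATTACHMENT0, and against Nvidia hardware whose performance difference from my AMD device is well known.

Writing fragments to a nonzero attachment is 2-3 times slower than expected.

This code is equivalent to how I create the framebuffer and measure performance in my test applications: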

    // Create a framebuffer
    static const auto attachmentCount = 6;
    GLuint fb, att[attachmentCount];
    glGenTextures(attachmentCount, att);
    glGenFramebuffers(1, &fb);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);

    for (auto i = 0; i < attachmentCount; ++i) {
        glBindTexture(GL_TEXTURE_2D, att[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
    }
    // glDrawBuffers takes GLenum values; only shader output 1 is routed (to attachment 1)
    GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(attachmentCount, dbs);


    // Main loop
    while (shouldWork) {
        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();
        showFps();
    }
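
The snippet above calls `showFps()` without defining it. A minimal sketch of such a counter, assuming it averages the frame time over a fixed number of frames (the full listings further down use 50), might look like this. The class name `FpsMeter` and its interface are hypothetical and not part of the original test code.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper: averages the frame time over a fixed number of frames.
class FpsMeter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FpsMeter(unsigned framesPerSample, Clock::time_point start = Clock::now())
        : framesPerSample_(framesPerSample), start_(start) {}

    // Call once per frame. Returns the averaged FPS once every
    // framesPerSample frames and std::nullopt on all other frames.
    std::optional<double> tick(Clock::time_point now = Clock::now()) {
        if (++frames_ < framesPerSample_) return std::nullopt;
        const std::chrono::duration<double> elapsed = now - start_;
        const double fps = frames_ / elapsed.count();
        frames_ = 0;
        start_ = now;
        return fps;
    }

private:
    unsigned framesPerSample_;
    unsigned frames_ = 0;
    Clock::time_point start_;
};
```

In the main loop one would call `tick()` right after `glfwPollEvents()` and print the value whenever it is present. Note that without an explicit `glFinish()` this measures how fast frames are submitted and presented, which is what the FPS numbers in this post refer to.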

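Is there anything wrong with this code?

A fully reproducible minimal test can be found here. I tried many other write patterns and OpenGL states and described some of them in the AMD Community.

I suppose the problem is in AMD's OpenGL driver, but if it is not, or if you have faced the same problem and found a workaround (a vendor extension?), please share.

UPD: Moving the problem details here.

I prepared a minimal test pack in which the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 full-screen rectangles per frame. There are four executables, one for each of four write patterns: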

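  1. Shader output 0 is written to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.

  2. Same as 1, but with output 1 and attachment 1.

  3. Output 0 is written to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all draw buffers except 0 are masked with glColorMaski.

  4. Same as 3, but for attachment 1.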
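
As a rough illustration of the four patterns, the sketch below builds the arrays that would be handed to `glDrawBuffers` and the per-buffer write masks that would be applied with `glColorMaski`. It is not taken from the test pack: the function names are hypothetical, and the two enum values are copied from the OpenGL headers so that the sketch compiles without them.

```cpp
#include <array>
#include <cstddef>

// Values copied from the OpenGL headers so the sketch compiles without them.
constexpr unsigned kGlNone = 0;                   // GL_NONE
constexpr unsigned kGlColorAttachment0 = 0x8CE0;  // GL_COLOR_ATTACHMENT0
constexpr std::size_t kAttachmentCount = 6;

using DrawBuffers = std::array<unsigned, kAttachmentCount>;
using WriteMask = std::array<bool, kAttachmentCount>;

// Patterns 1 and 2: only shader output `target` is routed, to attachment `target`.
inline DrawBuffers routeSingle(std::size_t target) {
    DrawBuffers dbs{};  // zero-initialized, i.e. every entry is GL_NONE
    dbs[target] = kGlColorAttachment0 + static_cast<unsigned>(target);
    return dbs;
}

// Patterns 3 and 4: shader output i is routed to attachment i for every i.
inline DrawBuffers routeAll() {
    DrawBuffers dbs{};
    for (std::size_t i = 0; i < kAttachmentCount; ++i) {
        dbs[i] = kGlColorAttachment0 + static_cast<unsigned>(i);
    }
    return dbs;
}

// Patterns 3 and 4: per-draw-buffer write mask; true means writes are enabled.
inline WriteMask writeMask(std::size_t target) {
    WriteMask mask{};  // all false: every draw buffer masked out
    mask[target] = true;
    return mask;
}
```

With a real context, pattern 2 would then be `glDrawBuffers(6, routeSingle(1).data())`, and pattern 4 would be `glDrawBuffers(6, routeAll().data())` followed by one `glColorMaski(i, m, m, m, m)` call per draw buffer `i`, where `m` is the corresponding mask entry.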

I ran all tests on two machines with nearly similar CPUs and the following GPUs:

AMD Radeon RX550, driver version 19.30.01.16

Nvidia Geforce GTX 650 Ti, about 2 times less powerful than the RX550

and got these results:

Geforce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS

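Prebuilt test executables are attached to the post or can be downloaded from Google Drive.

Test sources (with an MSVS-friendly CMake build system) are available on Github.

All four programs show a black window and a console with an FPS counter.

We see that when writing to a nonzero attachment, AMD is much slower than both the less powerful Nvidia GPU and itself. Global masking of draw buffer outputs also costs some FPS.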
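
To put the RX550 numbers in terms of frame time, a small sketch like the following can be used. The helper names are hypothetical; the FPS values are the ones from the results listed earlier.

```cpp
#include <cstdio>

// Frame time in milliseconds for a given FPS value.
inline double frameTimeMs(double fps) { return 1000.0 / fps; }

// How many times slower the slow configuration is than the fast one.
inline double slowdown(double fastFps, double slowFps) { return fastFps / slowFps; }

// Prints frame times and slowdown factors for the RX550 measurements.
inline void printRx550Summary() {
    std::printf("unmasked: %.2f ms vs %.2f ms, slowdown %.2fx\n",
                frameTimeMs(350.0), frameTimeMs(185.0), slowdown(350.0, 185.0));
    std::printf("masked:   %.2f ms vs %.2f ms, slowdown %.2fx\n",
                frameTimeMs(330.0), frameTimeMs(175.0), slowdown(330.0, 175.0));
}
```

For both the unmasked and the masked pair, the attachment1 variant comes out a little under twice as slow as the attachment0 variant on this GPU.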

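I also tried using renderbuffers instead of textures, using other image formats (the format in the tests is the most compatible one), and rendering to a power-of-two sized framebuffer. The results were the same.

Explicitly turning off the scissor, stencil, and depth tests did not help.

If I reduce the number of attachments, or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.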
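
The coverage experiment can be reasoned about with a back-of-the-envelope fill rate estimate. The sketch below assumes that every covered pixel produces exactly one fragment per quad (no early rejection, since depth, stencil, and scissor tests are off) and that scaling both vertex coordinates by `scale` shrinks the covered area by `scale * scale`. The function names are hypothetical.

```cpp
// Fraction of the framebuffer covered by a full-screen quad whose clip-space
// coordinates are multiplied by `scale` (0 < scale <= 1) on both axes.
inline double coveredFraction(double scale) { return scale * scale; }

// Approximate number of fragments written per frame, assuming one fragment
// per covered pixel per quad.
inline double fragmentsPerFrame(int width, int height, int quadsPerFrame, double scale) {
    return static_cast<double>(width) * height * quadsPerFrame * coveredFraction(scale);
}

// Approximate fragments written per second at a measured FPS.
inline double fillRate(double fragsPerFrame, double fps) { return fragsPerFrame * fps; }
```

For the 800x600 test window with 100 quads per frame this gives 48 million fragments per frame at full coverage, so the measured FPS translates directly into a fragment write rate that can be compared between the attachment0 and attachment1 runs.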

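glClear calls are also affected, and their performance under various conditions agrees with the observations above.

My teammate ran the tests on a Radeon HD 3000, both natively on Linux and under Wine. Both test runs exposed a huge difference between the attachment0 and attachment1 tests. I can't say exactly what his driver version is, but it is the one provided by the Ubuntu 19.04 repos.

Another teammate ran the tests on a Radeon RX590 and got the same 2x difference.

Finally, let me copy-paste two nearly identical test examples here. This one runs fast: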

#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>

#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <cstdint>   // uint32_t
#include <iterator>  // std::size
#include <vector>

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 2;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 2;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 5;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 5;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 2;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 2;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
    outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height= 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    // glDrawBuffers takes GLenum values; only shader output 0 is routed (to attachment 0)
    GLenum dbs[] = {
        GL_COLOR_ATTACHMENT0,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);  // unbind the VAO before deleting it
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}

This one runs at a comparable speed on Nvidia and Intel GPUs, but is 2-3 times slower than the first example on AMD GPUs:

#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>

#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <cstdint>   // uint32_t
#include <iterator>  // std::size
#include <vector>

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 2;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 2;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 5;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 5;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 2;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 2;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
    outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height= 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    // glDrawBuffers takes GLenum values; only shader output 1 is routed (to attachment 1)
    GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);  // unbind the VAO before deleting it
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}

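The only difference between these examples is the color attachment used.

I deliberately wrote two nearly identical copy-pasted programs to avoid possible side effects of framebuffer deletion and re-creation.

UPD2: I also tried an OpenGL 4.6 debug context with my test examples on both Nvidia and AMD. No performance warnings were reported.

UPD3: RX470 results: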
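
For reference, a debug context delivers driver messages through a callback registered with `glDebugMessageCallback`, and performance hints arrive with the type `GL_DEBUG_TYPE_PERFORMANCE`. The sketch below shows only the filtering part, so that it compiles without the OpenGL headers. `DebugLog` is a hypothetical collector, not code from the test programs, and the enum value is copied from the headers.

```cpp
#include <string>
#include <vector>

// Value copied from the OpenGL headers (GL_DEBUG_TYPE_PERFORMANCE).
constexpr unsigned kGlDebugTypePerformance = 0x8250;

// Hypothetical collector for messages forwarded from a debug callback.
struct DebugLog {
    std::vector<std::string> performanceMessages;
    unsigned otherMessages = 0;

    // Mirrors the `type` and `message` parameters of a GLDEBUGPROC callback:
    // keeps performance messages and only counts everything else.
    void onMessage(unsigned type, const char* message) {
        if (type == kGlDebugTypePerformance) {
            performanceMessages.emplace_back(message);
        } else {
            ++otherMessages;
        }
    }
};
```

In a real program the context would be requested with the `GLFW_OPENGL_DEBUG_CONTEXT` window hint, and the registered callback would forward its arguments to `onMessage` through the `userParam` pointer. An empty `performanceMessages` list after a run corresponds to the "no performance warnings" observation above.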

attachment0: 775 FPS
attachment1: 396 FPS

UPD4: I built the attachment0 and attachment1 tests for WebGL via emscripten and ran them on a Radeon RX550. The full source is in the problem's Github repo; the build command lines are

emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html

Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);

First test: Firefox with the default configuration, i.e. ANGLE with the DirectX backend.

Unmasked Vendor:    Google Inc.
Unmasked Renderer:  ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)

attachment0: 38 FPS
attachment1: 38 FPS

Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:

Unmasked Vendor:    ATI Technologies Inc.
Unmasked Renderer:  Radeon RX550/550 Series

attachment0: 38 FPS
attachment1: 19 FPS

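We see that DirectX is not affected by the issue, while the OpenGL issue is reproducible in WebGL. This is the expected result, since gamers and developers complain only about OpenGL performance.

P.S. Probably my issue is the root cause of this and this performance drop.

AMD has fixed the issue as of (at least) the December 2019 driver. The test programs above and our game engine's FPS rates confirm the fix. See also this thread.

Dear AMD OpenGL driver team, thank you very much!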