DX11 Compute Shader writes only to one index

I really can't figure out what is going on here.

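I have a compute shader that takes an FFT result (from real input), computes the power of each bin, and stores the powers in a separate buffer (UAV). The FFT implementation is the one from the D3DCSX library.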

The shader in question:

struct Complex {
    float real;
    float imag;
};

RWStructuredBuffer<Complex> g_result : register(u0);
RWStructuredBuffer<float> g_powers : register(u1);

[numthreads(1, 1, 1)] void main(uint3 id : SV_DispatchThreadID) {
    const uint  bin  = id.x;
    const float real = g_result[bin + 1].real;
    const float imag = g_result[bin + 1].imag;

    const float power = real * real + imag * imag;
    const float mag = sqrt(power);
    const float db = 10.0f * log10(1.0f + power);

    g_powers[bin] = power;
}

Buffer creation code:

//The buffer in which the resulting powers are stored (m_result_buffer1)
buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer1
); HR_THROW();

//UAV for m_result_buffer1
view_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
view_desc.Buffer.FirstElement = 0;
view_desc.Format = DXGI_FORMAT_R32_TYPELESS;
view_desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
view_desc.Buffer.NumElements = NumBins();

hr = m_device->CreateUnorderedAccessView (
    m_result_buffer1,
    &view_desc,
    &m_result_view
); HR_THROW();

//Buffer for reading powers to the CPU
buffer_desc.BindFlags = 0;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
buffer_desc.MiscFlags = 0;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_STAGING;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer2
); HR_THROW();

Dispatch code:

CComPtr<ID3D11UnorderedAccessView> result_view;

hr = m_fft->ForwardTransform (
    m_sample_view,
    &result_view
); HR_THROW();

ID3D11UnorderedAccessView* views[] = {
    result_view,  //FFT UAV   (u0)
    m_result_view //Power UAV (u1)
};

m_context->CSSetShader(m_power_cs, nullptr, 0);
m_context->CSSetUnorderedAccessViews(0, 2, views, nullptr);
m_context->Dispatch(NumBins(), 1, 1);

And finally, the CPU mapping code:

m_context->CopyResource(m_result_buffer2, m_result_buffer1);

D3D11_MAPPED_SUBRESOURCE sub = { 0 };

m_context->Map(m_result_buffer2, 0, D3D11_MAP_READ, 0, &sub);
memcpy(result, sub.pData, sizeof(float) * NumBins());
m_context->Unmap(m_result_buffer2, 0);

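What happens is that the shader seems to make every thread write to the same index in the output buffer. The mapped buffer always reads the correct value for the first bin and then 0.0f for all other bins. The equivalent code on the CPU works just fine. The strange thing is that I have put conditionals in place and know that bin is not always 0, and that the power of every bin other than bin 0 is not always 0.0f either. I also tried writing multiple bins from the same thread with a for loop, and the same thing happened. What am I doing wrong?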

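I have a hunch that the buffer creation code or the mapping code is the root of the problem. I know I am running the correct number of threads on the GPU and that the dispatch IDs are correct, but the results on the CPU side are wrong.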
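A CPU reference along the following lines is one way to check the GPU output bin by bin. This is only a minimal sketch, not the code from the project: `ReferencePowers` and the `std::vector` layout are names assumed for illustration. It mirrors the shader above, so bin `i` reads FFT element `i + 1` and stores the squared magnitude.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the Complex struct in the shader: two packed 32-bit floats.
struct Complex {
    float real;
    float imag;
};

// Hypothetical CPU reference for the power shader.
// Bin i reads FFT element i + 1 and stores real^2 + imag^2.
// Assumes fft holds at least num_bins + 1 elements.
std::vector<float> ReferencePowers(const std::vector<Complex>& fft,
                                   std::size_t num_bins) {
    std::vector<float> powers(num_bins);
    for (std::size_t bin = 0; bin < num_bins; ++bin) {
        const Complex& c = fft[bin + 1];
        powers[bin] = c.real * c.real + c.imag * c.imag;
    }
    return powers;
}
```

Comparing this output against the contents of the mapped staging buffer shows exactly which bins differ, which helps separate a shader problem from a resource or view problem.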

Problem solved!

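I was using an RWStructuredBuffer where an RWByteAddressBuffer was needed. I'm not entirely sure how that led to this result, but it did. So the FFT result is now an RWByteAddressBuffer. The odd thing about this buffer is that the D3DCSX implementation spaces the float values so far apart - possibly for caching reasons, but honestly I'm not sure why. Here is my compute shader now (this time computing decibels rather than power - an unrelated change):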

RWByteAddressBuffer       g_result   : register(u0);
RWStructuredBuffer<float> g_decibels : register(u1);

[numthreads(256, 1, 1)] void main(uint3 id : SV_DispatchThreadID) {
    const float real = asfloat(g_result.Load(id.x * 8 + 0));
    const float imag = asfloat(g_result.Load(id.x * 8 + 4));

    const float power = real * real + imag * imag;
    const float db = 10.0f * log10(1.0f + power);

    g_decibels[id.x] = db;
}
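Note that with `[numthreads(256, 1, 1)]` each thread group covers 256 bins, so the number of groups passed to `Dispatch` presumably has to shrink accordingly. The updated dispatch call is not shown here, so the following is only a sketch under that assumption. `kThreadsPerGroup`, `GroupCount` and `RealByteOffset` are hypothetical helper names, and the offset helper assumes tightly interleaved (real, imag) 32-bit floats, as the `id.x * 8` addressing in the shader suggests.

```cpp
#include <cstdint>

// Must match [numthreads(256, 1, 1)] in the shader.
constexpr std::uint32_t kThreadsPerGroup = 256;

// Number of thread groups needed so that at least num_bins threads run
// (ceiling division).
constexpr std::uint32_t GroupCount(std::uint32_t num_bins) {
    return (num_bins + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// Byte offset of the real part of FFT element `index` in the raw buffer,
// assuming 8 bytes per complex value; the imaginary part is 4 bytes later.
constexpr std::uint32_t RealByteOffset(std::uint32_t index) {
    return index * 8u;
}
```

It would be used as `m_context->Dispatch(GroupCount(NumBins()), 1, 1);`. When the bin count is not a multiple of 256, the last group contains threads past the end of the data, so a bounds check on `id.x` in the shader is a common safeguard.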

I did, however, change the description of the decibel buffer to that of a structured buffer, just to make things easier for myself:

buffer_desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
buffer_desc.ByteWidth = sizeof(float) * NumBins();
buffer_desc.CPUAccessFlags = 0;
buffer_desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
buffer_desc.StructureByteStride = sizeof(float);
buffer_desc.Usage = D3D11_USAGE_DEFAULT;

hr = m_device->CreateBuffer (
    &buffer_desc,
    nullptr,
    &m_result_buffer1
); HR_THROW();

view_desc.Buffer.FirstElement = 0;
view_desc.Buffer.Flags = 0;
view_desc.Buffer.NumElements = NumBins();
view_desc.Format = DXGI_FORMAT_UNKNOWN;
view_desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;

hr = m_device->CreateUnorderedAccessView (
    m_result_buffer1,
    &view_desc,
    &m_result_view
); HR_THROW();

That is why g_decibels is still an RWStructuredBuffer.

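I still don't know whether it matters that the result buffer is read/write when only read access is needed - if I change g_result to a regular ByteAddressBuffer I get no output. But at least it works now.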