DirectCompute shader (HLSL) has strange array size

Question

I am struggling with the way a compute shader stores a uint array. I have the following shader code (a minimal example that reproduces the problem):

cbuffer TCstParams : register(b0)
{
    int    IntValue1;
    uint   UIntArray[10];    // <== PROBLEM IS HERE
    int    IntValue2;
}

RWTexture2D<float4>                Output         : register(u0);

[numthreads(1, 1, 1)]
void CSMain()
{
    if (IntValue1 == 0)
        Output[uint2(0, 0)] = float4(1, 1, 1, 1);
}

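After compiling, I inspected the compiler output to see the offsets and sizes of the constant buffer members. The "uint UIntArray[10];" member turns out to have a size of 148 bytes. This is strange: a uint is 4 bytes, so I expected the array to be 40 bytes.

The compiler output is as follows: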

Microsoft (R) Direct3D Shader Compiler 6.3.9600.16384
Copyright (C) 2013 Microsoft. All rights reserved.

//
// Generated by Microsoft (R) HLSL Shader Compiler 6.3.9600.16384
//
//
// Buffer Definitions: 
//
// cbuffer TCstParams
// {
//
//   int IntValue1;                     // Offset:    0 Size:     4
//   uint UIntArray[10];                // Offset:   16 Size:   148 [unused]    // <== PROBLEM IS HERE
//   int IntValue2;                     // Offset:  164 Size:     4 [unused]
//
// }
//
//
// Resource Bindings:
//
// Name                                 Type  Format         Dim Slot Elements
// ------------------------------ ---------- ------- ----------- ---- --------
// Output                                UAV  float4          2d    0        1
// TCstParams                        cbuffer      NA          NA    0        1
//
//
//
// Input signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Input
//
// Output signature:
//
// Name                 Index   Mask Register SysValue  Format   Used
// -------------------- ----- ------ -------- -------- ------- ------
// no Output
cs_5_0
dcl_globalFlags refactoringAllowed | skipOptimization
dcl_constantbuffer cb0[1], immediateIndexed
dcl_uav_typed_texture2d (float,float,float,float) u0
dcl_temps 2
dcl_thread_group 1, 1, 1

#line 13 "E:\Development\Projects\Test Projects\DirectCompute\TestShader1.hlsl"
if_z cb0[0].x
  mov r0.xyzw, l(0,0,0,0)
  itof r1.xyzw, l(1, 1, 1, 1)
  store_uav_typed u0.xyzw, r0.xyzw, r1.xyzw
endif 
ret 
// Approximately 6 instruction slots used

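I tried various array sizes, and the results are strange: the size per element changes when the number of elements changes!

What am I doing wrong, or what am I missing? Thanks.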

Answer 1

Quoting from Microsoft Docs:

Arrays are not packed in HLSL by default. To avoid forcing the shader to take on ALU overhead for offset computations, every element in an array is stored in a four-component vector.

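So uint UIntArray[10]; is actually stored as uint4 UIntArray[10];, except that the three trailing padding uints of the last element are not included in the reported size: 9 × 16 + 4 = 148 bytes. This is also why the average size per element appears to change with the element count. The array itself still starts on a 16-byte boundary (offset 16), and in the output above IntValue2 follows directly at offset 164 (16 + 148), inside the last element's register.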
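To make the arithmetic concrete, here is a minimal C++ sketch of a CPU-side struct that mirrors the layout reported by the compiler. The names TCstParamsCpu, ScalarArraySize, SetElement, pad0 and pad1 are hypothetical and not part of the original shader. The size formula simply encodes the rule described above, and the final padding assumes that Direct3D 11 wants the constant buffer size to be a multiple of 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One constant-buffer register holds four 32-bit components.
constexpr std::size_t kRegisterBytes = 16;

// Reported size of a cbuffer array of `count` 4-byte scalars: every element
// except the last takes a full register, the last one takes only 4 bytes.
constexpr std::size_t ScalarArraySize(std::size_t count) {
    return count == 0 ? 0 : (count - 1) * kRegisterBytes + sizeof(std::uint32_t);
}

// Hypothetical CPU-side mirror of TCstParams, matching the offsets in the
// compiler output (0, 16, 164).
struct TCstParamsCpu {
    std::int32_t  IntValue1;      // offset 0
    std::uint32_t pad0[3];        // the array starts on a new register
    std::uint32_t UIntArray[37];  // offset 16: 9 full registers + 1 uint = 148 bytes
    std::int32_t  IntValue2;      // offset 164, shares the last element's register
    std::uint32_t pad1[2];        // round the total up to a multiple of 16
};

static_assert(ScalarArraySize(10) == 148, "matches the reported array size");
static_assert(offsetof(TCstParamsCpu, UIntArray) == 16, "array offset");
static_assert(offsetof(TCstParamsCpu, IntValue2) == 164, "IntValue2 offset");
static_assert(sizeof(TCstParamsCpu) == 176, "total size is a multiple of 16");

// Shader element i lives in the first component of register i,
// i.e. every fourth uint of the raw storage.
inline void SetElement(TCstParamsCpu& params, std::size_t i, std::uint32_t value) {
    assert(i < 10);
    params.UIntArray[i * 4] = value;
}
```

With this layout, SetElement(params, 9, v) writes to raw slot 36, the last uint of the 148-byte array, and IntValue2 sits immediately after it.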

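If you want tighter packing, you can declare the array as uint4 UInt4Array[4]; and then cast it: static uint UInt1Array[16] = (uint[16])UInt4Array; (I have not checked that this code is correct, but it should be something like that; note that cbuffer members are referenced directly, without the TCstParams name). The cast itself should not cause any overhead. However, accessing elements of UInt1Array will introduce extra instructions to compute the actual offset.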
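The packed variant can be sketched on the CPU side as follows. This is only an illustration under stated assumptions: UInt4, PackedParamsCpu, WritePacked and ReadPacked are hypothetical names, and the array is sized for the 10 values from the question (3 registers) rather than the 16 used in the answer. The i / 4 and i % 4 arithmetic is the same register/component computation the shader has to perform when it indexes the unpacked view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for an HLSL uint4: one full 16-byte register.
struct UInt4 { std::uint32_t c[4]; };

constexpr std::size_t kScalarCount   = 10;                      // values actually needed
constexpr std::size_t kRegisterCount = (kScalarCount + 3) / 4;  // 3 registers

// Hypothetical CPU-side mirror of a cbuffer declared as:
//   int IntValue1; uint4 UInt4Array[3]; int IntValue2;
struct PackedParamsCpu {
    std::int32_t  IntValue1;                   // offset 0
    std::uint32_t pad0[3];                     // the array starts on a new register
    UInt4         UInt4Array[kRegisterCount];  // offset 16, 48 bytes
    std::int32_t  IntValue2;                   // offset 64
    std::uint32_t pad1[3];                     // round the total up to a multiple of 16
};

static_assert(offsetof(PackedParamsCpu, UInt4Array) == 16, "array offset");
static_assert(offsetof(PackedParamsCpu, IntValue2) == 64, "IntValue2 offset");
static_assert(sizeof(PackedParamsCpu) == 80, "total size is a multiple of 16");

// Scalar i lives in register i / 4, component i % 4.
inline void WritePacked(PackedParamsCpu& params, std::size_t i, std::uint32_t value) {
    assert(i < kScalarCount);
    params.UInt4Array[i / 4].c[i % 4] = value;
}

inline std::uint32_t ReadPacked(const PackedParamsCpu& params, std::size_t i) {
    assert(i < kScalarCount);
    return params.UInt4Array[i / 4].c[i % 4];
}
```

Compared with the unpacked layout, the array shrinks from 148 to 48 bytes, at the cost of the divide and modulo on every access.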

directx

hlsl

directcompute