Vulkan's VkAllocationCallbacks implemented with malloc/free()

Question

I'm reading Vulkan Memory Allocation - Memory Host, and it seems that VkAllocationCallbacks can be implemented using plain malloc/realloc/free functions.

typedef struct VkAllocationCallbacks {
   void*                                   pUserData;
   PFN_vkAllocationFunction                pfnAllocation;
   PFN_vkReallocationFunction              pfnReallocation;
   PFN_vkFreeFunction                      pfnFree;
   PFN_vkInternalAllocationNotification    pfnInternalAllocation;
   PFN_vkInternalFreeNotification          pfnInternalFree;
} VkAllocationCallbacks;

But I see only two possible reasons to implement my own VkAllocationCallbacks:

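Logging and tracking the memory used by the Vulkan API;
Implementing a kind of heap memory management, where one large chunk of memory is used and reused over and over. Obviously, this can be overkill, and it can suffer from the same kinds of problems as managed memory (as in the Java JVM).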

Am I missing something? What kind of application would make implementing VkAllocationCallbacks worthwhile?

Answer 1

From the spec: "Since most memory allocations are off the critical path, this is not meant as a performance feature. Rather, this can be useful for certain embedded systems, for debugging purposes (e.g. putting a guard page after all host allocations), or for memory allocation logging."

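On embedded systems, you might have grabbed all the memory right at the start, so you don't want the driver to call malloc because there might be nothing left in the tank. Guard pages and memory logging (for debug builds only) could be useful for the cautious/curious.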
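
As a rough illustration of the embedded case, here is a minimal sketch of a bump allocator that hands out aligned blocks from one buffer reserved up front, so nothing ever calls malloc. The names Arena, arena_alloc, arena_free, arena_reset and ARENA_SIZE are hypothetical, and the scope parameter is a plain int so the sketch builds without the Vulkan headers. It is not thread-safe, and a real VkAllocationCallbacks would also need a pfnReallocation, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the memory reserved at startup. */
#define ARENA_SIZE (64u * 1024u)

typedef struct Arena {
    unsigned char bytes[ARENA_SIZE];
    size_t used; /* offset of the first free byte */
} Arena;

/* Same parameter order as PFN_vkAllocationFunction (scope typed as int here). */
void* arena_alloc(void* pUserData, size_t size, size_t alignment, int scope)
{
    Arena* a = (Arena*)pUserData;
    (void)scope;
    /* Vulkan requests power-of-two alignments. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round the next free address up to the requested alignment. */
    uintptr_t base = (uintptr_t)a->bytes;
    uintptr_t start = (base + a->used + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    size_t offset = (size_t)(start - base);

    if (offset > ARENA_SIZE || size > ARENA_SIZE - offset) {
        return NULL; /* arena exhausted: report failure, never fall back to malloc */
    }
    a->used = offset + size;
    return (void*)start;
}

/* Same parameter order as PFN_vkFreeFunction. A bump arena cannot release
 * individual blocks, so this is intentionally a no-op. */
void arena_free(void* pUserData, void* pMemory)
{
    (void)pUserData;
    (void)pMemory;
}

/* Releases everything at once, e.g. after all Vulkan objects are destroyed. */
void arena_reset(Arena* a)
{
    a->used = 0;
}
```

The trade-off is that memory only comes back when the whole arena is reset, which suits a system that creates its Vulkan objects once and keeps them for the lifetime of the program.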

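I read on a slide somewhere (sorry, I can't remember where) that you should definitely not implement allocation callbacks that just forward to malloc/realloc/free, because you can generally assume that the drivers are doing a much better job than that (e.g. consolidating small allocations into pools).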

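I think that if you're not sure whether you should be implementing allocation callbacks, then you don't need to implement them, and you don't need to worry that maybe you should have.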

I think they're there for those specific use cases and for those who really want to be in control of everything.

Answer 2

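I implemented my own VkAllocationCallbacks using plain C malloc()/realloc()/free(). It is a naive implementation that completely ignores the alignment parameter. Given that malloc on a 64-bit OS always returns pointers with 16 (!) byte alignment, which is a fairly large alignment, this was not a problem in my tests. See Reference.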

For completeness: a 16-byte-aligned pointer is also 8/4/2-byte aligned.
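
The alignment argument can be checked in code. The following is a small sketch with two hypothetical helpers: is_aligned tests a pointer against a power-of-two alignment, and malloc_alignment_suffices compares a requested alignment with _Alignof(max_align_t), which is assumed here to be the alignment malloc provides for ordinary requests on a C11 toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ptr is a multiple of alignment, which must be a power of two. */
bool is_aligned(const void* ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* True if a plain malloc() result is expected to satisfy the request.
 * Assumption: malloc aligns to _Alignof(max_align_t), as on common
 * desktop platforms. */
bool malloc_alignment_suffices(size_t alignment)
{
    return alignment <= _Alignof(max_align_t);
}
```

An allocation callback built on malloc could call malloc_alignment_suffices(alignment) first and refuse, or switch to an aligned allocator, when it returns false.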

My code is below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

/**
 * PFN_vkAllocationFunction implementation.
 * pUserData carries the tag string given to createPAllocator().
 */
VKAPI_ATTR void* VKAPI_CALL allocationFunction(void* pUserData, size_t size, size_t alignment, VkSystemAllocationScope allocationScope){
    // size_t must be printed with %zu, not %u
    printf("pAllocator's allocationFunction: <%s>, size: %zu, alignment: %zu, allocationScope: %d",
        (const char*)pUserData, size, alignment, (int)allocationScope);
    // the allocation itself - ignores alignment for now
    void* ptr = malloc(size); // _aligned_malloc(size, alignment);
    if (ptr != NULL) {
        memset(ptr, 0, size);
    }
    printf(", return ptr* : 0x%p \n", ptr);
    return ptr;
}

/**
 * PFN_vkFreeFunction implementation
 */
VKAPI_ATTR void VKAPI_CALL freeFunction(void* pUserData, void* pMemory){
    printf("pAllocator's freeFunction: <%s> ptr: 0x%p\n",
        (const char*)pUserData, pMemory);
    // now, the free operation (free(NULL) is a no-op)
    free(pMemory);
}

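/**
 * PFN_vkReallocationFunction implementation
 */
VKAPI_ATTR void* VKAPI_CALL reallocationFunction(void* pUserData, void* pOriginal, size_t size, size_t alignment, VkSystemAllocationScope allocationScope){
    printf("pAllocator's REallocationFunction: <%s>, size %zu, alignment %zu, allocationScope %d \n",
        (const char*)pUserData, size, alignment, (int)allocationScope);
    // a size of zero must behave like the free function;
    // realloc(ptr, 0) is not portable, so handle it explicitly
    if (size == 0) {
        free(pOriginal);
        return NULL;
    }
    // still ignores alignment, like allocationFunction
    return realloc(pOriginal, size);
}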

/**
 * PFN_vkInternalAllocationNotification implementation
 */
VKAPI_ATTR void VKAPI_CALL internalAllocationNotification(void* pUserData, size_t size, VkInternalAllocationType allocationType, VkSystemAllocationScope allocationScope){
    // there is no alignment parameter here, and both enums are printed as integers
    printf("pAllocator's internalAllocationNotification: <%s>, size %zu, allocationType %d, allocationScope %d \n",
        (const char*)pUserData, size, (int)allocationType, (int)allocationScope);
}

/**
 * PFN_vkInternalFreeNotification implementation
 */
VKAPI_ATTR void VKAPI_CALL internalFreeNotification(void* pUserData, size_t size, VkInternalAllocationType allocationType, VkSystemAllocationScope allocationScope){
    printf("pAllocator's internalFreeNotification: <%s>, size %zu, allocationType %d, allocationScope %d \n",
        (const char*)pUserData, size, (int)allocationType, (int)allocationScope);
}



/**
 * Create a pAllocator
 * @param info - string used to tag this allocator's log output;
 *               it must outlive the returned allocator
 * @return the new allocator, or NULL if out of memory; release it with free()
 */
static VkAllocationCallbacks* createPAllocator(const char* info){
    // calloc zero-initializes the struct
    VkAllocationCallbacks* m_allocator = (VkAllocationCallbacks*)calloc(1, sizeof(VkAllocationCallbacks));
    if (m_allocator == NULL) {
        return NULL;
    }
    m_allocator->pUserData = (void*)info;
    m_allocator->pfnAllocation = (PFN_vkAllocationFunction)&allocationFunction;
    m_allocator->pfnReallocation = (PFN_vkReallocationFunction)&reallocationFunction;
    m_allocator->pfnFree = (PFN_vkFreeFunction)&freeFunction;
    m_allocator->pfnInternalAllocation = (PFN_vkInternalAllocationNotification)&internalAllocationNotification;
    m_allocator->pfnInternalFree = (PFN_vkInternalFreeNotification)&internalFreeNotification;
    // storePAllocator(m_allocator);
    return m_allocator;
}


I used the Cube.c sample from the Vulkan SDK to test my code and assumptions. A modified version is available here: GitHub

Sample output:

pAllocator's allocationFunction: <Device>, size: 800, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061ECE40 
pAllocator's allocationFunction: <RenderPass>, size: 128, alignment: 8, allocationScope: 1, return ptr* : 0x000000000623FAB0 
pAllocator's allocationFunction: <ShaderModule>, size: 96, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F2C30 
pAllocator's allocationFunction: <ShaderModule>, size: 96, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F8790 
pAllocator's allocationFunction: <PipelineCache>, size: 152, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F2590 
pAllocator's allocationFunction: <Device>, size: 424, alignment: 8, allocationScope: 1, return ptr* : 0x00000000061F8EB0 
pAllocator's freeFunction: <ShaderModule> ptr: 0x00000000061F8790
pAllocator's freeFunction: <ShaderModule> ptr: 0x00000000061F2C30
pAllocator's allocationFunction: <Device>, size: 3448, alignment: 8, allocationScope: 1, return ptr* : 0x000000000624D260 
pAllocator's allocationFunction: <Device>, size: 3448, alignment: 8, allocationScope: 1, return ptr* : 0x0000000006249A80

Conclusions:

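The user-implemented PFN_vkAllocationFunction, PFN_vkReallocationFunction and PFN_vkFreeFunction really do perform the malloc/realloc/free operations on behalf of Vulkan. I'm not sure whether they handle ALL allocations, as Vulkan may choose to alloc/free some portions by itself.
The output from my implementation shows that, on my Win 7-64/NVidia setup, the typical alignment request is 8 bytes. This suggests there is room for optimization, such as a kind of managed memory where you grab one large chunk and sub-allocate from it for your Vulkan app (a memory pool). It may reduce memory usage (think of the 8 bytes before and up to 8 bytes after each allocated block). It may also be faster, as a malloc() call may take longer than handing out a pointer into your own pool of already-allocated memory.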
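At least with my current Vulkan driver, PFN_vkInternalAllocationNotification and PFN_vkInternalFreeNotification are never called. Perhaps it is a bug in my NVidia driver, though these two are only notifications for allocations the driver makes on its own, so it may simply not make any. I'll check on my AMD later.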
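pUserData is meant for debug info and/or management. In fact, you could use it to pass a C++ object and do all the required performance work there. It's kind of obvious, but you can change it for each call or each VkCreateXXX object.
You can use a single, generic VkAllocationCallbacks allocator for the whole app, but I guess that a customized allocator may lead to better results. In my tests, VkSemaphore creation showed a typical pattern of intense alloc/free of small chunks (72 bytes), which could be addressed by reusing a previous chunk of memory in a customized allocator. malloc()/free() already reuse memory when possible, but it is tempting to try our own memory manager, at least for short-lived small memory blocks.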
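Memory alignment may be an issue when implementing VkAllocationCallbacks (standard C has no aligned realloc; on Windows, the _aligned_malloc/_aligned_realloc/_aligned_free family has to be used together). But it only matters if Vulkan requests alignments larger than malloc's default (8 bytes on x86, 16 bytes on AMD64, etc.; I still have to check the ARM defaults). So far, though, Vulkan actually requests memory with a lower alignment than the malloc() default, at least on 64-bit OSes.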
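
The small-block reuse idea from the conclusions above could be sketched as a fixed-slot pool with a free list. Everything named here (SmallPool, Slot, pool_init, pool_alloc, pool_free, SLOT_SIZE, SLOT_COUNT) is hypothetical, and the slot size and count are arbitrary assumptions. Small requests take a slot, everything else falls through to malloc, and the scope parameter is a plain int so the sketch builds without the Vulkan headers. It is not thread-safe and leaves out pfnReallocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT_SIZE  128u /* assumed upper bound for a "small" request */
#define SLOT_COUNT 256u /* assumed pool capacity */

typedef union Slot {
    union Slot* next;               /* link, valid only while the slot is free */
    max_align_t align;              /* gives every slot malloc-like alignment */
    unsigned char bytes[SLOT_SIZE];
} Slot;

typedef struct SmallPool {
    Slot slots[SLOT_COUNT];
    Slot* free_list;
} SmallPool;

/* Threads every slot onto the free list. */
void pool_init(SmallPool* p)
{
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < SLOT_COUNT; ++i) {
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* True if ptr points into the pool's slot array. */
static int pool_owns(const SmallPool* p, const void* ptr)
{
    uintptr_t a = (uintptr_t)ptr;
    return a >= (uintptr_t)&p->slots[0] && a < (uintptr_t)&p->slots[SLOT_COUNT];
}

/* Same parameter order as PFN_vkAllocationFunction (scope typed as int here). */
void* pool_alloc(void* pUserData, size_t size, size_t alignment, int scope)
{
    SmallPool* p = (SmallPool*)pUserData;
    (void)scope;
    if (alignment > _Alignof(max_align_t)) {
        return NULL; /* this sketch does not handle over-aligned requests */
    }
    if (size <= SLOT_SIZE && p->free_list != NULL) {
        Slot* s = p->free_list; /* pop the most recently freed slot */
        p->free_list = s->next;
        return s;
    }
    return malloc(size); /* large request or pool exhausted */
}

/* Same parameter order as PFN_vkFreeFunction. */
void pool_free(void* pUserData, void* pMemory)
{
    SmallPool* p = (SmallPool*)pUserData;
    if (pMemory == NULL) {
        return;
    }
    if (pool_owns(p, pMemory)) {
        Slot* s = (Slot*)pMemory; /* push the slot back for reuse */
        s->next = p->free_list;
        p->free_list = s;
    } else {
        free(pMemory);
    }
}
```

With a create/destroy pattern like the 72-byte one described above, the same slot is handed out again immediately after being freed, without touching the general-purpose heap.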

Final Thought:

You can live happily until the end of time just setting every VkAllocationCallbacks* pAllocator you find to NULL ;) Possibly Vulkan's default allocator already does it all better than you would.

BUT...

One of the highlighted benefits of Vulkan was that the developer would be put in control of everything, including memory management. See the Khronos presentation, slide 6.

Answer 3

This answer is an attempt to clarify and correct some of the information in the other answers...

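Whatever you do, don't use plain malloc/free/realloc for a Vulkan allocator. Vulkan can, and probably does, use aligned memory copies to move memory. Using unaligned allocations will cause memory corruption, and bad things will happen. The corruption may also not show up in an obvious way. Instead, use an aligned allocation family: C11 aligned_alloc or POSIX posix_memalign (both declared in <stdlib.h> and released with free), or, under Windows, _aligned_malloc/_aligned_realloc/_aligned_free from <malloc.h>. _aligned_realloc is not well known, but it is there (and has been for years); standard C and POSIX have no aligned realloc, so there a reallocation has to be done as allocate, copy, free. BTW, the allocations for my test card were all over the place with alignment requests.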
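
One portable way to honor the alignment parameter, including for reallocation, is to over-allocate with malloc and keep a small header in front of each aligned block. The sketch below takes that approach; it is not what any driver does internally. BlockHeader, aligned_alloc_cb, aligned_realloc_cb and aligned_free_cb are hypothetical names, and the scope parameter is a plain int so the code builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stored immediately before every block handed out. */
typedef struct BlockHeader {
    void*  raw;  /* pointer originally returned by malloc */
    size_t size; /* size the caller asked for */
} BlockHeader;

/* Same parameter order as PFN_vkAllocationFunction (scope typed as int here). */
void* aligned_alloc_cb(void* pUserData, size_t size, size_t alignment, int scope)
{
    (void)pUserData;
    (void)scope;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Keep the header itself properly aligned. */
    if (alignment < _Alignof(BlockHeader)) {
        alignment = _Alignof(BlockHeader);
    }
    if (size > SIZE_MAX - sizeof(BlockHeader) - alignment) {
        return NULL; /* request too large to pad */
    }
    /* Room for the header plus worst-case padding. */
    unsigned char* raw = malloc(sizeof(BlockHeader) + (alignment - 1) + size);
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = ((uintptr_t)raw + sizeof(BlockHeader) + (alignment - 1))
                      & ~(uintptr_t)(alignment - 1);
    BlockHeader* h = (BlockHeader*)start - 1;
    h->raw = raw;
    h->size = size;
    return (void*)start;
}

/* Same parameter order as PFN_vkFreeFunction. */
void aligned_free_cb(void* pUserData, void* pMemory)
{
    (void)pUserData;
    if (pMemory != NULL) {
        free(((BlockHeader*)pMemory - 1)->raw);
    }
}

/* Same parameter order as PFN_vkReallocationFunction: allocate, copy, free. */
void* aligned_realloc_cb(void* pUserData, void* pOriginal, size_t size,
                         size_t alignment, int scope)
{
    if (pOriginal == NULL) {
        return aligned_alloc_cb(pUserData, size, alignment, scope);
    }
    if (size == 0) {
        aligned_free_cb(pUserData, pOriginal);
        return NULL;
    }
    void* fresh = aligned_alloc_cb(pUserData, size, alignment, scope);
    if (fresh == NULL) {
        return NULL; /* on failure the original block stays valid */
    }
    size_t old_size = ((BlockHeader*)pOriginal - 1)->size;
    memcpy(fresh, pOriginal, old_size < size ? old_size : size);
    aligned_free_cb(pUserData, pOriginal);
    return fresh;
}
```

The header costs a few bytes per block, but it gives the reallocation path the old size it needs and keeps allocation and release on a single matching family of functions.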

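A non-obvious thing about passing an application-specific allocator to Vulkan is that at least some Vulkan objects "remember" the allocator. For example, I passed an allocator to vkCreateInstance and was quite surprised to see messages from my allocator when other objects were being allocated (even though I passed nullptr as the allocator for those). It makes sense when I stop and think about it, because objects that interact with the Vulkan instance may cause the instance to make additional allocations.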

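This all plays into Vulkan's performance, since individual allocators can be written and tuned for a specific allocation task. This could have an impact on process startup time. More importantly, a "block" allocator that places, for example, instance allocations near each other could affect overall performance, since it may improve cache locality (rather than having the allocations spread all over memory). I realize this kind of performance "enhancement" is very speculative, but a carefully tuned application could see an impact. (Not to mention the numerous other performance-critical paths in Vulkan that deserve more attention.)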

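Whatever you do, don't try to use the aligned_alloc class of functions as a "release" allocator, because they perform poorly compared to Vulkan's built-in allocator (on my test card). Even in simple programs there was a very noticeable performance difference compared to Vulkan's allocator. (Sorry, I didn't collect any timing information, but there was no way I was going to sit through those lengthy startup times repeatedly.)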

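When it comes to debugging, even something as simple as a plain old printf inside the allocators can be enlightening. It is also easy to add collection of simple statistics. But expect a severe performance penalty. The callbacks can also serve as debug hooks without writing a fancy debug allocator or adding yet another debug layer.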
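
Collecting simple statistics could look roughly like the sketch below, where a counters struct travels through pUserData. AllocStats, counting_alloc, counting_free and stats_print are hypothetical names, the scope parameter is a plain int so the code builds without the Vulkan headers, and the counters are not protected against concurrent calls. The allocation simply forwards to malloc and refuses alignments beyond what malloc is assumed to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct AllocStats {
    const char* tag;        /* label for the owning object, e.g. "Device" */
    size_t alloc_calls;     /* number of allocation requests seen */
    size_t free_calls;      /* number of non-NULL frees seen */
    size_t bytes_requested; /* sum of all requested sizes */
    size_t max_alignment;   /* largest alignment ever requested */
} AllocStats;

/* Same parameter order as PFN_vkAllocationFunction (scope typed as int here). */
void* counting_alloc(void* pUserData, size_t size, size_t alignment, int scope)
{
    AllocStats* s = (AllocStats*)pUserData;
    (void)scope;
    assert(s != NULL);
    s->alloc_calls++;
    s->bytes_requested += size;
    if (alignment > s->max_alignment) {
        s->max_alignment = alignment;
    }
    if (alignment > _Alignof(max_align_t)) {
        return NULL; /* plain malloc cannot promise this alignment */
    }
    return malloc(size);
}

/* Same parameter order as PFN_vkFreeFunction. */
void counting_free(void* pUserData, void* pMemory)
{
    AllocStats* s = (AllocStats*)pUserData;
    assert(s != NULL);
    if (pMemory != NULL) {
        s->free_calls++;
    }
    free(pMemory);
}

/* Prints a one-line summary of the counters. */
void stats_print(const AllocStats* s)
{
    printf("<%s> allocs: %zu, frees: %zu, bytes requested: %zu, max alignment: %zu\n",
           s->tag, s->alloc_calls, s->free_calls, s->bytes_requested, s->max_alignment);
}
```

Printing the summary once at shutdown, instead of on every call, keeps most of the insight while avoiding the per-call printf cost.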

BTW... my test card is an nvidia using release drivers.

Tags: memory, memory-management, vulkan