Why use _mm_malloc? (as opposed to _aligned_malloc, aligned_alloc, or posix_memalign)

Question

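There are a few options for acquiring an aligned block of memory, but they are very similar, and the choice mostly boils down to which language standard and platforms you are targeting.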

C11

void * aligned_alloc (size_t alignment, size_t size)

POSIX

int posix_memalign (void **memptr, size_t alignment, size_t size)

Windows

void * _aligned_malloc(size_t size, size_t alignment);

Of course, aligning by hand is also an option.
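As a rough illustration of the C11 variant listed above, here is a minimal sketch. The helper name alloc_floats_aligned is hypothetical and not part of any library. It rounds the byte count up to a multiple of the alignment, because C11 as published asks for size to be an integral multiple of alignment, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats aligned to `align` bytes via C11. */
float *alloc_floats_aligned(size_t n, size_t align)
{
    /* Assumption of this sketch: align is a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Refuse requests whose rounded-up byte count would overflow size_t. */
    if (n > (SIZE_MAX - (align - 1)) / sizeof(float))
        return NULL;

    size_t bytes = n * sizeof(float);
    /* Round up so that size is a multiple of alignment. */
    bytes = (bytes + align - 1) & ~(align - 1);

    return aligned_alloc(align, bytes);   /* release with plain free() */
}
```

A call such as alloc_floats_aligned(8, 64) mirrors the 8-float, 64-byte example quoted later in the question, and the result is released with free().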

Intel offers another option.

Intel

void* _mm_malloc (int size, int align)
void _mm_free (void *p)

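Based on source code released by Intel, this seems to be the method of allocating aligned memory that their engineers prefer, but I can't find any documentation comparing it to the other methods. The closest I found simply acknowledges that other aligned memory allocation routines exist.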

https://software.intel.com/en-us/articles/memory-management-for-optimal-performance-on-intel-xeon-phi-coprocessor-alignment-and

To dynamically allocate a piece of aligned memory, use posix_memalign, which is supported by GCC as well as the Intel Compiler. The benefit of using it is that you don’t have to change the memory disposal API. You can use free() as you always do. But pay attention to the parameter profile:

int posix_memalign (void **memptr, size_t align, size_t size);

The Intel Compiler also provides another set of memory allocation APIs. C/C++ programmers can use _mm_malloc and _mm_free to allocate and free aligned blocks of memory. For example, the following statement requests a 64-byte aligned memory block for 8 floating point elements.

farray = (float *)_mm_malloc(8*sizeof(float), 64);

Memory that is allocated using _mm_malloc must be freed using _mm_free. Calling free on memory allocated with _mm_malloc or calling _mm_free on memory allocated with malloc will result in unpredictable behavior.

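From a user's perspective, the obvious differences are that _mm_malloc requires direct CPU and compiler support, and that memory allocated with _mm_malloc must be freed with _mm_free. Given these drawbacks, what is the reason for ever using _mm_malloc? Can it have a slight performance advantage? Is it a historical accident?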

Answer 1

_mm_malloc appears to have been created before there was a standard aligned_alloc function, and the need to use _mm_free is a quirk of the implementation.

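My guess is that, unlike posix_memalign, it doesn't need to over-allocate in order to guarantee alignment and instead uses a separate alignment-aware allocator. This would save memory when allocating types whose alignment differs from the default alignment (typically 8 or 16 bytes).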

Answer 2

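It is possible to take an existing C compiler that does not presently happen to use the identifiers _mm_malloc and _mm_free and define functions with those names that behave as required. This could be done in one of two ways. The first is to make _mm_malloc a wrapper around malloc() that asks for a slightly oversized allocation, constructs a pointer to the first suitably aligned address within it that is at least one byte from the start, and stores the number of bytes skipped immediately before that address. The second is to have _mm_malloc request large chunks of memory from malloc() and then dispense them piecemeal. In either case, the pointers returned by _mm_malloc() would not be pointers that free() would generally know how to handle; with the first approach, _mm_free would use the byte immediately preceding the allocation to find the real start of the block received from malloc, and then pass that to free.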
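The first approach can be sketched roughly as follows. This is only an illustration of the wrapper idea described above, not Intel's implementation; the names wrap_aligned_malloc and wrap_aligned_free are hypothetical. Because the skip count is kept in a single byte, the sketch assumes power-of-two alignments no larger than 128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper over malloc(): over-allocate, then align by hand. */
void *wrap_aligned_malloc(size_t size, size_t align)
{
    /* One byte of bookkeeping caps the skip count at 255, so this sketch
       accepts only power-of-two alignments up to 128. */
    if (align == 0 || align > 128 || (align & (align - 1)) != 0)
        return NULL;
    if (size > SIZE_MAX - align)
        return NULL;

    unsigned char *raw = malloc(size + align);
    if (raw == NULL)
        return NULL;

    /* Skip between 1 and align bytes to reach the next aligned address. */
    size_t skip = align - ((uintptr_t)raw & (align - 1));
    unsigned char *user = raw + skip;
    user[-1] = (unsigned char)skip;    /* remember how far we moved */
    return user;
}

/* Must be paired with wrap_aligned_malloc; plain free() would receive a
   pointer that malloc() never returned. */
void wrap_aligned_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *user = p;
    size_t skip = user[-1];
    assert(skip >= 1 && skip <= 128);
    free(user - skip);                 /* the pointer malloc() really returned */
}
```

The sketch also shows why the two free functions cannot be mixed: the pointer handed to the caller sits somewhere inside the malloc block rather than at its start.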

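If an aligned-allocate function is allowed to make use of the internals of the malloc and free functions, however, that may eliminate the need for the extra layer of wrapping. It is possible to write _mm_malloc()/_mm_free() functions that wrap malloc/free without knowing anything about their internals, but that requires _mm_malloc() to keep bookkeeping information separate from that used by malloc/free.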

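If the author of an aligned-allocate function knows how malloc and free are implemented, it will often be possible to coordinate the design of all the allocation and free functions so that free can distinguish among all kinds of allocations and handle them appropriately. No single aligned-allocate implementation would be usable with every malloc/free implementation, however.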

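I would suggest that the most portable way to write code is probably to select a couple of symbols that are not used anywhere else for your own allocate and free functions, so that you could then say, e.g.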

/* _mm_malloc takes the size first, then the alignment */
#define a_alloc(align,sz) _mm_malloc((sz),(align))
#define a_free(ptr)  _mm_free((ptr))

on compilers that support it, or

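#include <stdlib.h>

static inline void *aa_alloc(size_t align, size_t size)
{
  void *ret = NULL;
  /* posix_memalign returns 0 on success; align must be a power of two
     and a multiple of sizeof(void *) */
  if (posix_memalign(&ret, align, size) != 0)
    return NULL;
  return ret;
}
#define a_alloc(align,sz) aa_alloc((align),(sz))
#define a_free(ptr)  free((ptr))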

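on POSIX systems, and so on. For every system it should be possible to define macros or functions that yield the necessary behavior. [I think it is probably better to use macros consistently than to sometimes use macros and sometimes functions, so that #if defined macroname can test whether things are defined yet.]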
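Putting the pieces together, a single header along the following lines could select the macros per platform. It is a sketch under stated assumptions: USE_MM_MALLOC is a hypothetical opt-in switch rather than a standard macro, <immintrin.h> is assumed to declare _mm_malloc on compilers that provide it, and any non-Windows target is assumed to be POSIX.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L     /* expose posix_memalign under strict -std=c11 */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(USE_MM_MALLOC)          /* hypothetical opt-in switch */
#include <immintrin.h>              /* assumed to declare _mm_malloc/_mm_free */
#define a_alloc(align, sz) _mm_malloc((sz), (align))
#define a_free(ptr)        _mm_free((ptr))

#elif defined(_WIN32)
#include <malloc.h>
#define a_alloc(align, sz) _aligned_malloc((sz), (align))
#define a_free(ptr)        _aligned_free((ptr))

#else                               /* assumption: a POSIX system */
static inline void *aa_alloc(size_t align, size_t size)
{
    void *ret = NULL;
    /* Debug-only check of posix_memalign's precondition on align. */
    assert(align % sizeof(void *) == 0 && (align & (align - 1)) == 0);
    return posix_memalign(&ret, align, size) == 0 ? ret : NULL;
}
#define a_alloc(align, sz) aa_alloc((align), (sz))
#define a_free(ptr)        free((ptr))
#endif

/* Because both names are always macros, their presence can be tested. */
#if !defined(a_alloc) || !defined(a_free)
#error "no aligned allocator selected for this platform"
#endif
```

Client code then calls only a_alloc(align, sz) and a_free(ptr), so the pairing of allocator and deallocator is decided in one place.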

Answer 3

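The Intel compiler supports both POSIX (Linux) and non-POSIX (Windows) operating systems, so it cannot rely on either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.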

C11 is a great solution, but Microsoft doesn't even support C99 yet, so who knows whether they will ever support C11.

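Update: Unlike the C11/POSIX/Windows allocation functions, the ICC intrinsics include a deallocation function. This allows the API to use a heap manager separate from the default one. I don't know if or when it actually does that, but it can be useful to support this model.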
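To make that model concrete, here is a minimal sketch of an allocator that never touches the default heap, which only works because it ships its own release function. It is purely illustrative and says nothing about what ICC actually does; pool_alloc, pool_free, and the pool sizes are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BLOCK 64   /* block size and alignment, in bytes */
#define POOL_COUNT 32   /* number of blocks in the pool */

/* Hypothetical fixed-block pool living in static storage, not on the heap. */
static _Alignas(POOL_BLOCK) unsigned char pool_mem[POOL_COUNT][POOL_BLOCK];
static void *pool_free_list = NULL;   /* singly linked list of released blocks */
static size_t pool_next = 0;          /* index of the next never-used block */

void *pool_alloc(void)
{
    if (pool_free_list != NULL) {     /* reuse a released block first */
        void *p = pool_free_list;
        memcpy(&pool_free_list, p, sizeof pool_free_list);
        return p;
    }
    if (pool_next < POOL_COUNT)
        return pool_mem[pool_next++];
    return NULL;                      /* pool exhausted */
}

void pool_free(void *p)
{
    if (p == NULL)
        return;
    /* The block must belong to this pool; free() would not understand it. */
    uintptr_t lo = (uintptr_t)&pool_mem[0][0];
    uintptr_t at = (uintptr_t)p;
    assert(at >= lo && at < lo + sizeof pool_mem && (at - lo) % POOL_BLOCK == 0);
    /* Store the old list head inside the block, then make it the new head. */
    memcpy(p, &pool_free_list, sizeof pool_free_list);
    pool_free_list = p;
}
```

An API with only an allocation function and plain free() for release could not be backed by a pool like this, since free() has no way to recognize pool blocks.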

Disclaimer: I work for Intel but have no special knowledge of these decisions, which were made long before I joined the company.
