How does a compiler store information about an array's size?

Question

Recently I read on IsoCpp about how the compiler knows the size of an array created with new. The FAQ describes two ways of implementing this, but only at a basic level and without any internal details. I tried to find an implementation of these mechanisms in the STL sources from Microsoft and GCC, but as far as I can see, both of them just call malloc internally. I tried to go deeper and found an implementation of the malloc function in GCC, but I couldn't figure out where the magic happens. Is it possible to find out how this works, or is it implemented in the system runtime libraries?

Answer 1

At least for GCC targeting x86_64, you can investigate this question by looking at the assembly GCC generates for this simple program:

#include <iostream>

struct Foo
{
  int x, y;
  ~Foo() { std::cout << "Delete foo " << this << std::endl; }
};

Foo * create()
{
  return new Foo[8];
}

void destroy(Foo * p)
{
  delete[] p;
}

int main()
{
  destroy(create());
}

Using Compiler Explorer, we see this code generated for the create function:

create():
        sub     rsp, 8
        mov     edi, 72
        call    operator new[](unsigned long)
        mov     QWORD PTR [rax], 8
        add     rax, 8
        add     rsp, 8
        ret

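As far as I can tell, the compiler calls operator new[] to allocate 72 bytes of memory, which is 8 bytes more than the objects themselves need (8 * 8 = 64). It then stores the object count (8) at the start of that allocation and adds 8 bytes to the pointer before returning it, so the returned pointer points to the first object.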
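The same over-allocation can be observed from inside a program instead of from the assembly. The following is a minimal sketch, not something taken from the question: Probe, last_request and array_overhead are hypothetical names, and the class-specific operator new[] exists only to record the byte count the compiler passes in. On GCC for x86_64 I would expect the overhead to be 8, matching the 72 - 64 seen above, but the value depends on the ABI, so the code does not assume it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical probe type: same data members as Foo, plus class-specific
// allocation functions that record how many bytes the compiler asked for.
struct Probe
{
  int x, y;
  ~Probe() {}  // user-provided destructor, so delete[] needs the element count

  static std::size_t last_request;  // bytes passed to the latest operator new[]

  static void * operator new[](std::size_t bytes)
  {
    last_request = bytes;
    void * p = std::malloc(bytes);
    if (!p) throw std::bad_alloc();
    return p;
  }

  static void operator delete[](void * p) noexcept { std::free(p); }
};

std::size_t Probe::last_request = 0;

// Returns how many bytes were requested on top of n * sizeof(Probe).
std::size_t array_overhead(std::size_t n)
{
  Probe * p = new Probe[n];
  assert(Probe::last_request >= n * sizeof(Probe));
  std::size_t overhead = Probe::last_request - n * sizeof(Probe);
  delete[] p;
  return overhead;
}
```

Calling array_overhead(8) and printing the result shows how many bookkeeping bytes your compiler adds for this type. Removing the user-provided destructor is an interesting experiment, because an implementation may then have no reason to store the count at all.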

This is one of the methods listed in the document you linked to:

Over-allocate the array and put n just to the left of the first Fred object.

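I searched the libstdc++ source code for a while to see whether this is implemented by the standard library or by the compiler, and I think it is actually implemented by the compiler itself, although I could be wrong.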

Answer 2

Here is where the compiler stores the size in the GCC source code: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/init.c#L3319-L3325

And the equivalent place in the Clang source code: https://github.com/llvm/llvm-project/blob/c11051a4001c7f89e8655f1776a75110a562a45e/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2183-L2185

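What the compilers do is store a cookie (the number of elements allocated, N) immediately before the pointer that new T[N] returns. This in turn means that a few extra bytes have to be allocated in the call to operator new[]. The compiler generates code to do this at run time.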

operator new[](std::size_t x) itself does no special work: it simply allocates x bytes. The compiler makes new T[N] call operator new[](sizeof(T) * N + cookie_size).

The compiler does not "know" the size (it is a run-time value), but it knows how to generate code to retrieve the size on a subsequent delete[] p.
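To make the store and retrieve steps concrete, here is a hand-written sketch of roughly what the generated code does for new T[N] and delete[] p under the cookie scheme. It is an illustration, not GCC's or Clang's actual output: Item, create_array, destroy_array and the choice of one std::size_t as cookie_size are all assumptions of mine, and exception handling during construction is left out.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical element type that counts how many destructors have run.
struct Item
{
  int x, y;
  static int destroyed;
  ~Item() { ++destroyed; }
};
int Item::destroyed = 0;

// Assumed cookie size: one std::size_t holding the element count.
const std::size_t cookie_size = sizeof(std::size_t);
static_assert(cookie_size % alignof(Item) == 0,
              "cookie must not break the alignment of the first element");

// Roughly what `new Item[n]` does: over-allocate, store n, construct.
Item * create_array(std::size_t n)
{
  char * raw = static_cast<char *>(
      ::operator new[](sizeof(Item) * n + cookie_size));
  std::memcpy(raw, &n, sizeof n);                  // store the cookie
  Item * first = reinterpret_cast<Item *>(raw + cookie_size);
  for (std::size_t i = 0; i < n; ++i)
    new (first + i) Item();                        // construct in place
  return first;                                    // points past the cookie
}

// Roughly what `delete[] p` does: read n back, destroy, free.
void destroy_array(Item * p)
{
  if (!p) return;
  char * raw = reinterpret_cast<char *>(p) - cookie_size;
  std::size_t n;
  std::memcpy(&n, raw, sizeof n);                  // retrieve the cookie
  for (std::size_t i = n; i > 0; --i)
    p[i - 1].~Item();                              // destroy in reverse order
  ::operator delete[](raw);                        // free the original block
}
```

After destroy_array(create_array(8)), Item::destroyed is 8, even though destroy_array was never told the length. That is the whole trick: the count travels with the allocation, just to the left of the first element.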

c++

arrays

compiler-construction

internals