Why is LuaJIT's memory limited to 1-2 GB on 64-bit platforms?

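On 64-bit platforms, LuaJIT allows at most 1-2 GB of data (not counting objects allocated with malloc). Where does this limit come from, and why is it even lower than on 32-bit platforms?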

LuaJIT is designed to use 32-bit pointers. On x64, the limit comes from its use of the MAP_32BIT flag of mmap.

MAP_32BIT (since Linux 2.4.20, 2.6):

Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2GB of memory, so as to improve context-switch performance on some early 64-bit processors.

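In practice, this flag restricts mappings to the first 31 bits of the address space, not the first 32 bits as the name suggests. Have a look here for an overview of the 1GB limit that results from using MAP_32BIT in the Linux kernel.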
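The 31-bit behaviour can be observed directly. The following is a minimal sketch, assuming LuaJIT (for its FFI) running on Linux x86-64. It declares mmap and munmap itself and hard-codes the Linux x86-64 values of PROT_READ, PROT_WRITE, MAP_PRIVATE, MAP_ANONYMOUS and MAP_32BIT (0x40) from <sys/mman.h>; those constants and the helper name map_low are assumptions of this sketch, not something from the original answer.

```lua
-- Sketch (assumes LuaJIT on Linux x86-64): request an anonymous mapping
-- with MAP_32BIT and check that the kernel placed it below 2 GB (2^31).
local ok, ffi = pcall(require, "ffi")
if not (ok and ffi.os == "Linux" and ffi.arch == "x64") then
  print("skipped: this sketch needs LuaJIT on Linux x86-64")
  return
end

ffi.cdef[[
void *mmap(void *addr, size_t length, int prot, int flags, int fd, long offset);
int munmap(void *addr, size_t length);
]]

-- Assumed constant values from <sys/mman.h> on Linux x86-64
local PROT_READ, PROT_WRITE = 0x1, 0x2
local MAP_PRIVATE, MAP_ANONYMOUS, MAP_32BIT = 0x02, 0x20, 0x40
local MAP_FAILED = ffi.cast("void *", -1)

-- Hypothetical helper: map len bytes of anonymous memory in the low region
-- and return both the pointer and its numeric address.
local function map_low(len)
  local p = ffi.C.mmap(nil, len, PROT_READ + PROT_WRITE,
                       MAP_PRIVATE + MAP_ANONYMOUS + MAP_32BIT, -1, 0)
  if p == MAP_FAILED then error("mmap with MAP_32BIT failed") end
  return p, tonumber(ffi.cast("uintptr_t", p))
end

local len = 64 * 1024
local p, addr = map_low(len)
print(string.format("mapped at 0x%x, below 2^31: %s", addr, tostring(addr < 2^31)))
ffi.C.munmap(p, len)
```

If the kernel hands out only the 1 GB to 2 GB window for MAP_32BIT requests, as the x86-64 Linux mmap placement code does as far as I know, the printed address will also be at or above 0x40000000, which would be consistent with the 1 GB figure.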

Even if you have more than 1GB of memory, the LuaJIT author explains why using it would be bad for performance:

  • A full GC takes 50% more time than the allocations themselves.
  • If the GC is enabled, it doubles the allocation time.
  • To simulate a real application, the links between objects are randomized in the third run. This doubles the GC time!

And that was just for 1GB! Now imagine using 8GB -- a full GC cycle would keep the CPU busy for a whopping 24 seconds! Ok, so the normal mode is to use the incremental GC. But this just means the overhead is ~30% higher, it's mixed in between the allocations and it will evict the CPU cache every time. Basically your application will be dominated by the GC overhead and you'll begin to wonder why it's slow ....

tl;dr version: Don't try this at home. And the GC needs a rewrite (postponed to LuaJIT 2.1).
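To get a feel for the scaling described in the quote, here is a rough sketch in plain Lua (it also runs under LuaJIT) that times one full collection over live heaps of increasing size. The helper names build_heap and time_full_gc are hypothetical, the heap shape is an arbitrary choice, and os.clock measures CPU time, so the numbers are only indicative and are not meant to reproduce the quoted figures.

```lua
-- Rough sketch: how long does a full, non-incremental collection take
-- as the live heap grows? Results vary by machine and VM.
local function build_heap(n)
  local objs = {}
  for i = 1, n do
    objs[i] = { value = i }
  end
  -- Point each object at a randomly chosen one, echoing the
  -- "randomized links" run from the quoted benchmark.
  for i = 1, n do
    objs[i].next = objs[math.random(n)]
  end
  return objs
end

local function time_full_gc(n)
  local heap = build_heap(n)
  collectgarbage("collect")          -- discard garbage left over from construction
  local t0 = os.clock()
  collectgarbage("collect")          -- this pass must traverse the whole live heap
  local elapsed = os.clock() - t0
  local live_kb = collectgarbage("count")
  return elapsed, live_kb, #heap     -- using heap here keeps it alive during the timed pass
end

for _, n in ipairs({ 100000, 200000, 400000 }) do
  local secs, kb = time_full_gc(n)
  print(string.format("%6d objects, %8.0f KB in use: full GC %.4f s", n, kb, secs))
end
```

Because a full cycle has to visit every live object, the measured time should grow roughly with the object count; the exact ratio depends on the VM and on cache behaviour.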

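In summary, the 1GB limit is a limitation of both the Linux kernel and the LuaJIT garbage collector. It only applies to objects inside the LuaJIT state and can be overcome by using malloc, which allocates outside the lower 32-bit address space. It is also possible to run the x86 (32-bit) build on x64 and get access to the full 4GB.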
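The malloc workaround can be sketched with LuaJIT's FFI: memory obtained from the C allocator is not part of the GC-managed heap, and ffi.gc attaches free as a finalizer so the block is released once the pointer becomes unreachable. This is a minimal sketch assuming LuaJIT; the helper name c_alloc is hypothetical.

```lua
-- Sketch (assumes LuaJIT with its FFI): allocate outside the GC heap.
local ok, ffi = pcall(require, "ffi")
if not ok then
  print("skipped: this sketch needs LuaJIT")
  return
end

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Hypothetical helper: allocate n bytes with malloc and let the GC call
-- free() once the returned pointer is no longer referenced.
local function c_alloc(n)
  local p = ffi.C.malloc(n)
  if p == nil then error("malloc failed") end
  return ffi.gc(ffi.cast("uint8_t *", p), ffi.C.free)
end

local buf = c_alloc(1024)
ffi.fill(buf, 1024, 0xAB)   -- use the block like a C array
print(buf[0], buf[1023])
```

Only the small pointer object is tracked by the GC here; the 1024-byte payload lives wherever malloc places it.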

Check out these links for more information:

Thanks to a recent patch, the LuaJIT 2GB memory limit can be lifted.

To test it, clone this repo and build with the LUAJIT_ENABLE_GC64 symbol defined. On Windows:

msvcbuild.bat gc64

or, when building with make, add this line to the Makefile:

XCFLAGS+= -DLUAJIT_ENABLE_GC64
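Before running an allocation test it can help to confirm that the interpreter you are calling really is a GC64 build. A minimal sketch, assuming LuaJIT 2.1, whose ffi.abi() accepts the "gc64" parameter; older builds return false for parameters they do not recognize, and plain Lua has no FFI at all.

```lua
-- Sketch: report whether the running LuaJIT was built with LUAJIT_ENABLE_GC64.
local ok, ffi = pcall(require, "ffi")
local gc64 = ok and ffi.abi("gc64") or false
print(jit and jit.version or _VERSION, "GC64:", gc64)
```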

I used this code to test memory allocation:

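local ffi = require("ffi")

-- Allocate 1 GB per step. cdata created by ffi.new lives in the
-- GC-managed heap, so it is subject to the limit discussed above.
local CHUNK_SIZE     = 1 * 1024 * 1024 * 1024
local fraction_of_gb = CHUNK_SIZE / (1024*1024*1024)
local allocations    = {}  -- keep references so the chunks are not collected

for index=1, 64 do
    local huge_memory_chunk = ffi.new("char[?]", CHUNK_SIZE)
    table.insert(allocations, huge_memory_chunk)
    print( string.format("allocated %g GB", index*fraction_of_gb) )
    local pause = io.read(1)  -- wait for input (press Enter) before the next chunk
end

print("Test complete")
local pause = io.read(1)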

It allocated 48GB before my machine ran into a "not enough memory" error.