Why is LuaJIT's memory limited to 1-2 GB on 64-bit platforms?
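On 64-bit platforms, LuaJIT allows only up to 1-2GB of data (not counting objects allocated with malloc). Where does this limitation come from, and why is it even lower than on 32-bit platforms?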
LuaJIT is designed to use 32-bit pointers. On x64 platforms the limit comes from the use of mmap with the MAP_32BIT flag.
MAP_32BIT (since Linux 2.4.20, 2.6):
Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2GB of memory, so as to improve context-switch performance on some early 64-bit processors.
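In effect, this flag restricts mappings to the first 31 bits of the address space, not the first 32 bits as the name suggests. See here for an overview of the 1GB limit that results from using MAP_32BIT in the Linux kernel.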
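As a rough illustration of the 31-bit claim, the following LuaJIT FFI sketch requests an anonymous mapping with MAP_32BIT and reports where it lands. It assumes Linux on x86-64. The numeric flag values are copied from the Linux headers because the FFI cannot see C macros, and map_low is a name made up for this example.

```lua
local ffi = require("ffi")
local bit = require("bit")

ffi.cdef[[
void *mmap(void *addr, size_t length, int prot, int flags, int fd, long offset);
int munmap(void *addr, size_t length);
]]

-- Assumed Linux x86-64 values from <sys/mman.h>.
local PROT_READ, PROT_WRITE = 0x1, 0x2
local MAP_PRIVATE, MAP_ANONYMOUS, MAP_32BIT = 0x02, 0x20, 0x40

-- Hypothetical helper: map `size` bytes with MAP_32BIT.
-- Returns the pointer and its numeric address, or nil on failure.
local function map_low(size)
  local prot = bit.bor(PROT_READ, PROT_WRITE)
  local flags = bit.bor(MAP_PRIVATE, MAP_ANONYMOUS, MAP_32BIT)
  local p = ffi.C.mmap(nil, size, prot, flags, -1, 0)
  if ffi.cast("intptr_t", p) == -1 then  -- MAP_FAILED is (void *)-1
    return nil
  end
  return p, tonumber(ffi.cast("uintptr_t", p))
end

-- Only run the demonstration where the assumptions above hold.
if ffi.os == "Linux" and ffi.arch == "x64" then
  local p, addr = map_low(4096)
  if p then
    print(string.format("mapped at 0x%x, below 2GB: %s", addr, tostring(addr < 2^31)))
    ffi.C.munmap(p, 4096)
  end
end
```

Dropping MAP_32BIT from the flags should let the kernel place the mapping anywhere in the 64-bit address space, which is the difference the answer is describing.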
Even if you could have more than 1GB, the LuaJIT author explains why that would be bad for performance:
- A full GC takes 50% more time than the allocations themselves.
- If the GC is enabled, it doubles the allocation time.
- To simulate a real application, the links between objects are randomized in the third run. This doubles the GC time!
And that was just for 1GB! Now imagine using 8GB -- a full GC cycle would keep the CPU busy for a whopping 24 seconds!
Ok, so the normal mode is to use the incremental GC. But this just means the overhead is ~30% higher, it's mixed in between the allocations and it will evict the CPU cache every time. Basically your application will be dominated by the GC overhead and you'll begin to wonder why it's slow ....
tl;dr version: Don't try this at home. And the GC needs a rewrite (postponed to LuaJIT 2.1).
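To summarize: the 1GB limit is a limitation of the Linux kernel and the LuaJIT garbage collector. It applies only to objects inside the LuaJIT state and can be overcome by using malloc, which allocates outside the lower 32-bit address space. It is also possible to run the x86 build on x64 in 32-bit mode and get access to the full 4GB.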
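A minimal sketch of the malloc route, using the LuaJIT FFI to call the C allocator directly. The helper name alloc_bytes is invented for this example. The memory it returns is not managed by the LuaJIT allocator, so it should not count against the limit described above.

```lua
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Hypothetical helper: allocate `n` bytes with the C allocator instead of ffi.new.
local function alloc_bytes(n)
  local p = ffi.C.malloc(n)
  if p == nil then  -- a NULL pointer compares equal to nil
    error("malloc failed")
  end
  -- Attach free() as a finalizer so the block is released when the
  -- pointer object itself is garbage collected.
  return ffi.gc(ffi.cast("uint8_t *", p), ffi.C.free)
end

local buf = alloc_bytes(1024)
buf[0] = 42  -- malloc does not zero the block, so write before reading
print(buf[0])
```

Only the small pointer object lives in the GC heap. The block behind it is freed by the finalizer, so the pointer has to stay referenced for as long as the memory is in use.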
Check out these links for more information:
- How to get past 1gb memory limit of 64 bit LuaJIT on Linux?
- LuaJIT x64 limited to 31 bit address space, even without MAP_32BIT restrictions?
- LuaJIT strange memory limit
- Digging out the craziest bug you never heard about from 2008: a linux threading regression
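Thanks to a recent patch, the LuaJIT 2GB memory limit can be lifted.
To test, clone this repo and build with the LUAJIT_ENABLE_GC64 symbol defined:
msvcbuild.bat gc64
or add XCFLAGS+= -DLUAJIT_ENABLE_GC64 to the Makefile.
I used this code to test memory allocation: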
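local ffi = require("ffi")

-- Allocate 1GB per iteration through the LuaJIT allocator (not malloc).
local CHUNK_SIZE = 1 * 1024 * 1024 * 1024
local fraction_of_gb = CHUNK_SIZE / (1024*1024*1024)
local allocations = {}  -- hold references so the chunks are not collected

for index=1, 64 do
    local huge_memory_chunk = ffi.new("char[?]", CHUNK_SIZE)
    table.insert(allocations, huge_memory_chunk)
    -- %g prints the number plainly; %q would wrap it in quotes
    print( string.format("allocated %g GB", index*fraction_of_gb) )
    local pause = io.read(1)  -- wait for input before the next allocation
end

print("Test complete")
local pause = io.read(1)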
On my machine it allocated 48GB before the not enough memory error occurred.
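For an unattended run, a non-interactive variant of the same test can catch the allocation failure with pcall instead of pausing on io.read. This is only a sketch: count_chunks is a made-up helper, and the sizes in the sample call are kept deliberately small, so raise them to probe a real limit.

```lua
local ffi = require("ffi")

-- Hypothetical helper: allocate up to `max_chunks` blocks of `chunk_bytes`
-- each and return how many succeeded before the first failure.
local function count_chunks(chunk_bytes, max_chunks)
  local held = {}  -- keep references so nothing is collected mid-test
  for i = 1, max_chunks do
    -- ffi.new raises "not enough memory" on failure; pcall catches it
    local ok, chunk = pcall(ffi.new, "char[?]", chunk_bytes)
    if not ok then break end
    held[i] = chunk
  end
  return #held
end

-- Small sample call: 4 chunks of 1MB each.
print(count_chunks(1024 * 1024, 4) .. " chunks allocated")
```

Calling count_chunks(1024 * 1024 * 1024, 64) would mirror the test above. On a build without LUAJIT_ENABLE_GC64 the count should stop after the first chunk or two.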