What is the most reliable / portable way to allocate memory at low addresses on 64-bit systems?
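I need to allocate large blocks of memory (to be used by my custom allocator) that fall within the first 32GB of the virtual address space.
I imagine that if I needed, say, 1MB blocks, I could iterate using mmap and MAP_FIXED_NOREPLACE (or VirtualAlloc) from low addresses upwards in increments of 1MB until the call succeeds, and then continue from the last successful block for the next one.
This sounds clumsy, but at least it would be somewhat robust against changes in OS address space layout and ASLR algorithms. From my understanding of current OS layouts, there should be plenty of memory available in the first 32GB this way, but maybe I am missing something?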
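The probing scheme described above could look roughly like the following on Linux. This is only a sketch under stated assumptions: the function name `reserve_low_block` and its parameters are made up for illustration, the fallback value for `MAP_FIXED_NOREPLACE` is the one used on common Linux architectures, and kernels older than 4.17 ignore the flag and treat the address as a hint, so the returned address is checked explicitly.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000  // assumption: value on common Linux architectures
#endif

// Hypothetical helper: probes upwards from `cursor` in steps of `block_size`
// and maps the first free block that ends at or below `limit`.
// On success, returns the block and advances `cursor` past it; otherwise nullptr.
void* reserve_low_block(std::uintptr_t& cursor, std::size_t block_size,
                        std::uintptr_t limit) {
    for (std::uintptr_t addr = cursor; addr + block_size <= limit; addr += block_size) {
        void* wanted = reinterpret_cast<void*>(addr);
        void* got = mmap(wanted, block_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
        if (got == MAP_FAILED) {
            continue;  // range already occupied or otherwise unavailable, try the next one
        }
        if (got != wanted) {
            // Older kernels ignore the flag and may place the mapping elsewhere.
            munmap(got, block_size);
            continue;
        }
        cursor = addr + block_size;  // the next search starts after this block
        return got;
    }
    return nullptr;
}
```

A caller would keep one `cursor` (starting at, say, 1MB, to stay above the region the kernel keeps unmapped) and pass `1ULL << 35` as `limit` to stay within the first 32GB.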
Is there anything in Windows, Linux, OS X, iOS or Android that would defeat this scheme? Is there a better way?
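In case you're wondering, this is for implementing a VM for a programming language, where fitting all pointers into 32-bit values on a 64-bit system could give huge memory usage advantages and even speed gains. Since all objects are at least 8-byte aligned, the lower 3 bits can be shifted out, expanding the pointer range from 4GB to 32GB.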
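To make the arithmetic concrete, here is a minimal sketch of the 32-bit encoding, assuming addresses are 8-byte aligned and below 32GB. The names `CompressedPtr`, `compress` and `decompress` are illustrative only and do not come from any particular VM.

```cpp
#include <cassert>
#include <cstdint>

// 8-byte alignment means the low 3 address bits are always zero.
constexpr unsigned kAlignShift = 3;
// 2^32 slots of 8 bytes each: 2^35 bytes = 32GB of addressable range.
constexpr std::uint64_t kAddressLimit = 1ULL << (32 + kAlignShift);

using CompressedPtr = std::uint32_t;

// Drops the 3 always-zero low bits so the address fits in 32 bits.
inline CompressedPtr compress(std::uint64_t address) {
    assert(address % 8 == 0);         // must be 8-byte aligned
    assert(address < kAddressLimit);  // must lie in the first 32GB
    return static_cast<CompressedPtr>(address >> kAlignShift);
}

// Restores the full 64-bit address.
inline std::uint64_t decompress(CompressedPtr value) {
    return static_cast<std::uint64_t>(value) << kAlignShift;
}
```

The highest encodable address is 0x7FFFFFFF8, which compresses to 0xFFFFFFFF.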
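To restrict the range of allocated memory on Windows, we can use the NtAllocateVirtualMemory function. This API is available in both user mode and kernel mode. In user mode it is exported by ntdll.dll (link against ntdll.lib or ntdllp.lib from the WDK). The API has a ZeroBits parameter: the number of high-order address bits that must be zero in the base address of the section view. However, the description of ZeroBits in the MSDN documentation is incorrect. The correct one is: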
ZeroBits
Supplies the number of high order address bits that must be zero in
the base address of the section view. The value of this argument must
be less than or equal to the maximum number of zero bits and is only
used when memory management determines where to allocate the view
(i.e. when BaseAddress is null).
If ZeroBits is zero, then no zero bit constraints are applied.
If ZeroBits is greater than 0 and less than 32, then it is the
number of leading zero bits from bit 31. Bits 63:32 are also required
to be zero. This retains compatibility with 32-bit systems.
If ZeroBits is greater than 32, then it is considered as a mask and then number of leading zero are counted out
in the mask. This then becomes the zero bits argument.
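So we really can use ZeroBits as a mask, and this is its most powerful usage. It can also be used as a count of zero bits starting from bit 31 (in this case bits 63:32 will always be 0). Because of the allocation granularity (currently 64KB, 0x10000), the low 16 bits of the address are always 0. So the valid values for ZeroBits in bit-count mode are 1 to 15 (= 31 - 16). To better understand how this parameter works, look at the example code below. For a clearer demonstration I will also use the following flag: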
MEM_TOP_DOWN
The specified region should be created at the highest virtual address
possible based on ZeroBits.
PVOID BaseAddress;
ULONG_PTR ZeroBits;
SIZE_T RegionSize = 1;
NTSTATUS status;

// Mask mode: start with all bits set and drop the top bit after each success.
for (ZeroBits = 0xFFFFFFFFFFFFFFFF;;)
{
    if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
        ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
    {
        DbgPrint("%p:%p\n", (PVOID)ZeroBits, BaseAddress);
        NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
        ZeroBits >>= 1;
    }
    else
    {
        DbgPrint("%x\n", status);
        break;
    }
}

// Bit-count mode: ZeroBits = 0, 1, 2, ... until the call fails.
for (ZeroBits = 0;;)
{
    if (0 <= (status = NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
        ZeroBits, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)))
    {
        DbgPrint("%x:%p\n", (ULONG)ZeroBits++, BaseAddress);
        NtFreeVirtualMemory(NtCurrentProcess(), &BaseAddress, &RegionSize, MEM_RELEASE);
    }
    else
    {
        DbgPrint("%x\n", status);
        break;
    }
}
and the output:
FFFFFFFFFFFFFFFF:00007FF735B40000
7FFFFFFFFFFFFFFF:00007FF735B40000
3FFFFFFFFFFFFFFF:00007FF735B40000
1FFFFFFFFFFFFFFF:00007FF735B40000
0FFFFFFFFFFFFFFF:00007FF735B40000
07FFFFFFFFFFFFFF:00007FF735B40000
03FFFFFFFFFFFFFF:00007FF735B40000
01FFFFFFFFFFFFFF:00007FF735B40000
00FFFFFFFFFFFFFF:00007FF735B40000
007FFFFFFFFFFFFF:00007FF735B40000
003FFFFFFFFFFFFF:00007FF735B40000
001FFFFFFFFFFFFF:00007FF735B40000
000FFFFFFFFFFFFF:00007FF735B40000
0007FFFFFFFFFFFF:00007FF735B40000
0003FFFFFFFFFFFF:00007FF735B40000
0001FFFFFFFFFFFF:00007FF735B40000
0000FFFFFFFFFFFF:00007FF735B40000
00007FFFFFFFFFFF:00007FF735B40000
00003FFFFFFFFFFF:00003FFFFFFF0000
00001FFFFFFFFFFF:00001FFFFFFF0000
00000FFFFFFFFFFF:00000FFFFFFF0000
000007FFFFFFFFFF:000007FFFFFF0000
000003FFFFFFFFFF:000003FFFFFF0000
000001FFFFFFFFFF:000001FFFFFF0000
000000FFFFFFFFFF:000000FFFFFF0000
0000007FFFFFFFFF:0000007FFFFF0000
0000003FFFFFFFFF:0000003FFFFF0000
0000001FFFFFFFFF:0000001FFFFF0000
0000000FFFFFFFFF:0000000FFFFF0000
00000007FFFFFFFF:00000007FFFF0000
00000003FFFFFFFF:00000003FFFF0000
00000001FFFFFFFF:00000001FFFF0000
00000000FFFFFFFF:00000000FFFF0000
000000007FFFFFFF:000000007FFF0000
000000003FFFFFFF:000000003FFF0000
000000001FFFFFFF:000000001FFF0000
000000000FFFFFFF:000000000FFF0000
0000000007FFFFFF:0000000007FF0000
0000000003FFFFFF:0000000003FF0000
0000000001FFFFFF:0000000001FF0000
0000000000FFFFFF:0000000000FF0000
00000000007FFFFF:00000000007F0000
00000000003FFFFF:00000000003F0000
00000000001FFFFF:00000000001F0000
00000000000FFFFF:00000000000F0000
000000000007FFFF:0000000000070000
000000000003FFFF:0000000000030000
000000000001FFFF:0000000000010000
c0000017
0:00007FF735B40000
1:000000007FFF0000
2:000000003FFF0000
3:000000001FFF0000
4:000000000FFF0000
5:0000000007FF0000
6:0000000003FF0000
7:0000000001FF0000
8:0000000000FF0000
9:00000000007F0000
a:00000000003F0000
b:00000000001F0000
c:00000000000F0000
d:0000000000070000
e:0000000000030000
f:0000000000010000
c0000017
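The two modes visible in this output can be modeled with a small pure function that computes the highest address a given ZeroBits value permits. This is only a sketch of the rules quoted above and does not call the OS. The name `zero_bits_limit` is made up, and treating ZeroBits == 32 as a mask is an assumption, since the quoted text does not cover that value.

```cpp
#include <cstdint>

// Hypothetical model of the quoted ZeroBits rules: returns the largest
// address whose required high-order bits are all zero.
std::uint64_t zero_bits_limit(std::uint64_t zero_bits) {
    if (zero_bits == 0) {
        return ~0ULL;  // no constraint comes from ZeroBits itself
    }
    if (zero_bits < 32) {
        // Count mode: that many leading zero bits counted from bit 31,
        // with bits 63:32 forced to zero as well.
        return (1ULL << (32 - zero_bits)) - 1;
    }
    // Mask mode (assumed to include 32): only the position of the highest
    // set bit matters, so smear it into every lower bit.
    std::uint64_t mask = zero_bits;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    return mask;
}
```

For example, a count of 1 gives a limit of 0x7FFFFFFF, consistent with the address 000000007FFF0000 printed for ZeroBits = 1, and the mask 0x7FFFFFFFF gives a limit of 0x7FFFFFFFF, consistent with 00000007FFFF0000.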
So if we want to restrict memory allocation to the first 32GB (0x800000000), we can use ZeroBits = 0x800000000 - 1:
NtAllocateVirtualMemory(NtCurrentProcess(), &(BaseAddress = 0),
0x800000000 - 1, &RegionSize, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS)
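As a result, memory will be allocated in the range [0, 7FFFFFFFF] (in practice the base address falls in [0, 7FFFF0000], because with the allocation granularity the low 16 bits of the address are always 0).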
You can then create a heap inside the reserved region with RtlCreateHeap and allocate memory from that heap (note: this is also a user-mode API; use ntdll[p].lib as linker input):
PVOID BaseAddress = 0;
SIZE_T RegionSize = 0x10000000; // reserve 256MB
if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &BaseAddress,
    0x800000000 - 1, &RegionSize, MEM_RESERVE, PAGE_READWRITE))
{
    if (PVOID hHeap = RtlCreateHeap(0, BaseAddress, RegionSize, 0, 0, 0))
    {
        HeapAlloc(hHeap, 0, <somesize>);
        RtlDestroyHeap(hHeap);
    }
    VirtualFree(BaseAddress, 0, MEM_RELEASE);
}