Hugepagesize is not increasing to 1G in VM

Question

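I am using a CentOS VM on an ESXi server. I want to increase the huge page size to 1G.

I followed this link: http://dpdk-guide.gitlab.io/dpdk-guide/setup/hugepages.html

I ran this small script to check whether the 1 GB size is supported: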

[root@localhost ~]# if grep pdpe1gb /proc/cpuinfo >/dev/null 2>&1; then echo "1GB supported."; fi
1GB supported.
[root@localhost ~]#

I added default_hugepagesz=1GB hugepagesz=1G hugepages=4 to /etc/default/grub, then ran:
grub2-mkconfig -o /boot/grub2/grub.cfg
and rebooted the VM.

But I still see a huge page size of 2048 kB (2 MB).
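
Before looking at the page size itself, it can help to confirm that the parameters actually reached the running kernel. The following is a minimal sketch, not part of the original steps: the helper name boot_param_values is hypothetical, and it assumes the running kernel's command line is readable from /proc/cmdline.

```shell
#!/bin/sh
# Print the value(s) of a kernel boot parameter, one per line.
# $1 = parameter name (e.g. hugepagesz)
# $2 = file to read (assumption: defaults to /proc/cmdline)
boot_param_values() {
    name=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 1
    # Split the command line into one token per line, then keep only
    # tokens that start with "<name>=" and strip that prefix.
    tr ' ' '\n' < "$file" | sed -n "s/^${name}=//p"
}

# Show what the running kernel was booted with.
for p in default_hugepagesz hugepagesz hugepages; do
    printf '%s: %s\n' "$p" "$(boot_param_values "$p" | tr '\n' ' ')"
done
```

If all three values come back empty, the edit to /etc/default/grub most likely did not end up in the boot entry that was actually used.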

[root@localhost ~]# cat /proc/meminfo | grep -i huge
AnonHugePages:      8192 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
[root@localhost ~]#
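
/proc/meminfo only reports the default huge page size. A rough sketch for also listing every pool the kernel has registered is shown here; the function names are hypothetical, and it assumes the per-size pools are exposed under /sys/kernel/mm/hugepages.

```shell
#!/bin/sh
# Print the default huge page size in kB from a meminfo-style file.
# $1 = file to read (assumption: defaults to /proc/meminfo)
default_hugepage_kb() {
    file=${1:-/proc/meminfo}
    [ -r "$file" ] || return 1
    awk '/^Hugepagesize:/ { print $2 }' "$file"
}

# List each registered huge page pool with its current page count.
# $1 = sysfs directory (assumption: defaults to /sys/kernel/mm/hugepages)
list_hugepage_pools() {
    dir=${1:-/sys/kernel/mm/hugepages}
    for pool in "$dir"/hugepages-*; do
        [ -r "$pool/nr_hugepages" ] || continue
        printf '%s %s\n' "${pool##*/}" "$(cat "$pool/nr_hugepages")"
    done
}

echo "default huge page size: $(default_hugepage_kb) kB"
list_hugepage_pools
```

If no hugepages-1048576kB entry is listed, the kernel has probably not set up a 1 GB pool at all, which points at the boot parameters or the platform rather than at the pool size.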

VM details are as follows:

[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]#

[root@localhost ~]# cat /proc/cpuinfo  | grep -i flags
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx hypervisor lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi ept vpid
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx hypervisor lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi ept vpid
[root@localhost ~]#

The VM is assigned 8 GB of memory and 2 CPUs.

Answer 1

The CPU flag for 1 GB huge page support and guest OS support/enabling are not enough to make 1 GB huge pages work in a virtualized environment.

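The idea of huge pages, both at the PMD level (2 MB with PAE and x86_64, 4 MB before that) and at the PUD level (1 GB), is to map an aligned, huge-sized virtual region onto a huge region of physical memory (as I understand it, that region should be aligned too). With the extra virtualization level of a hypervisor there are now three (or four) memory levels: the virtual memory of the application in the guest OS, the memory that the guest OS considers physical (this is the memory managed by the virtualization solution: ESXi, Xen, KVM, ...), and the real physical memory. It is reasonable to assume that, to be useful, huge pages need a huge region of the same size at all three levels (fewer TLB misses, fewer page table structures to describe a large amount of memory; grep "Need bigger than 4KB pages" in Dick Sites's "Datacenter Computers: modern challenges in CPU design", Google, Feb 2015).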
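
To put rough numbers on "fewer page table structures", here is a small illustrative calculation. It is only a sketch: the helper pages_needed is hypothetical, and the 8 GiB figure is simply the amount of memory assigned to the VM in the question.

```shell
#!/bin/sh
# Number of pages needed to map a region of $1 bytes using pages of $2 bytes
# (rounded up).
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

mem=$(( 8 * 1024 * 1024 * 1024 ))   # 8 GiB, the memory given to the VM
echo "4 KiB pages: $(pages_needed "$mem" 4096)"                      # 2097152
echo "2 MiB pages: $(pages_needed "$mem" $(( 2 * 1024 * 1024 )))"    # 4096
echo "1 GiB pages: $(pages_needed "$mem" $(( 1024 * 1024 * 1024 )))" # 8
```

Each step up in page size cuts the number of mappings the TLB and page tables have to cover by a factor of 512, which is where the benefit comes from, provided every level of translation really uses the larger size.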

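So, to use some level of huge pages inside the guest OS, you should already have huge pages of the same size in physical memory (in your host OS) and in your virtualization solution. You cannot use huge pages effectively inside the guest when they are not available to your host OS and virtualization software. (Some, like qemu or bochs, may emulate them, but this will be anywhere from slow to very slow.) And when you want both 2 MB and 1 GB huge pages, your CPU, host OS, virtualization system, and guest OS must all support them (and the host system must have enough aligned, contiguous physical memory to allocate a 1 GB page; you probably cannot split such a page across several sockets on NUMA).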

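I don't know about ESXi, but here are some links.

RedHat and some (?) Linux virtualization solutions (those that use libvirtd): the "Virtualization Tuning and Optimization Guide" describes manual huge page setup for the host OS in section 8.3.3.3, "Enabling 1 GB huge pages for guests at boot or runtime":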

Procedure 8.2. Allocating 1 GB huge pages at boot time

To allocate different sizes of huge pages at boot, use the following command, specifying the number of huge pages. This example allocates 4 1 GB huge pages and 1024 2 MB huge pages: 'default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024' Change this command line to specify a different number of huge pages to be allocated at boot.

Note The next two steps must also be completed the first time you allocate 1 GB huge pages at boot time.

Mount the 2 MB and 1 GB huge pages on the host:

# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
# mkdir /dev/hugepages2M
# mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M

Restart libvirtd to enable the use of 1 GB huge pages on guests:

# service libvirtd restart

1 GB huge pages are now available for guests.
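
After following a procedure like the one quoted above, the hugetlbfs mounts on the host can be checked with a short script. This is a sketch rather than part of the RedHat guide: the function name hugetlbfs_mounts is hypothetical, and it assumes the mount table is readable from /proc/mounts.

```shell
#!/bin/sh
# Print "<mount point> <mount options>" for every hugetlbfs mount.
# $1 = mounts-style file (assumption: defaults to /proc/mounts)
hugetlbfs_mounts() {
    file=${1:-/proc/mounts}
    [ -r "$file" ] || return 1
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
    awk '$3 == "hugetlbfs" { print $2, $4 }' "$file"
}

hugetlbfs_mounts || echo "cannot read mount table"
```

The options column should include the pagesize= value that was passed to mount, so a 1 GB mount point can be told apart from a 2 MB one.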

For Ubuntu and KVM: "KVM - Using Hugepages"

By increasing the page size, you reduce the page table and reduce the pressure on the TLB cache. ... vm.nr_hugepages = 256 ... Reboot the system (note: this is about physical reboot of host machine and host OS) ... Set up Libvirt to use Huge Pages KVM_HUGEPAGES=1 ... Setting up a guest to use Huge Pages

For Fedora and KVM (an old manual about 2 MB pages): https://fedoraproject.org/wiki/Features/KVM_Huge_Page_Backed_Memory
ESXi 5 supports 2 MB pages, which must be enabled manually: How to Modify Large Memory Page Settings on ESXi
For an unknown version of "VMware’s ESX server", from a March 2015 paper: BQ Pham, "Using TLB Speculation to Overcome Page Splintering in Virtual Machines", Rutgers University Technical Report DCS-TR-713, March 2015:

Lack of hypervisor support for large pages: Finally, hypervisor vendors can take a few production cycles before fully adopting large pages. For example, VMware’s ESX server currently has no support for 1GB large pages in the hypervisor, even though guests on x86-64 systems can use them.

A newer paper, with no direct conclusion about 1 GB pages: https://rucore.libraries.rutgers.edu/rutgers-lib/49279/PDF/1/

We find that large pages are conflicted with lightweight memory management across a range of hypervisors (e.g., ESX, KVM) across architectures (e.g., ARM, x86-64) and container-based technologies.

An old PDF from VMware: "Large Page Performance. ESX Server 3.5 and ESX Server 3i v3.5". https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/large_pg_performance.pdf (only 2 MB huge pages are supported)

VMware ESX Server 3.5 and VMware ESX Server 3i v3.5 introduce 2MB large page support to the virtualized environment. In earlier versions of ESX Server, guest operating system large pages were emulated using small pages. This meant that, even if the guest operating system was using large pages, it did not get the performance benefit of reducing TLB misses. The enhanced large page support in ESX Server 3.5 and ESX Server 3i v3.5 enables 32‐bit virtual machines in PAE mode and 64‐bit virtual machines to make use of large pages.

Answer 2

Passing the host CPU through to the VM worked for me; this gave the VM the pdpe1gb CPU flag.

I use Qemu + libvirt, with 1G hugepagesz enabled on the host.

Answer 3

Maybe this helps. Set the CPU feature in the XML that describes the VM, as follows:

  <cpu mode='custom' match='exact' check='partial'>
      <model fallback='allow'>Broadwell</model>
      <feature policy='force' name='pdpe1gb'/>
  </cpu>
