Is it possible for a user process to tell the OS to relocate a mapping created by mmap to another NUMA node?

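Consider this scenario: a user process running on a NUMA machine calls mmap to create a new mapping in its virtual address space. It then uses the memory returned by mmap for its processing (storing its data, ...). Now, for some reason, the user process gets scheduled onto a different NUMA node. Can the user process tell the OS to relocate the already-mapped memory (to a different NUMA node) while preserving the data?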

Physical memory can be migrated by calling migrate_pages from libnuma (-lnuma): http://man7.org/linux/man-pages/man2/migrate_pages.2.html

    long migrate_pages(int pid, unsigned long maxnode,
                       const unsigned long *old_nodes,
                       const unsigned long *new_nodes);

Link with -lnuma.

migrate_pages() attempts to move all pages of the process pid that are in memory nodes old_nodes to the memory nodes in new_nodes. Pages not located in any node in old_nodes will not be migrated. As far as possible, the kernel maintains the relative topology relationship inside old_nodes during the migration to new_nodes.
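As a rough sketch of how such a call might look, the helper below moves a process's pages from one node to another. It invokes the system call directly through syscall(2) so that it compiles without libnuma; with libnuma installed, the migrate_pages() wrapper from <numaif.h> takes the same arguments. The names single_node_mask and migrate_process_pages are made up for this illustration, and the sketch assumes node numbers small enough to fit in one unsigned long.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build a one-word node mask with only `node` set.
 * Returns 0 if `node` does not fit in the single-word mask used here. */
static unsigned long single_node_mask(int node)
{
    /* Stay one bit below the word size to be conservative about how the
     * kernel interprets the maxnode bit count. */
    if (node < 0 || node >= (int)(8 * sizeof(unsigned long)) - 1)
        return 0;
    return 1UL << node;
}

/* Hypothetical helper: ask the kernel to move every page of `pid` that
 * currently sits on `from_node` over to `to_node`.
 * Returns the number of pages that could not be moved (0 means all moved),
 * or -1 with errno set. */
static long migrate_process_pages(pid_t pid, int from_node, int to_node)
{
    unsigned long old_nodes = single_node_mask(from_node);
    unsigned long new_nodes = single_node_mask(to_node);

    if (old_nodes == 0 || new_nodes == 0) {
        errno = EINVAL;
        return -1;
    }
    /* maxnode is the number of bits in each mask. */
    return syscall(SYS_migrate_pages, (long)pid,
                   8 * sizeof(unsigned long), &old_nodes, &new_nodes);
}
```

A process could call migrate_process_pages(getpid(), 0, 1) after being rescheduled from node 0 to node 1. The call can fail, for example on a kernel without NUMA support, on a machine where the target node does not exist, or in a sandbox that blocks the system call, so the return value and errno should be checked.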

The numactl package also ships a migratepages tool that migrates all pages of a given pid: http://man7.org/linux/man-pages/man8/migratepages.8.html

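You can also change the memory policy with set_mempolicy: http://man7.org/linux/man-pages/man2/set_mempolicy.2.html (note that, as far as I know, the policy governs where future allocations are placed and does not by itself move pages that are already allocated).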

The mbind syscall can be used to migrate a subset of pages to a particular NUMA node:

https://www.kernel.org/doc/Documentation/vm/page_migration

...allows a process to manually relocate the node on which its pages are located through the MF_MOVE and MF_MOVE_ALL options while setting a new memory policy via mbind()

http://man7.org/linux/man-pages/man2/mbind.2.html

    If MPOL_MF_MOVE is specified in flags, then the kernel will attempt
    to move all the existing pages in the memory range so that they
    follow the policy.  Pages that are shared with other processes will
    not be moved.  If MPOL_MF_STRICT is also specified, then the call
    will fail with the error EIO if some pages could not be moved.
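For the scenario in the question (one mmap'd region that should follow the process), a minimal sketch with mbind might look like the following. It is an illustration under assumptions, not a definitive implementation: move_range_to_node and relocate_demo are hypothetical names, the node is assumed to fit in a single unsigned long mask, and the system call is issued through syscall(2) with constants from <linux/mempolicy.h> so that libnuma is not required. With libnuma, mbind() from <numaif.h> takes the same arguments.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mempolicy.h>   /* MPOL_BIND, MPOL_MF_MOVE, MPOL_MF_STRICT */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: bind [addr, addr + len) to `node` and ask the kernel
 * to move pages that are already allocated elsewhere. `addr` must be
 * page-aligned, which is always true for an address returned by mmap.
 * Returns 0 on success or -1 with errno set. */
static int move_range_to_node(void *addr, size_t len, int node)
{
    unsigned long mask;
    const unsigned long maxnode = 8 * sizeof(mask);   /* bits in the mask */

    /* Stay one bit below the word size to be conservative about how the
     * kernel interprets maxnode. */
    if (node < 0 || (unsigned long)node >= maxnode - 1) {
        errno = EINVAL;
        return -1;
    }
    mask = 1UL << node;
    return (int)syscall(SYS_mbind, addr, (unsigned long)len,
                        (unsigned long)MPOL_BIND, &mask, maxnode,
                        (unsigned long)(MPOL_MF_MOVE | MPOL_MF_STRICT));
}

/* Hypothetical demo: map a region, fill it, try to move it to `node`, and
 * check that the contents are unchanged afterwards.
 * Returns 1 if the data is intact, 0 on a mismatch, -1 if mmap failed. */
static int relocate_demo(int node)
{
    const size_t len = 16 * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0x5A, len);   /* touch the pages so physical memory is allocated */

    /* The move may fail (node missing, insufficient privileges, sandbox);
     * in that case the mapping simply stays where it is. */
    (void)move_range_to_node(buf, len, node);

    int intact = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0x5A) {
            intact = 0;
            break;
        }
    }
    munmap(buf, len);
    return intact;
}
```

The virtual address of the mapping does not change; only the physical pages behind it are moved, so existing pointers into the region remain valid. Because MPOL_MF_STRICT is set, a partial move is reported as an error (EIO per the man page quoted above) rather than silently ignored.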