Linux kernel: get information of page cache distribution over NUMA nodes

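When the Linux kernel runs on a NUMA machine, each NUMA node has partially separate memory management. SysRq has the function echo 'm' > /proc/sysrq-trigger ("Will dump current memory info to your console."), implemented as sysrq_handle_showmem and show_mem, which prints basic memory statistics for every NUMA node to the system console, dmesg, and the kernel log.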

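As far as I understand, this output includes data about the memory the kernel uses for the disk cache (page cache) on every NUMA node, probably from the active_file:%lu inactive_file:%lu code of show_free_areas. (Is this the same value as the cached line in the output of the free tool?)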

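I want to monitor disk cache usage over NUMA nodes for a long time with frequent updates, and I do not want to fill the entire console and dmesg with SysRq-m output. I plan to find out how multi-process or multi-threaded programs (not bound to a core or node with affinity) interact with page cache pages placed in the memory of another node.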

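Is this information (page cache memory usage for every NUMA node) published for program access without sysrq, by reading and parsing some special files under /proc or /sys? Or is it necessary to write a new kernel module for this?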

The free tool uses /proc/meminfo to print cache ("Memory used by the page cache and slabs") for the entire system, not for each NUMA node. I was unable to find per-NUMA memory stats in the proc(5) man page: http://man7.org/linux/man-pages/man5/proc.5.html

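There is numastat: https://www.kernel.org/doc/Documentation/numastat.txt but it has no page cache memory statistics. As I understand it, it only reports cross-NUMA page allocation counts, which may be useless when processes often move between NUMA nodes.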

Every node has a /sys/devices/system/node/nodeX/meminfo file with basic memory information, e.g. /sys/devices/system/node/node0/meminfo for NUMA node 0, /sys/devices/system/node/node1/meminfo for node 1, and so on.

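These files should be similar in format to the system-wide /proc/meminfo file, which is what the free utility actually uses; its man page has a basic description of the meminfo fields: http://man7.org/linux/man-pages/man1/free.1.html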

   free displays the total amount of free and used physical and swap
   memory in the system, as well as the buffers and caches used by the
   kernel. The information is gathered by parsing /proc/meminfo. The
   displayed columns are:

   total  Total installed memory (MemTotal and SwapTotal in /proc/meminfo)

   used   Used memory (calculated as total - free - buffers - cache)

   free   Unused memory (MemFree and SwapFree in /proc/meminfo)

   shared Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)

   buffers
          Memory used by kernel buffers (Buffers in /proc/meminfo)

   cache  Memory used by the page cache and slabs (Cached and
          SReclaimable in /proc/meminfo)

   buff/cache
          Sum of buffers and cache
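
The column definitions quoted above can be reproduced by hand. The following is only a sketch that follows the man page text literally; the function names parse_meminfo and free_columns are made up for illustration, and the real free implementation may differ between procps versions.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB where a unit is given
    return fields


def free_columns(fields):
    """Compute the columns as described in the quoted free(1) text."""
    total = fields["MemTotal"]
    free = fields["MemFree"]
    buffers = fields.get("Buffers", 0)
    # "cache" is Cached plus SReclaimable according to the man page
    cache = fields.get("Cached", 0) + fields.get("SReclaimable", 0)
    return {
        "total": total,
        "used": total - free - buffers - cache,
        "free": free,
        "shared": fields.get("Shmem", 0),
        "buffers": buffers,
        "cache": cache,
        "buff/cache": buffers + cache,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        for name, kb in free_columns(parse_meminfo(f.read())).items():
            print(f"{name:>10}: {kb} kB")
```

These numbers are system-wide only, which is why the per-node files are needed for the NUMA question.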

The per-node meminfo file is mentioned in https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-devices-node:

What:       /sys/devices/system/node/nodeX/meminfo
Date:       October 2002
Contact:    Linux Memory Management list <linux-mm@kvack.org>
Description:
        Provides information about the node's distribution and memory
        utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt

The full meminfo description is in https://www.kernel.org/doc/Documentation/filesystems/proc.txt

You (I) need the "Cached" line from each NUMA node's meminfo to get information about how the page cache is distributed between NUMA nodes:

     Buffers: Relatively temporary storage for raw disk blocks
             shouldn't get tremendously large (20MB or so)
      Cached: in-memory cache for files read from the disk (the
             pagecache).  Doesn't include SwapCached
SReclaimable: Part of Slab, that might be reclaimed, such as caches

Some part of the used memory may be dirty:

    Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk

It also shows how much memory is used for the anonymous memory of userspace tasks:

    AnonPages: Non-file backed pages mapped into userspace page tables
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
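
For long-running monitoring without touching the console or dmesg, the per-node files can be polled from user space. This is a rough sketch only: the "Node N Name: value kB" line shape is an assumption about the per-node format, and pick_fields, snapshot and monitor are illustrative names, not an existing tool.

```python
import glob
import time

NODE_GLOB = "/sys/devices/system/node/node[0-9]*/meminfo"
FIELDS = ("Dirty", "Writeback", "AnonPages", "AnonHugePages")


def pick_fields(text, wanted=FIELDS):
    """Extract the wanted fields from one per-node meminfo text."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed shape: Node <id> <Name>: <value> [kB]
        if len(parts) >= 4 and parts[0] == "Node" and parts[3].isdigit():
            name = parts[2].rstrip(":")
            if name in wanted:
                out[name] = int(parts[3])
    return out


def snapshot(paths=None):
    """Read all per-node files once and return {node name: {field: value}}."""
    if paths is None:
        paths = glob.glob(NODE_GLOB)
    snap = {}
    for path in paths:
        with open(path) as f:
            snap[path.split("/")[-2]] = pick_fields(f.read())
    return snap


def monitor(interval=1.0, count=5):
    """Print one line per node every `interval` seconds, `count` times."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S")
        for node, values in sorted(snapshot().items()):
            print(stamp, node, values)
        time.sleep(interval)
```

Calling monitor(interval=1.0, count=60) would sample once a second for a minute; adding the page cache field to FIELDS gives the per-node cache trend over time.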