Linux kernel: getting information about page cache distribution over NUMA nodes
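When the Linux kernel runs on a NUMA machine, each NUMA node has partially separate memory management. SysRq has the echo 'm' > /proc/sysrq-trigger function ("Will dump current memory info to your console.", implemented as sysrq_handle_showmem and show_mem), which prints basic memory statistics for every NUMA node to the system console, dmesg, and the kernel log.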
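As far as I understand, that output includes data about the memory used by the disk cache (page cache) for every NUMA node, probably from the active_file:%lu inactive_file:%lu code of show_free_areas. (Is this the cached line from the output of the free tool?)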
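I want to monitor disk cache usage across NUMA nodes for a long time with frequent updates, and I don't want to fill the whole console and dmesg with SysRq-m output. My goal is to find out how multi-process or multi-threaded programs (not bound to cores or nodes with affinity) interact with page cache pages placed in the memory of other nodes.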
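Is this information (page cache memory usage per NUMA node) published for programs to access without sysrq, by reading and parsing some special files in /proc or /sys? Or do I need to write a new kernel module for this?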
The free tool uses /proc/meminfo to print cache ("Memory used by the page cache and slabs") for the entire system, not for every NUMA node. I was unable to find per-NUMA memory stats in the proc(5) man page: http://man7.org/linux/man-pages/man5/proc.5.html
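There is numastat (https://www.kernel.org/doc/Documentation/numastat.txt), but it has no page cache memory statistics. As far as I understand, it only reports counts of cross-NUMA page allocations, which may be useless when processes often move between NUMA nodes.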
Every node has a /sys/devices/system/node/nodeX/meminfo file with basic memory information, for example /sys/devices/system/node/node0/meminfo for NUMA node 0, /sys/devices/system/node/node1/meminfo for node 1, and so on.
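As a rough illustration of reading those per-node files from a program, here is a minimal Python sketch. It assumes each line has the form "Node <id> <Field>: <value> kB" (with the kB unit missing on some counters). The helper names parse_node_meminfo and read_all_nodes are made up for this example.

```python
import glob
import re

# Assumed line shape: "Node 0 MemTotal:       16330040 kB"
# (some counters, e.g. HugePages_Total, carry no "kB" suffix).
NODE_LINE = re.compile(r"^Node\s+(\d+)\s+([^:\s]+):\s+(\d+)(?:\s+kB)?\s*$")


def parse_node_meminfo(text):
    """Return {field_name: integer_value} for one node's meminfo text."""
    fields = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            fields[match.group(2)] = int(match.group(3))
    return fields


def read_all_nodes(pattern="/sys/devices/system/node/node[0-9]*/meminfo"):
    """Read every per-node meminfo file; returns {node_id: {field: value}}."""
    nodes = {}
    for path in sorted(glob.glob(pattern)):
        node_id = int(re.search(r"node(\d+)/meminfo$", path).group(1))
        with open(path) as handle:
            nodes[node_id] = parse_node_meminfo(handle.read())
    return nodes


if __name__ == "__main__":
    # Prints nothing on systems that do not expose the per-node files.
    for node_id, fields in sorted(read_all_nodes().items()):
        print(f"node{node_id}: MemTotal={fields.get('MemTotal')} kB, "
              f"MemFree={fields.get('MemFree')} kB")
```

Keeping the parser separate from the file reading makes it easy to try the code on saved copies of the files from another machine.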
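They should be similar in format to the system-wide /proc/meminfo file, which is what the free utility actually uses. Its man page has a basic description of the meminfo fields: http://man7.org/linux/man-pages/man1/free.1.html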
    free displays the total amount of free and used physical and swap
    memory in the system, as well as the buffers and caches used by the
    kernel. The information is gathered by parsing /proc/meminfo. The
    displayed columns are:

    total       Total installed memory (MemTotal and SwapTotal in /proc/meminfo)
    used        Used memory (calculated as total - free - buffers - cache)
    free        Unused memory (MemFree and SwapFree in /proc/meminfo)
    shared      Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)
    buffers     Memory used by kernel buffers (Buffers in /proc/meminfo)
    cache       Memory used by the page cache and slabs (Cached and
                SReclaimable in /proc/meminfo)
    buff/cache  Sum of buffers and cache
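To make the column arithmetic in that man page excerpt concrete, the following Python sketch recomputes the memory columns from /proc/meminfo-style text. It follows only the formulas quoted above (other versions of free may calculate "used" differently). The function names parse_meminfo and free_columns are hypothetical.

```python
import re

# Matches lines such as "MemTotal:       16330040 kB".
MEMINFO_LINE = re.compile(r"^(\S+):\s+(\d+)", re.MULTILINE)


def parse_meminfo(text):
    """Return {field_name: value} from /proc/meminfo-style text."""
    return {name: int(value) for name, value in MEMINFO_LINE.findall(text)}


def free_columns(info):
    """Recompute the memory columns using the formulas from the quoted man page."""
    cache = info["Cached"] + info["SReclaimable"]
    return {
        "total": info["MemTotal"],
        "free": info["MemFree"],
        "buffers": info["Buffers"],
        "cache": cache,
        "buff/cache": info["Buffers"] + cache,
        # used = total - free - buffers - cache
        "used": info["MemTotal"] - info["MemFree"] - info["Buffers"] - cache,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        for column, kilobytes in free_columns(parse_meminfo(handle.read())).items():
            print(f"{column:>10}: {kilobytes} kB")
```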
The per-node meminfo file is documented in https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-devices-node as follows:
What: /sys/devices/system/node/nodeX/meminfo
Date: October 2002
Contact: Linux Memory Management list <linux-mm@kvack.org>
Description:
Provides information about the node's distribution and memory
utilization. Similar to /proc/meminfo, see Documentation/filesystems/proc.txt
The full description of the meminfo fields is in https://www.kernel.org/doc/Documentation/filesystems/proc.txt
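You (I) need the "Cached" line from the NUMA node meminfo to get information about page cache distribution between NUMA nodes (the per-node files may report this counter as "FilePages" rather than "Cached", so check which name your kernel prints):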
Buffers: Relatively temporary storage for raw disk blocks
shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the
pagecache). Doesn't include SwapCached
SReclaimable: Part of Slab, that might be reclaimed, such as caches
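A small Python sketch of turning that per-node counter into a distribution follows. It assumes the per-node line is named either "Cached" or "FilePages" (the latter is what the per-node sysfs files print on the kernels I have seen, and it is not defined identically to the system-wide Cached value). The function names are hypothetical.

```python
import glob
import re

# Accept either label; "SwapCached" does not match because the field name
# must start right after the "Node <id>" prefix.
CACHE_LINE = re.compile(r"^Node\s+(\d+)\s+(?:Cached|FilePages):\s+(\d+)\s+kB",
                        re.MULTILINE)


def cached_kb_by_node(texts):
    """Map node id -> page cache kB, given an iterable of per-node meminfo texts."""
    result = {}
    for text in texts:
        for node, kilobytes in CACHE_LINE.findall(text):
            result[int(node)] = int(kilobytes)
    return result


def cache_shares(cached):
    """Fraction of the total page cache that resides on each node."""
    total = sum(cached.values())
    if total == 0:
        return {node: 0.0 for node in cached}
    return {node: kilobytes / total for node, kilobytes in cached.items()}


def read_node_texts(pattern="/sys/devices/system/node/node[0-9]*/meminfo"):
    """Yield the raw text of every per-node meminfo file that exists."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            yield handle.read()


if __name__ == "__main__":
    cached = cached_kb_by_node(read_node_texts())
    for node, share in sorted(cache_shares(cached).items()):
        print(f"node{node}: {cached[node]} kB ({share:.1%} of page cache)")
```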
Some part of the used memory may be dirty:
Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk
The file also shows how much memory is used for anonymous memory of user-space tasks:
AnonPages: Non-file backed pages mapped into userspace page tables
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
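For long-running monitoring with frequent updates that stays out of dmesg, a polling loop over the per-node files could look like the sketch below. It is only an illustration: the set of fields, the output format, and the names extract, format_row and monitor are my own choices, and "FilePages" is included because the per-node files may use that label instead of "Cached".

```python
import glob
import re
import time

# Fields to report; names missing from a given kernel's output are skipped.
FIELDS = ("Cached", "FilePages", "Dirty", "Writeback", "AnonPages")
LINE = re.compile(r"^Node\s+(\d+)\s+(\S+):\s+(\d+)", re.MULTILINE)


def extract(text, wanted=FIELDS):
    """Return (node_id, {field: kB}) for the wanted fields of one node's meminfo."""
    node_id, values = None, {}
    for node, name, value in LINE.findall(text):
        if name in wanted:
            node_id = int(node)
            values[name] = int(value)
    return node_id, values


def format_row(timestamp, node_id, values, wanted=FIELDS):
    """One log line per node and sample, listing only the fields that were found."""
    cells = " ".join(f"{name}={values[name]}kB" for name in wanted if name in values)
    return f"{timestamp:.0f} node{node_id} {cells}"


def monitor(interval=1.0, count=10,
            pattern="/sys/devices/system/node/node[0-9]*/meminfo"):
    """Print `count` samples to stdout, `interval` seconds apart."""
    for index in range(count):
        now = time.time()
        for path in sorted(glob.glob(pattern)):
            with open(path) as handle:
                node_id, values = extract(handle.read())
            if node_id is not None:
                print(format_row(now, node_id, values))
        if index + 1 < count:
            time.sleep(interval)
```

Calling monitor(interval=1.0, count=60) would print one line per node per second for a minute. Redirecting stdout to a file gives a log that can be plotted later.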