Prometheus (Docker): determine available memory per node (which metric is correct?)

Question

We have been trying to set up good memory monitoring for our nodes running Docker components. We use Prometheus together with cadvisor and node_exporter.

What is the best way to determine the used memory per node?

Method 1: gives about 42% in our example

(1-(node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes))*100

Method 2: gives about 80%

(1-((node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes)/node_memory_MemTotal_bytes))*100
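To see where the two percentages come from, here is a minimal Python sketch that applies both formulas to a /proc/meminfo-style snapshot (node_exporter's meminfo collector reads /proc/meminfo on Linux, so the node_memory_*_bytes metrics are the same fields converted to bytes). The snapshot values are hypothetical, chosen only so that the ratios match the percentages reported in this question, and the function names are illustrative, not part of any library.

```python
# Minimal sketch: compute the two "used memory" percentages from the question
# using /proc/meminfo-style text. Values here are in kB; node_exporter reports
# bytes, but the ratios are identical.


def parse_meminfo(text: str) -> dict[str, int]:
    """Return a mapping of meminfo field name -> numeric value (kB for most fields)."""
    fields: dict[str, int] = {}
    for line in text.strip().splitlines():
        name, rest = line.split(":", 1)  # e.g. "Active(file):   123 kB"
        fields[name.strip()] = int(rest.split()[0])
    return fields


def used_pct_method1(m: dict[str, int]) -> float:
    # (1 - MemAvailable / MemTotal) * 100
    return (1 - m["MemAvailable"] / m["MemTotal"]) * 100


def used_pct_method2(m: dict[str, int]) -> float:
    # (1 - (MemFree + Buffers + Cached) / MemTotal) * 100
    free_like = m["MemFree"] + m["Buffers"] + m["Cached"]
    return (1 - free_like / m["MemTotal"]) * 100


# Hypothetical snapshot: 5% free, 0.002% buffers, 15% cached, 58% available.
SAMPLE = """
MemTotal:       16000000 kB
MemFree:          800000 kB
MemAvailable:    9280000 kB
Buffers:             320 kB
Cached:          2400000 kB
"""

if __name__ == "__main__":
    m = parse_meminfo(SAMPLE)
    print(f"method 1: {used_pct_method1(m):.1f}% used")
    print(f"method 2: {used_pct_method2(m):.1f}% used")
```

On a real Linux host you could pass `open("/proc/meminfo").read()` to `parse_meminfo` instead of the sample string.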

Q2: Why is there such a difference? What can I learn from it?

So I dug deeper and looked at the individual metrics:

Free memory: about 5% in our experiment

(node_memory_MemFree_bytes/node_memory_MemTotal_bytes)*100
Buffered memory: about 0.002%

(node_memory_Buffers_bytes/node_memory_MemTotal_bytes)*100
Cached memory: about 15%

(node_memory_Cached_bytes/node_memory_MemTotal_bytes)*100
Available memory: 58%

(node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)*100

I expected FreeMem + BufferedMem + CachedMem to add up to roughly AvailableMem. But that is not what this simple experiment shows: the sum is about 20%, not 58%.
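Putting the measured ratios side by side shows that each method is consistent with the individual metrics above; the two methods simply divide different numerators by MemTotal, and the gap between them is roughly 38 percentage points:

```latex
\begin{aligned}
\frac{\text{MemFree} + \text{Buffers} + \text{Cached}}{\text{MemTotal}}
  &\approx 5\% + 0.002\% + 15\% \approx 20\%
  &&\Rightarrow\ \text{Method 2: } 100\% - 20\% = 80\%\ \text{used} \\
\frac{\text{MemAvailable}}{\text{MemTotal}}
  &\approx 58\%
  &&\Rightarrow\ \text{Method 1: } 100\% - 58\% = 42\%\ \text{used}
\end{aligned}
```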

Q3: Why is this not the case?

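Supposedly, available memory on Linux consists of free memory + buffer memory + cache memory, because the cache (among other things) can be released when memory runs low.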

Answer 1

The kernel documentation explains in detail what these numbers mean: https://github.com/torvalds/linux/blob/master/Documentation/filesystems/proc.rst#meminfo

MemAvailable: An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.

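So MemAvailable is an estimate of how much memory can be used to start new processes without swapping. MemFree is only one of the inputs to MemAvailable. Buffers and Cached may be factored into the estimate, but they are only part of the memory that can potentially be reclaimed; as the quoted documentation says, the estimate also draws on reclaimable slab (SReclaimable) and the file LRU lists, adjusted by the low watermarks. For reference, the documentation describes Buffers and Cached as follows: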
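To make the quoted description more concrete, here is a simplified Python model of the estimate, based on my reading of the kernel's `si_mem_available()` function in recent kernels; treat it as an approximation, not the kernel's exact code. It works in kB instead of pages, and it takes `wmark_low` (the sum of the per-zone "low" watermarks, listed in pages in /proc/zoneinfo) and `total_reserve` (an internal kernel value) as plain inputs, because neither appears in /proc/meminfo. The function and parameter names are hypothetical.

```python
# Simplified model of the kernel's MemAvailable estimate (si_mem_available()).
# All inputs are in kB. Assumptions: wmark_low is the sum of per-zone low
# watermarks already converted from pages to kB; total_reserve is an internal
# kernel reserve that /proc/meminfo does not show, so it defaults to 0 here.


def estimate_mem_available(
    mem_free: int,          # MemFree
    active_file: int,       # Active(file)
    inactive_file: int,     # Inactive(file)
    reclaimable_slab: int,  # SReclaimable (KReclaimable on newer kernels)
    wmark_low: int,
    total_reserve: int = 0,
) -> int:
    # Start from truly free memory minus what the kernel keeps in reserve.
    available = mem_free - total_reserve

    # Page cache on the file LRU lists: assume at least half of it, or the
    # low watermark's worth (whichever is smaller), must stay resident.
    pagecache = active_file + inactive_file
    pagecache -= min(pagecache // 2, wmark_low)
    available += pagecache

    # Reclaimable slab: some objects are in use and cannot be freed, so the
    # same cap is applied.
    reclaimable_slab -= min(reclaimable_slab // 2, wmark_low)
    available += reclaimable_slab

    # The estimate never goes negative.
    return max(available, 0)
```

With made-up inputs, `estimate_mem_available(800000, 1200000, 1100000, 6000000, 60000)` returns 8980000 kB. Most of that comes from reclaimable slab, which is not counted in Free + Buffers + Cached at all, so a large SReclaimable value is one way MemAvailable can end up well above that sum.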

Buffers: Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so)

Cached: in-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached
