What does memory allocation failure in Kernel Logs mean?

Question

This is the output of the dmesg command on an instance running Linux that may be under memory pressure. Any help with what these logs mean?


dmesg | tail -n 25
[23498.234294]  warn_alloc+0x114/0x1c0
[23498.238447] ena 0000:00:05.0 eth0: refilled rx qid 1 with only 64 buffers (from 131)
[23498.242537]  __alloc_pages_slowpath+0xce2/0xd20
[23498.242541]  ? ___slab_alloc+0xc1/0x4b0
[23498.242544]  ? get_page_from_freelist+0x525/0xba0
[23498.268528]  __alloc_pages_nodemask+0x25d/0x280
[23498.271780]  ena_refill_rx_bufs+0x55/0x2c0 [ena]
[23498.275046]  ena_clean_rx_irq+0x4ac/0x840 [ena]
[23498.278303]  ? netif_receive_skb_internal+0x42/0xe0
[23498.281698]  ena_io_poll+0x2d1/0x720 [ena]
[23498.284738]  net_rx_action+0x156/0x3f0
[23498.287680]  __do_softirq+0xe3/0x2c7
[23498.290553]  irq_exit+0xbd/0xd0
[23498.391684]  do_IRQ+0x89/0xe0
[23498.394364]  common_interrupt+0x85/0x85
[23498.397335]  </IRQ>
[23498.399697] RIP: 0033:0x7fb8c2cc8ad4
[23498.402642] RSP: 002b:00007f98402f3ea0 EFLAGS: 00000216 ORIG_RAX: ffffffffffffff73
[23498.408552] RAX: 00007fb8b5f52815 RBX: 00007f9a190e9678 RCX: 00007f986351ede0
[23498.412754] RDX: 0000000000000164 RSI: 00007fb812079d50 RDI: 00007f99dbfa83d4
[23498.416948] RBP: 0000000000001fae R08: 0000000000000164 R09: 0000000000000075
[23498.421138] R10: 00007fb812079d38 R11: 0000000000000074 R12: 00007faee54d9ba0
[23498.425357] R13: 0000000000000001 R14: 0000000000000007 R15: 00007fb8b5f52800
[23498.429598] ena 0000:00:05.0 eth0: failed to alloc buffer for rx queue 0
[23498.433666] ena 0000:00:05.0 eth0: refilled rx qid 0 with only 62 buffers (from 132)

Also, what are some possible ways to:

do further root cause analysis?
mitigate the problem?

Answer 1

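The first part of the log is a function call stack. It shows that the failure is related to the ENA network driver: the buddy system (the kernel's page allocator) tried to allocate pages while refilling receive buffers and failed because memory was short.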

The second part gives the exact message: "failed to alloc buffer for rx queue 0".
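For the root cause analysis part of the question, a reasonable first step is to see how often the failure occurs and how much free memory the kernel has around that time. The sketch below is one way to do that, not something taken from the blog; `count_rx_alloc_failures` and `meminfo_kb` are hypothetical helper names, and the match string is copied from the log above.

```shell
# Count ENA rx-buffer allocation failures in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status clean while still printing the 0.
count_rx_alloc_failures() {
    grep -c 'failed to alloc buffer for rx queue' || true
}

# Print one field of /proc/meminfo in kB, e.g. MemFree or MemAvailable.
# $1: field name; $2: optional file to read instead of /proc/meminfo.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Typical use on the affected instance (reading the kernel log may need root):
#   dmesg | count_rx_alloc_failures
#   meminfo_kb MemFree
#   meminfo_kb MemAvailable
#   cat /proc/sys/vm/min_free_kbytes
```

The allocation in the trace happens in softirq context (`net_rx_action`, `__do_softirq`), where the kernel cannot wait for memory to be reclaimed. If MemFree is at or below the `vm.min_free_kbytes` value around the time the failures are logged, that suggests the free-page reserve is being exhausted rather than a bug in the driver.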

After some googling, I found a blog post that should help you. Here is the summary.

This message will raise when the napi handler fails to refill new Rx descriptors, typically due to lack of memory. This situation might lead to performance decrease, given that some requests would have to be rescheduled.

The fix for this problem involves increasing the amount of memory reserved through the min_free_kbytes kernel parameter. For example:

vm.min_free_kbytes = 1048576

Add the line above to /etc/sysctl.conf, then load the new settings:

sysctl -p
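To confirm that the setting took effect, the configured value can be compared with the one the running kernel reports. This is a minimal sketch; `running_min_free_kb` and `configured_min_free_kb` are hypothetical helper names, and the optional file arguments exist only so the helpers can be pointed at a copy of the files.

```shell
# Print the min_free_kbytes value the running kernel is using, in kB.
# $1: optional file to read instead of the live procfs entry.
running_min_free_kb() {
    cat "${1:-/proc/sys/vm/min_free_kbytes}"
}

# Print the last vm.min_free_kbytes assignment found in a sysctl.conf-style
# file, or nothing if the key is not set there.
# $1: optional file to read instead of /etc/sysctl.conf.
configured_min_free_kb() {
    awk -F= '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "vm.min_free_kbytes" { val = $2; gsub(/[ \t]/, "", val) }
        END { if (val != "") print val }
    ' "${1:-/etc/sysctl.conf}"
}

# To try a value immediately without persisting it (requires root):
#   sysctl -w vm.min_free_kbytes=1048576
# After "sysctl -p", the two helpers should normally print the same number:
#   configured_min_free_kb
#   running_min_free_kb
```

If the two numbers differ, another file (for example one under /etc/sysctl.d) may be setting the same key.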

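At least 512 MB is recommended, with a minimum of 128 MB for constrained environments. For large instance types running heavy workloads (e.g. 64+ vCores and 256+ GiB of RAM), this value can usually be set to 1 GB, which is the 1048576 kB used in the example above.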
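Because `vm.min_free_kbytes` is expressed in kB while the guidance here is given in MB, a small conversion helper can help avoid unit mistakes. The sketch below assumes 1 MB means 1024 kB; `mb_to_kb` and `classify_min_free_kb` are hypothetical names, and the thresholds are the 128 MB and 512 MB figures quoted in this answer.

```shell
# Convert MB (treated here as 1024 kB each) to the kB unit sysctl expects.
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

# Report how a min_free_kbytes value (in kB) compares with the guidance:
# 512 MB recommended, 128 MB minimum for constrained environments.
classify_min_free_kb() {
    if [ "$1" -ge "$(mb_to_kb 512)" ]; then
        echo "meets the 512 MB recommendation"
    elif [ "$1" -ge "$(mb_to_kb 128)" ]; then
        echo "meets only the 128 MB minimum"
    else
        echo "below the 128 MB minimum"
    fi
}

# Example: check the value currently in effect on this machine.
#   classify_min_free_kb "$(cat /proc/sys/vm/min_free_kbytes)"
```

For instance, `mb_to_kb 512` prints 524288, which is the number to put in /etc/sysctl.conf for a 512 MB reserve.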

linux

ram

kernel

linux-kernel

dmesg