为什么用户进程可以由于内存碎片而调用 Linux OOM-killer,即使有足够的 RAM 可用?
Why can a user-process invoke the Linux OOM-killer due to memory fragmentation, even though plenty of RAM is available?
我有一个基于 ARM 的无头 Linux (v3.10.53-1.1.1) 系统,没有启用交换 space,我偶尔会看到进程被 OOM 杀死-杀手,即使有足够的 RAM 可用。
运行 echo 1 > /proc/sys/vm/compact_memory
似乎周期性地阻止了 OOM 杀手,这让我认为内存碎片是罪魁祸首,但我不明白为什么用户进程需要无论如何物理上连续的块;据我了解,即使在最坏的情况下(完全碎片化,只有单个 4K 块可用),内核也可以简单地分配必要数量的单个 4K 块,然后使用虚拟内存魔法 (tm) 来制作它们看起来与用户进程相邻。
有人可以解释为什么会调用 OOM-killer 来响应内存碎片吗?它只是一个有问题的内核还是有真正的原因? (即使内核确实需要整理内存碎片以满足请求,它不应该自动执行而不是放弃和 OOM'ing 吗?)
我在下面粘贴了一个 OOM 杀手调用示例,以防它阐明一些事情。我可以随意重现故障;此调用发生时计算机仍有约 120MB 可用 RAM(根据 free
),以响应我的测试程序分配内存,一次分配 10000 个 400 字节。
May 28 18:51:34 g2 user.warn kernel: [ 4228.307769] cored invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
May 28 18:51:35 g2 user.warn kernel: [ 4228.315295] CPU: 2 PID: 19687 Comm: cored Tainted: G O 3.10.53-1.1.1_ga+gf57416a #1
May 28 18:51:35 g2 user.warn kernel: [ 4228.323843] Backtrace:
May 28 18:51:35 g2 user.warn kernel: [ 4228.326340] [<c0011c54>] (dump_backtrace+0x0/0x10c) from [<c0011e68>] (show_stack+0x18/0x1c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.334804] r6:00000000 r5:00000000 r4:c9784000 r3:00000000
May 28 18:51:35 g2 user.warn kernel: [ 4228.340566] [<c0011e50>] (show_stack+0x0/0x1c) from [<c04d0dd8>] (dump_stack+0x24/0x28)
May 28 18:51:35 g2 user.warn kernel: [ 4228.348684] [<c04d0db4>] (dump_stack+0x0/0x28) from [<c009b474>] (dump_header.isra.10+0x84/0x19c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.357616] [<c009b3f0>] (dump_header.isra.10+0x0/0x19c) from [<c009ba3c>] (oom_kill_process+0x288/0x3f4)
May 28 18:51:35 g2 user.warn kernel: [ 4228.367230] [<c009b7b4>] (oom_kill_process+0x0/0x3f4) from [<c009bf8c>] (out_of_memory+0x208/0x2cc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.376323] [<c009bd84>] (out_of_memory+0x0/0x2cc) from [<c00a0278>] (__alloc_pages_nodemask+0x8f8/0x910)
May 28 18:51:35 g2 user.warn kernel: [ 4228.385921] [<c009f980>] (__alloc_pages_nodemask+0x0/0x910) from [<c00b6c34>] (__pte_alloc+0x2c/0x158)
May 28 18:51:35 g2 user.warn kernel: [ 4228.395263] [<c00b6c08>] (__pte_alloc+0x0/0x158) from [<c00b9fe0>] (handle_mm_fault+0xd4/0xfc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.403914] r6:c981a5d8 r5:cc421a40 r4:10400000 r3:10400000
May 28 18:51:35 g2 user.warn kernel: [ 4228.409689] [<c00b9f0c>] (handle_mm_fault+0x0/0xfc) from [<c0019a00>] (do_page_fault+0x174/0x3dc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.418575] [<c001988c>] (do_page_fault+0x0/0x3dc) from [<c0019dc0>] (do_translation_fault+0xb4/0xb8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.427824] [<c0019d0c>] (do_translation_fault+0x0/0xb8) from [<c00083ac>] (do_DataAbort+0x40/0xa0)
May 28 18:51:35 g2 user.warn kernel: [ 4228.436896] r6:c0019d0c r5:00000805 r4:c06a33d0 r3:103ffea8
May 28 18:51:35 g2 user.warn kernel: [ 4228.442643] [<c000836c>] (do_DataAbort+0x0/0xa0) from [<c000e138>] (__dabt_usr+0x38/0x40)
May 28 18:51:35 g2 user.warn kernel: [ 4228.450850] Exception stack(0xc9785fb0 to 0xc9785ff8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.455918] 5fa0: 103ffea8 00000000 b6d56708 00000199
May 28 18:51:35 g2 user.warn kernel: [ 4228.464116] 5fc0: 00000001 b6d557c0 0001ffc8 b6d557f0 103ffea0 b6d55228 10400038 00000064
May 28 18:51:35 g2 user.warn kernel: [ 4228.472327] 5fe0: 0001ffc9 beb04990 00000199 b6c95d84 600f0010 ffffffff
May 28 18:51:35 g2 user.warn kernel: [ 4228.478952] r8:103ffea0 r7:b6d557f0 r6:ffffffff r5:600f0010 r4:b6c95d84
May 28 18:51:35 g2 user.warn kernel: [ 4228.485759] Mem-info:
May 28 18:51:35 g2 user.warn kernel: [ 4228.488038] DMA per-cpu:
May 28 18:51:35 g2 user.warn kernel: [ 4228.490589] CPU 0: hi: 90, btch: 15 usd: 5
May 28 18:51:35 g2 user.warn kernel: [ 4228.495389] CPU 1: hi: 90, btch: 15 usd: 13
May 28 18:51:35 g2 user.warn kernel: [ 4228.500205] CPU 2: hi: 90, btch: 15 usd: 17
May 28 18:51:35 g2 user.warn kernel: [ 4228.505003] CPU 3: hi: 90, btch: 15 usd: 65
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] active_anon:92679 inactive_anon:47 isolated_anon:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] active_file:162 inactive_file:1436 isolated_file:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] unevictable:0 dirty:0 writeback:0 unstable:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] free:28999 slab_reclaimable:841 slab_unreclaimable:2103
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] mapped:343 shmem:89 pagetables:573 bounce:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] free_cma:29019
May 28 18:51:35 g2 user.warn kernel: [ 4228.541416] DMA free:115636kB min:1996kB low:2492kB high:2992kB active_anon:370716kB inactive_anon:188kB active_file:752kB inactive_file:6040kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:2
May 28 18:51:35 g2 user.warn kernel: [ 4228.583833] lowmem_reserve[]: 0 0 0 0
May 28 18:51:35 g2 user.warn kernel: [ 4228.587577] DMA: 2335*4kB (UMC) 1266*8kB (UMC) 1034*16kB (UMC) 835*32kB (UC) 444*64kB (C) 28*128kB (C) 103*256kB (C) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 121100kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.604979] 502 total pagecache pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.608649] 0 pages in swap cache
May 28 18:51:35 g2 user.warn kernel: [ 4228.611979] Swap cache stats: add 0, delete 0, find 0/0
May 28 18:51:35 g2 user.warn kernel: [ 4228.617210] Free swap = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.620110] Total swap = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.635245] 131072 pages of RAM
May 28 18:51:35 g2 user.warn kernel: [ 4228.638394] 30575 free pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.641293] 3081 reserved pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.644437] 1708 slab pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.647239] 265328 pages shared
May 28 18:51:35 g2 user.warn kernel: [ 4228.650399] 0 pages swap cached
May 28 18:51:35 g2 user.info kernel: [ 4228.653546] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
May 28 18:51:35 g2 user.info kernel: [ 4228.661408] [ 115] 0 115 761 128 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.669347] [ 237] 0 237 731 98 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.677278] [ 238] 0 238 731 100 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.685224] [ 581] 0 581 1134 78 5 0 -1000 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.693074] [ 592] 0 592 662 15 4 0 0 syslogd
May 28 18:51:35 g2 user.info kernel: [ 4228.701184] [ 595] 0 595 662 19 4 0 0 klogd
May 28 18:51:35 g2 user.info kernel: [ 4228.709113] [ 633] 0 633 6413 212 12 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.716877] [ 641] 0 641 663 16 3 0 0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.724827] [ 642] 0 642 663 16 5 0 0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.732770] [ 646] 0 646 6413 215 12 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.740540] [ 650] 0 650 10791 572 10 0 0 avbd
May 28 18:51:35 g2 user.info kernel: [ 4228.748385] [ 651] 0 651 9432 2365 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.756322] [ 652] 0 652 52971 4547 42 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.764104] [ 712] 0 712 14135 2458 24 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.772053] [ 746] 0 746 1380 248 6 0 0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.780251] [ 779] 0 779 9419 2383 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.788187] [ 780] 0 780 9350 2348 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.796127] [ 781] 0 781 9349 2347 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.804074] [ 782] 0 782 9353 2354 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.812012] [ 783] 0 783 18807 2573 27 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.819955] [ 784] 0 784 17103 3233 28 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.827882] [ 785] 0 785 13990 2436 24 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.835819] [ 786] 0 786 9349 2350 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.843764] [ 807] 0 807 13255 4125 25 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.851702] [ 1492] 999 1492 512 27 5 0 0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.860334] [ 1493] 0 1493 433 14 5 0 0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.868955] [ 1494] 0 1494 1380 246 7 0 0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.877163] [19170] 0 19170 1175 131 6 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.885022] [19183] 0 19183 750 70 4 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.892701] [19228] 0 19228 663 16 5 0 0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.900636] [19301] 0 19301 1175 131 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.908475] [19315] 0 19315 751 69 5 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.916154] [19365] 0 19365 663 16 5 0 0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.924099] [19443] 0 19443 1175 153 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.931948] [19449] 0 19449 750 70 5 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.939626] [19487] 0 19487 1175 132 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.947467] [19500] 0 19500 750 70 3 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.955148] [19540] 0 19540 662 17 5 0 0 tail
May 28 18:51:35 g2 user.info kernel: [ 4228.963002] [19687] 0 19687 63719 56396 127 0 0 cored
May 28 18:51:35 g2 user.err kernel: [ 4228.970936] Out of memory: Kill process 19687 (cored) score 428 or sacrifice child
May 28 18:51:35 g2 user.err kernel: [ 4228.978513] Killed process 19687 (cored) total-vm:254876kB, anon-rss:225560kB, file-rss:24kB
这也是我用来对系统施加压力并调用 OOM 杀手的测试程序(echo 1 > /proc/sys/vm/compact_memory
命令经常 运行,OOM 杀手出现在 free
报告系统 RAM 接近于零,正如预期的那样;没有它,OOM-killer 出现在这之前,当 free
报告 130+MB 可用 RAM 但在 cat /proc/buddyinfo
之后显示 RAM 变得支离破碎):
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv)
{
while(1)
{
printf("PRESS RETURN TO ALLOCATE BUFFERS\n");
const int numBytes = 400;
char buf[64]; fgets(buf, sizeof(buf), stdin);
for (int i=0; i<10000; i++)
{
void * ptr = malloc(numBytes); // yes, a deliberate memory leak
if (ptr)
{
memset(ptr, 'J', numBytes); // force the virtual memory system to actually allocate the RAM, and not only the address space
}
else printf("malloc() failed!\n");
}
fprintf(stderr, "Deliberately leaked 10000*%i bytes!\n", numBytes);
}
return 0;
}
你走在正确的轨道上,杰里米。同样的事情发生在我的 CentOS 桌面系统上。我是一名计算机顾问,自 1995 年以来我一直在 Linux 工作。我用许多文件下载和各种其他活动无情地冲击我的 Linux 系统,使它们达到极限。在我的主桌面运行了大约 4 天后,它变得非常慢(比如比正常速度慢 1/10),OOM killed 开始了,我坐在那里想知道为什么我的系统会这样。它有足够的 RAM,但是 OOM 杀手在它没有业务的时候踢了。所以我重新启动它,它运行良好......大约 4 天,然后问题又来了。不知道为什么把我的鼻涕弄得一团糟。
所以我戴上了测试工程师的帽子,运行 在机器上进行了各种压力测试,看看我是否可以故意重现这些症状。几个月后,我能够随意重现问题,并证明我的解决方案每次都能奏效。
在此上下文中,“缓存周转率”是指系统必须拆除现有缓存以创建更多缓存 space 以支持新文件写入。由于系统急于重新部署 RAM,因此不需要花时间对正在释放的内存进行碎片整理。所以随着时间的推移,随着越来越多的文件写入发生,缓存会反复翻转。而它所在的记忆不断地变得越来越碎片化。在我的测试中,我发现在磁盘缓存翻转了大约 15 次后,内存变得非常碎片化,以至于系统无法拆除然后分配足够快的内存以防止 OOM 杀手因缺少可用 RAM 而被触发当内存需求激增时在系统中。这种尖峰可能是由执行像
这样简单的事情引起的
find /dev /etc /home /opt /tmp /usr -xdev > /dev/null
在我的系统上,该命令需要大约 50MB 的新缓存。原来如此
free -mt
无论如何都会显示。
此问题的解决方案涉及扩展您已经发现的内容。
/bin/echo 3 > /proc/sys/vm/drop_caches
export CONFIG_COMPACTION=1
echo 1 > /proc/sys/vm/compact_memory
是的,我完全同意删除缓存会迫使您的系统re-read 从磁盘中获取一些数据。但是以每天一次甚至每小时一次的速度,与系统正在做的其他事情相比,丢弃缓存的负面影响绝对可以忽略不计,无论那是什么。负面影响是如此之小,我什至无法衡量它,我作为测试工程师工作了 5 年多,弄清楚如何衡量这样的事情。
如果您设置一个 cron 作业来每天执行一次,那应该可以消除您的 OOM 杀手问题。如果在那之后您仍然看到 OOM 杀手的问题,请考虑更频繁地执行它们。它会有所不同,具体取决于您写入的文件量与您设备的系统 RAM 量相比。
我有一个基于 ARM 的无头 Linux (v3.10.53-1.1.1) 系统,没有启用交换 space,我偶尔会看到进程被 OOM 杀死-杀手,即使有足够的 RAM 可用。
运行 echo 1 > /proc/sys/vm/compact_memory
似乎周期性地阻止了 OOM 杀手,这让我认为内存碎片是罪魁祸首,但我不明白为什么用户进程需要无论如何物理上连续的块;据我了解,即使在最坏的情况下(完全碎片化,只有单个 4K 块可用),内核也可以简单地分配必要数量的单个 4K 块,然后使用虚拟内存魔法 (tm) 来制作它们看起来与用户进程相邻。
有人可以解释为什么会调用 OOM-killer 来响应内存碎片吗?它只是一个有问题的内核还是有真正的原因? (即使内核确实需要整理内存碎片以满足请求,它不应该自动执行而不是放弃和 OOM'ing 吗?)
我在下面粘贴了一个 OOM 杀手调用示例,以防它阐明一些事情。我可以随意重现故障;此调用发生时计算机仍有约 120MB 可用 RAM(根据 free
),以响应我的测试程序分配内存,一次分配 10000 个 400 字节。
May 28 18:51:34 g2 user.warn kernel: [ 4228.307769] cored invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
May 28 18:51:35 g2 user.warn kernel: [ 4228.315295] CPU: 2 PID: 19687 Comm: cored Tainted: G O 3.10.53-1.1.1_ga+gf57416a #1
May 28 18:51:35 g2 user.warn kernel: [ 4228.323843] Backtrace:
May 28 18:51:35 g2 user.warn kernel: [ 4228.326340] [<c0011c54>] (dump_backtrace+0x0/0x10c) from [<c0011e68>] (show_stack+0x18/0x1c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.334804] r6:00000000 r5:00000000 r4:c9784000 r3:00000000
May 28 18:51:35 g2 user.warn kernel: [ 4228.340566] [<c0011e50>] (show_stack+0x0/0x1c) from [<c04d0dd8>] (dump_stack+0x24/0x28)
May 28 18:51:35 g2 user.warn kernel: [ 4228.348684] [<c04d0db4>] (dump_stack+0x0/0x28) from [<c009b474>] (dump_header.isra.10+0x84/0x19c)
May 28 18:51:35 g2 user.warn kernel: [ 4228.357616] [<c009b3f0>] (dump_header.isra.10+0x0/0x19c) from [<c009ba3c>] (oom_kill_process+0x288/0x3f4)
May 28 18:51:35 g2 user.warn kernel: [ 4228.367230] [<c009b7b4>] (oom_kill_process+0x0/0x3f4) from [<c009bf8c>] (out_of_memory+0x208/0x2cc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.376323] [<c009bd84>] (out_of_memory+0x0/0x2cc) from [<c00a0278>] (__alloc_pages_nodemask+0x8f8/0x910)
May 28 18:51:35 g2 user.warn kernel: [ 4228.385921] [<c009f980>] (__alloc_pages_nodemask+0x0/0x910) from [<c00b6c34>] (__pte_alloc+0x2c/0x158)
May 28 18:51:35 g2 user.warn kernel: [ 4228.395263] [<c00b6c08>] (__pte_alloc+0x0/0x158) from [<c00b9fe0>] (handle_mm_fault+0xd4/0xfc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.403914] r6:c981a5d8 r5:cc421a40 r4:10400000 r3:10400000
May 28 18:51:35 g2 user.warn kernel: [ 4228.409689] [<c00b9f0c>] (handle_mm_fault+0x0/0xfc) from [<c0019a00>] (do_page_fault+0x174/0x3dc)
May 28 18:51:35 g2 user.warn kernel: [ 4228.418575] [<c001988c>] (do_page_fault+0x0/0x3dc) from [<c0019dc0>] (do_translation_fault+0xb4/0xb8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.427824] [<c0019d0c>] (do_translation_fault+0x0/0xb8) from [<c00083ac>] (do_DataAbort+0x40/0xa0)
May 28 18:51:35 g2 user.warn kernel: [ 4228.436896] r6:c0019d0c r5:00000805 r4:c06a33d0 r3:103ffea8
May 28 18:51:35 g2 user.warn kernel: [ 4228.442643] [<c000836c>] (do_DataAbort+0x0/0xa0) from [<c000e138>] (__dabt_usr+0x38/0x40)
May 28 18:51:35 g2 user.warn kernel: [ 4228.450850] Exception stack(0xc9785fb0 to 0xc9785ff8)
May 28 18:51:35 g2 user.warn kernel: [ 4228.455918] 5fa0: 103ffea8 00000000 b6d56708 00000199
May 28 18:51:35 g2 user.warn kernel: [ 4228.464116] 5fc0: 00000001 b6d557c0 0001ffc8 b6d557f0 103ffea0 b6d55228 10400038 00000064
May 28 18:51:35 g2 user.warn kernel: [ 4228.472327] 5fe0: 0001ffc9 beb04990 00000199 b6c95d84 600f0010 ffffffff
May 28 18:51:35 g2 user.warn kernel: [ 4228.478952] r8:103ffea0 r7:b6d557f0 r6:ffffffff r5:600f0010 r4:b6c95d84
May 28 18:51:35 g2 user.warn kernel: [ 4228.485759] Mem-info:
May 28 18:51:35 g2 user.warn kernel: [ 4228.488038] DMA per-cpu:
May 28 18:51:35 g2 user.warn kernel: [ 4228.490589] CPU 0: hi: 90, btch: 15 usd: 5
May 28 18:51:35 g2 user.warn kernel: [ 4228.495389] CPU 1: hi: 90, btch: 15 usd: 13
May 28 18:51:35 g2 user.warn kernel: [ 4228.500205] CPU 2: hi: 90, btch: 15 usd: 17
May 28 18:51:35 g2 user.warn kernel: [ 4228.505003] CPU 3: hi: 90, btch: 15 usd: 65
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] active_anon:92679 inactive_anon:47 isolated_anon:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] active_file:162 inactive_file:1436 isolated_file:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] unevictable:0 dirty:0 writeback:0 unstable:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] free:28999 slab_reclaimable:841 slab_unreclaimable:2103
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] mapped:343 shmem:89 pagetables:573 bounce:0
May 28 18:51:35 g2 user.warn kernel: [ 4228.509823] free_cma:29019
May 28 18:51:35 g2 user.warn kernel: [ 4228.541416] DMA free:115636kB min:1996kB low:2492kB high:2992kB active_anon:370716kB inactive_anon:188kB active_file:752kB inactive_file:6040kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:524288kB managed:2
May 28 18:51:35 g2 user.warn kernel: [ 4228.583833] lowmem_reserve[]: 0 0 0 0
May 28 18:51:35 g2 user.warn kernel: [ 4228.587577] DMA: 2335*4kB (UMC) 1266*8kB (UMC) 1034*16kB (UMC) 835*32kB (UC) 444*64kB (C) 28*128kB (C) 103*256kB (C) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 121100kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.604979] 502 total pagecache pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.608649] 0 pages in swap cache
May 28 18:51:35 g2 user.warn kernel: [ 4228.611979] Swap cache stats: add 0, delete 0, find 0/0
May 28 18:51:35 g2 user.warn kernel: [ 4228.617210] Free swap = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.620110] Total swap = 0kB
May 28 18:51:35 g2 user.warn kernel: [ 4228.635245] 131072 pages of RAM
May 28 18:51:35 g2 user.warn kernel: [ 4228.638394] 30575 free pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.641293] 3081 reserved pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.644437] 1708 slab pages
May 28 18:51:35 g2 user.warn kernel: [ 4228.647239] 265328 pages shared
May 28 18:51:35 g2 user.warn kernel: [ 4228.650399] 0 pages swap cached
May 28 18:51:35 g2 user.info kernel: [ 4228.653546] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
May 28 18:51:35 g2 user.info kernel: [ 4228.661408] [ 115] 0 115 761 128 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.669347] [ 237] 0 237 731 98 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.677278] [ 238] 0 238 731 100 5 0 -1000 udevd
May 28 18:51:35 g2 user.info kernel: [ 4228.685224] [ 581] 0 581 1134 78 5 0 -1000 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.693074] [ 592] 0 592 662 15 4 0 0 syslogd
May 28 18:51:35 g2 user.info kernel: [ 4228.701184] [ 595] 0 595 662 19 4 0 0 klogd
May 28 18:51:35 g2 user.info kernel: [ 4228.709113] [ 633] 0 633 6413 212 12 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.716877] [ 641] 0 641 663 16 3 0 0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.724827] [ 642] 0 642 663 16 5 0 0 getty
May 28 18:51:35 g2 user.info kernel: [ 4228.732770] [ 646] 0 646 6413 215 12 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.740540] [ 650] 0 650 10791 572 10 0 0 avbd
May 28 18:51:35 g2 user.info kernel: [ 4228.748385] [ 651] 0 651 9432 2365 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.756322] [ 652] 0 652 52971 4547 42 0 0 g2d
May 28 18:51:35 g2 user.info kernel: [ 4228.764104] [ 712] 0 712 14135 2458 24 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.772053] [ 746] 0 746 1380 248 6 0 0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.780251] [ 779] 0 779 9419 2383 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.788187] [ 780] 0 780 9350 2348 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.796127] [ 781] 0 781 9349 2347 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.804074] [ 782] 0 782 9353 2354 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.812012] [ 783] 0 783 18807 2573 27 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.819955] [ 784] 0 784 17103 3233 28 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.827882] [ 785] 0 785 13990 2436 24 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.835819] [ 786] 0 786 9349 2350 21 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.843764] [ 807] 0 807 13255 4125 25 0 0 cored
May 28 18:51:35 g2 user.info kernel: [ 4228.851702] [ 1492] 999 1492 512 27 5 0 0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.860334] [ 1493] 0 1493 433 14 5 0 0 avahi-autoipd
May 28 18:51:35 g2 user.info kernel: [ 4228.868955] [ 1494] 0 1494 1380 246 7 0 0 dhclient
May 28 18:51:35 g2 user.info kernel: [ 4228.877163] [19170] 0 19170 1175 131 6 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.885022] [19183] 0 19183 750 70 4 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.892701] [19228] 0 19228 663 16 5 0 0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.900636] [19301] 0 19301 1175 131 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.908475] [19315] 0 19315 751 69 5 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.916154] [19365] 0 19365 663 16 5 0 0 watch
May 28 18:51:35 g2 user.info kernel: [ 4228.924099] [19443] 0 19443 1175 153 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.931948] [19449] 0 19449 750 70 5 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.939626] [19487] 0 19487 1175 132 5 0 0 sshd
May 28 18:51:35 g2 user.info kernel: [ 4228.947467] [19500] 0 19500 750 70 3 0 0 sh
May 28 18:51:35 g2 user.info kernel: [ 4228.955148] [19540] 0 19540 662 17 5 0 0 tail
May 28 18:51:35 g2 user.info kernel: [ 4228.963002] [19687] 0 19687 63719 56396 127 0 0 cored
May 28 18:51:35 g2 user.err kernel: [ 4228.970936] Out of memory: Kill process 19687 (cored) score 428 or sacrifice child
May 28 18:51:35 g2 user.err kernel: [ 4228.978513] Killed process 19687 (cored) total-vm:254876kB, anon-rss:225560kB, file-rss:24kB
这也是我用来对系统施加压力并调用 OOM 杀手的测试程序(echo 1 > /proc/sys/vm/compact_memory
命令经常 运行,OOM 杀手出现在 free
报告系统 RAM 接近于零,正如预期的那样;没有它,OOM-killer 出现在这之前,当 free
报告 130+MB 可用 RAM 但在 cat /proc/buddyinfo
之后显示 RAM 变得支离破碎):
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv)
{
while(1)
{
printf("PRESS RETURN TO ALLOCATE BUFFERS\n");
const int numBytes = 400;
char buf[64]; fgets(buf, sizeof(buf), stdin);
for (int i=0; i<10000; i++)
{
void * ptr = malloc(numBytes); // yes, a deliberate memory leak
if (ptr)
{
memset(ptr, 'J', numBytes); // force the virtual memory system to actually allocate the RAM, and not only the address space
}
else printf("malloc() failed!\n");
}
fprintf(stderr, "Deliberately leaked 10000*%i bytes!\n", numBytes);
}
return 0;
}
你走在正确的轨道上,杰里米。同样的事情发生在我的 CentOS 桌面系统上。我是一名计算机顾问,自 1995 年以来我一直在 Linux 工作。我用许多文件下载和各种其他活动无情地冲击我的 Linux 系统,使它们达到极限。在我的主桌面运行了大约 4 天后,它变得非常慢(比如比正常速度慢 1/10),OOM killed 开始了,我坐在那里想知道为什么我的系统会这样。它有足够的 RAM,但是 OOM 杀手在它没有业务的时候踢了。所以我重新启动它,它运行良好......大约 4 天,然后问题又来了。不知道为什么把我的鼻涕弄得一团糟。
所以我戴上了测试工程师的帽子,运行 在机器上进行了各种压力测试,看看我是否可以故意重现这些症状。几个月后,我能够随意重现问题,并证明我的解决方案每次都能奏效。
在此上下文中,“缓存周转率”是指系统必须拆除现有缓存以创建更多缓存 space 以支持新文件写入。由于系统急于重新部署 RAM,因此不需要花时间对正在释放的内存进行碎片整理。所以随着时间的推移,随着越来越多的文件写入发生,缓存会反复翻转。而它所在的记忆不断地变得越来越碎片化。在我的测试中,我发现在磁盘缓存翻转了大约 15 次后,内存变得非常碎片化,以至于系统无法拆除然后分配足够快的内存以防止 OOM 杀手因缺少可用 RAM 而被触发当内存需求激增时在系统中。这种尖峰可能是由执行像
这样简单的事情引起的find /dev /etc /home /opt /tmp /usr -xdev > /dev/null
在我的系统上,该命令需要大约 50MB 的新缓存。原来如此
free -mt
无论如何都会显示。
此问题的解决方案涉及扩展您已经发现的内容。
/bin/echo 3 > /proc/sys/vm/drop_caches
export CONFIG_COMPACTION=1
echo 1 > /proc/sys/vm/compact_memory
是的,我完全同意删除缓存会迫使您的系统re-read 从磁盘中获取一些数据。但是以每天一次甚至每小时一次的速度,与系统正在做的其他事情相比,丢弃缓存的负面影响绝对可以忽略不计,无论那是什么。负面影响是如此之小,我什至无法衡量它,我作为测试工程师工作了 5 年多,弄清楚如何衡量这样的事情。
如果您设置一个 cron 作业来每天执行一次,那应该可以消除您的 OOM 杀手问题。如果在那之后您仍然看到 OOM 杀手的问题,请考虑更频繁地执行它们。它会有所不同,具体取决于您写入的文件量与您设备的系统 RAM 量相比。