Out of memory: Killed process (gunicorn) on AWS Lightsail

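I'm hoping someone can point me toward how to figure out what keeps causing these out-of-memory kills. I'm new to this area, so any help would be greatly appreciated.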

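I have a Django application served with Gunicorn and Nginx, backed by PostgreSQL. I also use Supervisor to monitor the application; if I restart the server, Supervisor restarts the application automatically, no problem there. Before this, the application was built with Flask, and I never ran into this problem. Both versions ran on the same AWS plan: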

AWS Lightsail: 512 MB memory, 1 core processor, 20 GB SSD disk, 1 TB transfer*

Here are a few lines from the gunicorn error log:

[2022-01-20 02:06:15 +0000] [723] [INFO] Booting worker with pid: 723
[2022-01-20 02:06:15 +0000] [724] [INFO] Booting worker with pid: 724
[2022-01-20 02:06:15 +0000] [725] [INFO] Booting worker with pid: 725
[2022-01-20 07:43:42 +0000] [708] [CRITICAL] WORKER TIMEOUT (pid:723)
[2022-01-20 07:49:11 +0000] [708] [CRITICAL] WORKER TIMEOUT (pid:724)
[2022-01-20 07:49:11 +0000] [708] [CRITICAL] WORKER TIMEOUT (pid:725)
[2022-01-20 02:49:11 -0500] [724] [INFO] Worker exiting (pid: 724)
[2022-01-20 02:49:11 -0500] [725] [INFO] Worker exiting (pid: 725)
[2022-01-20 07:49:11 +0000] [708] [WARNING] Worker with pid 723 was terminated due to signal 9
[2022-01-20 07:49:11 +0000] [708] [WARNING] Worker with pid 724 was terminated due to signal 9
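To check whether the gunicorn kills line up with the kernel's OOM killer, a small script can pull the timed-out and signal-killed worker pids out of the error log. This is a minimal sketch, assuming the log line format shown above; the function name `summarize_gunicorn_log` is hypothetical. Signal 9 is SIGKILL, which a process cannot catch, so a worker killed this way gets no chance to log anything itself.

```python
import re

# Matches lines such as:
# [2022-01-20 07:49:11 +0000] [708] [CRITICAL] WORKER TIMEOUT (pid:724)
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(\d+)\)")

# Matches lines such as:
# [2022-01-20 07:49:11 +0000] [708] [WARNING] Worker with pid 723 was terminated due to signal 9
SIGNAL_RE = re.compile(r"Worker with pid (\d+) was terminated due to signal (\d+)")

SIGKILL = 9  # signal number of SIGKILL on Linux


def summarize_gunicorn_log(lines):
    """Return (set of timed-out worker pids, {pid: signal number}) from error-log lines."""
    timed_out = set()
    killed = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timed_out.add(int(m.group(1)))
            continue
        m = SIGNAL_RE.search(line)
        if m:
            killed[int(m.group(1))] = int(m.group(2))
    return timed_out, killed


def sigkilled_pids(killed):
    """Pids that died from SIGKILL; compare these against 'Killed process' lines in kern.log."""
    return sorted(pid for pid, sig in killed.items() if sig == SIGKILL)
```

Applied to the excerpt above, pid 723 is reported as terminated by signal 9, and the kern.log excerpt reports `Killed process 723 (gunicorn)` at the same time, 07:49:11.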

Here are the lines from /var/log/kern.log:

Jan 20 02:10:00 <ip> kernel: [  239.894714] raid6: using avx2x2 recovery algorithm
Jan 20 02:10:00 <ip> kernel: [  239.907815] xor: automatically using best checksumming function   avx       
Jan 20 02:10:00 <ip> kernel: [  239.967979] Btrfs loaded, crc32c=crc32c-intel, zoned=yes
Jan 20 07:49:11 <ip> kernel: [20591.283487] apport invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jan 20 07:49:11 <ip> kernel: [20591.283495] CPU: 0 PID: 12911 Comm: apport Not tainted 5.11.0-1025-aws #27~20.04.1-Ubuntu
Jan 20 07:49:11 <ip> kernel: [20591.283498] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Jan 20 07:49:11 <ip> kernel: [20591.283500] Call Trace:
Jan 20 07:49:11 <ip> kernel: [20591.283503]  dump_stack+0x74/0x92
Jan 20 07:49:11 <ip> kernel: [20591.283511]  dump_header+0x4f/0x1f6
Jan 20 07:49:11 <ip> kernel: [20591.283514]  oom_kill_process.cold+0xb/0x10
Jan 20 07:49:11 <ip> kernel: [20591.283517]  out_of_memory.part.0+0x1ee/0x460
Jan 20 07:49:11 <ip> kernel: [20591.283521]  out_of_memory+0x6d/0xd0
Jan 20 07:49:11 <ip> kernel: [20591.283523]  __alloc_pages_slowpath.constprop.0+0xc4d/0xd20
Jan 20 07:49:11 <ip> kernel: [20591.283529]  __alloc_pages_nodemask+0x2a0/0x300
Jan 20 07:49:11 <ip> kernel: [20591.283531]  alloc_pages_current+0x87/0xe0
Jan 20 07:49:11 <ip> kernel: [20591.283536]  __page_cache_alloc+0x89/0xb0
Jan 20 07:49:11 <ip> kernel: [20591.283539]  pagecache_get_page+0xf1/0x350
Jan 20 07:49:11 <ip> kernel: [20591.283542]  filemap_fault+0x9f3/0xfc0
Jan 20 07:49:11 <ip> kernel: [20591.283543]  ? __mod_lruvec_state+0x3a/0x50
Jan 20 07:49:11 <ip> kernel: [20591.283548]  ? __unlock_page_memcg+0x25/0x60
Jan 20 07:49:11 <ip> kernel: [20591.283550]  ? unlock_page_memcg+0x24/0x30
Jan 20 07:49:11 <ip> kernel: [20591.283551]  ? page_add_file_rmap+0x122/0x160
Jan 20 07:49:11 <ip> kernel: [20591.283555]  ? filemap_map_pages+0x218/0x3f0
Jan 20 07:49:11 <ip> kernel: [20591.283558]  ext4_filemap_fault+0x32/0x50
Jan 20 07:49:11 <ip> kernel: [20591.283562]  __do_fault+0x3c/0xe0
Jan 20 07:49:11 <ip> kernel: [20591.283567]  do_fault+0x276/0x4f0
Jan 20 07:49:11 <ip> kernel: [20591.283570]  __handle_mm_fault+0x677/0x920
Jan 20 07:49:11 <ip> kernel: [20591.283573]  handle_mm_fault+0xd7/0x2b0
Jan 20 07:49:11 <ip> kernel: [20591.283577]  do_user_addr_fault+0x1a0/0x450
Jan 20 07:49:11 <ip> kernel: [20591.283581]  exc_page_fault+0x69/0x150
Jan 20 07:49:11 <ip> kernel: [20591.283586]  ? asm_exc_page_fault+0x8/0x30
Jan 20 07:49:11 <ip> kernel: [20591.283590]  asm_exc_page_fault+0x1e/0x30
Jan 20 07:49:11 <ip> kernel: [20591.283593] RIP: 0033:0x499b19
Jan 20 07:49:11 <ip> kernel: [20591.283600] Code: Unable to access opcode bytes at RIP 0x499aef.
Jan 20 07:49:11 <ip> kernel: [20591.283600] RSP: 002b:00007ffe760932b0 EFLAGS: 00010246
Jan 20 07:49:11 <ip> kernel: [20591.283603] RAX: 00007ffe760932e0 RBX: 00007ffe760a8400 RCX: 0000000000000000
Jan 20 07:49:11 <ip> kernel: [20591.283604] RDX: 000000000000002c RSI: 000000000083c8f0 RDI: 00007ffe76093300
Jan 20 07:49:11 <ip> kernel: [20591.283606] RBP: 00007ffe760a84f0 R08: 000000000000002c R09: 0000000000000074
Jan 20 07:49:11 <ip> kernel: [20591.283607] R10: 0000000000000010 R11: 0000000000000000 R12: 00007ffe760a4370
Jan 20 07:49:11 <ip> kernel: [20591.283608] R13: 00007ffe76094330 R14: 00007ffe76098340 R15: 0000000000000000
Jan 20 07:49:11 <ip> kernel: [20591.283611] Mem-Info:
Jan 20 07:49:11 <ip> kernel: [20591.283612] active_anon:1457 inactive_anon:85298 isolated_anon:0
Jan 20 07:49:11 <ip> kernel: [20591.283612]  active_file:9 inactive_file:268 isolated_file:0
Jan 20 07:49:11 <ip> kernel: [20591.283612]  unevictable:5774 dirty:0 writeback:0
Jan 20 07:49:11 <ip> kernel: [20591.283612]  slab_reclaimable:5923 slab_unreclaimable:9346
Jan 20 07:49:11 <ip> kernel: [20591.283612]  mapped:5078 shmem:3503 pagetables:1325 bounce:0
Jan 20 07:49:11 <ip> kernel: [20591.283612]  free:1110 free_pcp:88 free_cma:0
Jan 20 07:49:11 <ip> kernel: [20591.283616] Node 0 active_anon:5828kB inactive_anon:341192kB active_file:36kB inactive_file:1072kB unevictable:23096kB isolated(anon):0kB isolated(file):0kB mapped:20312kB dirty:0kB writeback:0kB shmem:14012kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2928kB pagetables:5300kB all_unreclaimable? yes
Jan 20 07:49:11 <ip> kernel: [20591.283621] Node 0 DMA free:1840kB min:92kB low:112kB high:132kB reserved_highatomic:0KB active_anon:24kB inactive_anon:12728kB active_file:4kB inactive_file:284kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 20 07:49:11 <ip> kernel: [20591.283625] lowmem_reserve[]: 0 437 437 437 437
Jan 20 07:49:11 <ip> kernel: [20591.283629] Node 0 DMA32 free:2600kB min:2624kB low:3280kB high:3936kB reserved_highatomic:0KB active_anon:5804kB inactive_anon:328464kB active_file:36kB inactive_file:788kB unevictable:23096kB writepending:0kB present:507904kB managed:460968kB mlocked:18560kB bounce:0kB free_pcp:352kB local_pcp:352kB free_cma:0kB
Jan 20 07:49:11 <ip> kernel: [20591.283634] lowmem_reserve[]: 0 0 0 0 0
Jan 20 07:49:11 <ip> kernel: [20591.283637] Node 0 DMA: 20*4kB (M) 16*8kB (UM) 10*16kB (M) 4*32kB (UM) 1*64kB (U) 2*128kB (M) 4*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1840kB
Jan 20 07:49:11 <ip> kernel: [20591.283650] Node 0 DMA32: 16*4kB (UME) 5*8kB (UE) 94*16kB (UE) 31*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2600kB
Jan 20 07:49:11 <ip> kernel: [20591.283662] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 20 07:49:11 <ip> kernel: [20591.283663] 6963 total pagecache pages
Jan 20 07:49:11 <ip> kernel: [20591.283664] 0 pages in swap cache
Jan 20 07:49:11 <ip> kernel: [20591.283665] Swap cache stats: add 0, delete 0, find 0/0
Jan 20 07:49:11 <ip> kernel: [20591.283666] Free swap  = 0kB
Jan 20 07:49:11 <ip> kernel: [20591.283667] Total swap = 0kB
Jan 20 07:49:11 <ip> kernel: [20591.283668] 130973 pages RAM
Jan 20 07:49:11 <ip> kernel: [20591.283669] 0 pages HighMem/MovableOnly
Jan 20 07:49:11 <ip> kernel: [20591.283669] 11755 pages reserved
Jan 20 07:49:11 <ip> kernel: [20591.283670] 0 pages hwpoisoned
Jan 20 07:49:11 <ip> kernel: [20591.283671] Tasks state (memory values in pages):
Jan 20 07:49:11 <ip> kernel: [20591.283672] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jan 20 07:49:11 <ip> kernel: [20591.283674] [    164]     0   164    15383     1019   122880        0          -250 systemd-journal
Jan 20 07:49:11 <ip> kernel: [20591.283677] [    182]     0   182      624      145    45056        0             0 bpfilter_umh
Jan 20 07:49:11 <ip> kernel: [20591.283680] [    213]     0   213     4807      831    65536        0         -1000 systemd-udevd
Jan 20 07:49:11 <ip> kernel: [20591.283682] [    325]     0   325    70052     4499    98304        0         -1000 multipathd
Jan 20 07:49:11 <ip> kernel: [20591.283684] [    385]   102   385    22560      811    77824        0             0 systemd-timesyn
Jan 20 07:49:11 <ip> kernel: [20591.283687] [    456]   100   456     6686      942    77824        0             0 systemd-network
Jan 20 07:49:11 <ip> kernel: [20591.283688] [    458]   101   458     5977     1781    86016        0             0 systemd-resolve
Jan 20 07:49:11 <ip> kernel: [20591.283690] [    492]     0   492    59351      728    94208        0             0 accounts-daemon
Jan 20 07:49:11 <ip> kernel: [20591.283692] [    493]     0   493      637      169    45056        0             0 acpid
Jan 20 07:49:11 <ip> kernel: [20591.283694] [    497]     0   497     2136      568    53248        0             0 cron
Jan 20 07:49:11 <ip> kernel: [20591.283696] [    500]   103   500     1880      862    53248        0          -900 dbus-daemon
Jan 20 07:49:11 <ip> kernel: [20591.283698] [    509]     0   509     7322     2728    94208        0             0 networkd-dispat
Jan 20 07:49:11 <ip> kernel: [20591.283700] [    512]   104   512    56127      423    81920        0             0 rsyslogd
Jan 20 07:49:11 <ip> kernel: [20591.283702] [    513]     0   513   308790     1818   188416        0             0 amazon-ssm-agen
Jan 20 07:49:11 <ip> kernel: [20591.283704] [    516]     0   516   157291     3266   204800        0          -900 snapd
Jan 20 07:49:11 <ip> kernel: [20591.283706] [    517]     0   517     7868     4312    98304        0             0 supervisord
Jan 20 07:49:11 <ip> kernel: [20591.283708] [    519]     0   519     4210      867    69632        0             0 systemd-logind
Jan 20 07:49:11 <ip> kernel: [20591.283709] [    521]     0   521    98109      924   126976        0             0 udisksd
Jan 20 07:49:11 <ip> kernel: [20591.283711] [    522]     0   522      950      512    49152        0             0 atd
Jan 20 07:49:11 <ip> kernel: [20591.283713] [    536]     0   536     1840      441    49152        0             0 agetty
Jan 20 07:49:11 <ip> kernel: [20591.283715] [    546]     0   546     1459      382    49152        0             0 agetty
Jan 20 07:49:11 <ip> kernel: [20591.283717] [    567]     0   567    58181      184    81920        0             0 polkitd
Jan 20 07:49:11 <ip> kernel: [20591.283719] [    612]     0   612    27029     2702   106496        0             0 unattended-upgr
Jan 20 07:49:11 <ip> kernel: [20591.283721] [    637]   113   637    54361     3996   159744        0          -900 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283723] [    685]   113   685    54391     1299   151552        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283725] [    686]   113   686    54361     1176   147456        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283727] [    687]   113   687    54361     1860   139264        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283728] [    688]   113   688    54496     1157   151552        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283730] [    689]   113   689    17986      805   118784        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283732] [    690]   113   690    54468     1039   147456        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283734] [    698]     0   698     3046      809    61440        0         -1000 sshd
Jan 20 07:49:11 <ip> kernel: [20591.283736] [    708]  1001   708     7436     3950    86016        0             0 gunicorn
Jan 20 07:49:11 <ip> kernel: [20591.283738] [    723]  1001   723    46321    17951   356352        0             0 gunicorn
Jan 20 07:49:11 <ip> kernel: [20591.283739] [    724]  1001   724    46082    17628   356352        0             0 gunicorn
Jan 20 07:49:11 <ip> kernel: [20591.283741] [    725]  1001   725    46156    17734   356352        0             0 gunicorn
Jan 20 07:49:11 <ip> kernel: [20591.283743] [    863]     0   863   311945     3483   212992        0             0 ssm-agent-worke
Jan 20 07:49:11 <ip> kernel: [20591.283745] [  11283]     0 11283    16381      389    94208        0             0 nginx
Jan 20 07:49:11 <ip> kernel: [20591.283747] [  11284]    33 11284    16548      945   106496        0             0 nginx
Jan 20 07:49:11 <ip> kernel: [20591.283749] [  12619]     0 12619      654      121    45056        0             0 apt.systemd.dai
Jan 20 07:49:11 <ip> kernel: [20591.283751] [  12623]     0 12623      654      439    45056        0             0 apt.systemd.dai
Jan 20 07:49:11 <ip> kernel: [20591.283752] [  12713]     0 12713    61155     7388   241664        0             0 unattended-upgr
Jan 20 07:49:11 <ip> kernel: [20591.283755] [  12899]   113 12899    54628     1826   155648        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283756] [  12901]   113 12901    54560     1399   155648        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283759] [  12902]     0 12902     2968      353    53248        0             0 sshd
Jan 20 07:49:11 <ip> kernel: [20591.283760] [  12903]     0 12903     2968      350    57344        0             0 sshd
Jan 20 07:49:11 <ip> kernel: [20591.283763] [  12904]   113 12904    54495     1378   151552        0             0 postgres
Jan 20 07:49:11 <ip> kernel: [20591.283764] [  12906]     0 12906     2968      315    61440        0             0 sshd
Jan 20 07:49:11 <ip> kernel: [20591.283766] [  12907]     0 12907     2556      644    53248        0             0 cron
Jan 20 07:49:11 <ip> kernel: [20591.283768] [  12909]     0 12909      654       67    40960        0             0 sh
Jan 20 07:49:11 <ip> kernel: [20591.283770] [  12910]     0 12910      654      398    40960        0             0 debian-sa1
Jan 20 07:49:11 <ip> kernel: [20591.283772] [  12911]     0 12911     3254      547    61440        0             0 apport
Jan 20 07:49:11 <ip> kernel: [20591.283774] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/supervisor.service,task=gunicorn,pid=723,uid=1001
Jan 20 07:49:11 <ip> kernel: [20591.283785] Out of memory: Killed process 723 (gunicorn) total-vm:185284kB, anon-rss:69128kB, file-rss:2676kB, shmem-rss:0kB, UID:1001 pgtables:348kB oom_score_adj:0
Jan 20 07:49:11 <ip> kernel: [20591.302446] oom_reaper: reaped process 723 (gunicorn), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Jan 20 07:49:15 <ip> kernel: [20595.886463] loop10: detected capacity change from 0 to 8
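The task table in the OOM dump lists each process's `rss` in pages, which makes it the quickest place to see where the memory went. Below is a minimal sketch for totalling it per process name, assuming 4 kB pages (the x86_64 default) and the column layout in the table's header line; `rss_by_name`, `pages_to_kb` and `top_consumers` are hypothetical helper names. As a sanity check on the page size, the killed worker's rss of 17951 pages x 4 kB = 71804 kB, which equals anon-rss 69128 kB + file-rss 2676 kB in the `Killed process 723` line.

```python
import re
from collections import defaultdict

PAGE_KB = 4  # assumption: 4 kB pages, the x86_64 default

# One task-table row, after the kernel timestamp:
# [  pid  ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def pages_to_kb(pages):
    """Convert a page count to kB under the PAGE_KB assumption."""
    return pages * PAGE_KB


def rss_by_name(lines):
    """Sum the rss column (in pages) of an OOM task dump, grouped by process name."""
    totals = defaultdict(int)
    for line in lines:
        m = ROW_RE.search(line)  # the header row and Mem-Info lines do not match
        if m:
            totals[m.group(9)] += int(m.group(5))
    return dict(totals)


def top_consumers(lines, n=5):
    """Return the n process names with the largest total rss, as (name, kB) pairs."""
    ranked = sorted(rss_by_name(lines).items(), key=lambda kv: kv[1], reverse=True)
    return [(name, pages_to_kb(pages)) for name, pages in ranked[:n]]
```

Summed over the full table above, the four gunicorn processes (the master, pid 708, plus three workers) hold 57263 pages, about 224 MiB, out of 130973 - 11755 = 119218 usable pages (about 466 MiB), with `Total swap = 0kB`.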

These steps may help you solve the problem:

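  1. Reduce the number of gunicorn workers. The general recommendation for a starting worker count is (2 x $num_cores) + 1; on this single-core instance that gives 3, which is exactly the number of workers already running, so with 512 MB you may need fewer.
  2. Increase the RAM from 512 MB to at least 1 GB, preferably 2 GB.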
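The worker-count advice above can be expressed in a gunicorn config file, which is plain Python and is loaded with `gunicorn -c gunicorn.conf.py <your_project>.wsgi` (the Supervisor `command=` line would need the same `-c` flag). This is a sketch, assuming the roughly 70 MB per worker seen in the OOM dump; `WORKER_RSS_MB` and `GUNICORN_BUDGET_MB` are hypothetical names with illustrative values, not measured limits. Only `workers` is an actual gunicorn setting here.

```python
# gunicorn.conf.py: a sketch for a small (512 MB, 1 core) instance.
import multiprocessing

# Rule-of-thumb starting point: (2 x cores) + 1.
cpu_based = multiprocessing.cpu_count() * 2 + 1

# Hypothetical memory budget. The OOM dump showed about 70 MB rss per worker;
# the budget leaves the rest of the RAM for PostgreSQL, nginx and the OS.
WORKER_RSS_MB = 70        # assumption, taken from the kern.log excerpt
GUNICORN_BUDGET_MB = 150  # assumption, tune for your instance
memory_based = max(1, GUNICORN_BUDGET_MB // WORKER_RSS_MB)

# Use whichever limit is lower, but always run at least one worker.
workers = max(1, min(cpu_based, memory_based))
```

With these illustrative numbers the memory cap wins and `workers` comes out as 2; after resizing the instance to 1 or 2 GB, raising `GUNICORN_BUDGET_MB` lets the CPU rule take over again.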