How to solve UPC Runtime error: out of shared memory
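I am trying to run Berkeley UPC code on a machine with 64 cores and 256 GB of RAM. However, the code fails to run because it cannot find enough memory. The following should work, since 51 x 5 GB = 255 GB < 256 GB: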
upcrun -n 51 -shared-heap=5GB xcorupc_sac inputpgas_sac{$rc1}.txt
..
UPCR: UPC thread 3 of 51 on range (pshm node 0 of 1, process 3 of 51, pid=191914)
UPCR: UPC thread 16 of 51 on range (pshm node 0 of 1, process 16 of 51, pid=191927)
UPC Runtime warning: Requested shared memory (5120 MB) > available (2515 MB) on node 0 (range): using 2515 MB per thread instead
UPC Runtime error: out of shared memory
Local shared memory in use: 1594 MB per-thread, 81340 MB total
Global shared memory in use: 0 MB per-thread, 1 MB total
Total shared memory limit: 2515 MB per-thread, 128281 MB total
upc_alloc unable to service request from thread 0 for 1672245248 more bytes
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace.
NOTICE: We recommend linking the debug version of GASNet to assist you in resolving this application issue.
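I don't understand why the Total shared memory limit is 128 GB, which is half of the total physical memory present. I cannot override it even with the -shared-heap flag, where I explicitly ask for 5 GB per thread. Any suggestions?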
cat /proc/meminfo
MemTotal: 263378836 kB
The UPC build was configured with the flag --with-sptr-packed-bits=20,9,35, which allows up to 2^35 bytes = 32 GB of shared memory per thread.
EDIT1: Here is the output of the command upcc --version:
[avinash@range jointinvsurf5_cajoint_compile]$ upcc --version
This is upcc (the Berkeley Unified Parallel C compiler), v. 2019.4.4
(getting remote translator settings...)
----------------------+---------------------------------------------------------
UPC Runtime | v. 2019.4.4, built on Feb 11 2020 at 23:31:40
----------------------+---------------------------------------------------------
UPC-to-C translator | v. 2.28.0, built on Jul 19 2018 at 20:29:47
| host aphid linux-x86_64/64
| gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu4)
----------------------+---------------------------------------------------------
Translator location | http://upc-translator.lbl.gov/upcc-2019.4.0.cgi
----------------------+---------------------------------------------------------
networks supported | smp udp mpi ibv
----------------------+---------------------------------------------------------
default network | ibv
----------------------+---------------------------------------------------------
pthreads support | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
Configured with | '--with-translator=http://upc-translator.lbl.gov/upcc-2
| 019.4.0.cgi' '--with-sptr-packed-bits=20,9,35'
| '--prefix=/usr/local/berkeley_upc/opt'
| '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
Configure features | trans_bupc,pragma_upc_code,driver_upcc,runtime_upcr,
| gasnet,upc_collective,upc_io,upc_memcpy_async,
| upc_memcpy_vis,upc_ptradd,upc_thread_distance,upc_tick,
| upc_sem,upc_dump_shared,upc_trace_printf,
| upc_trace_mask,upc_local_to_shared,upc_all_free,
| upc_atomics,pupc,upc_types,upc_castable,upc_nb,nodebug,
| notrace,nostats,nodebugmalloc,nogasp,nothrille,
| segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
| packedsptr,upc_io_64
----------------------+---------------------------------------------------------
Configure id | range Tue Feb 11 23:18:39 PST 2020 gnome-initial-setup
----------------------+---------------------------------------------------------
Binary interface | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
Runtime interface # | Runtime supports 3.0 -> 3.13: Translator uses 3.6
----------------------+---------------------------------------------------------
| --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
C compiler | /usr/bin/gcc
| GNU/4.8.5/4.8.5 20150623 (Red Hat 4.8.5-39)
| gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) Copyright
| (C) 2015 Free Software Foundation, Inc.
----------------------+---------------------------------------------------------
C compiler flags | -O3 --param max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Wno-unused
| -Wunused-result -Wno-unused-parameter -Wno-address
| -std=gnu99
----------------------+---------------------------------------------------------
linker | /data/seismo82/avinash/Programs/openmpiinstall/bin/mpic
| c
| GNU/4.8.5/4.8.5 20150623 (Red Hat 4.8.5-39)
| gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) Copyright
| (C) 2015 Free Software Foundation, Inc.
----------------------+---------------------------------------------------------
linker flags | -D_GNU_SOURCE=1 -O3 --param
| max-inline-insns-single=35000 --param
| inline-unit-growth=10000 --param
| large-function-growth=200000 -Wno-unused
| -Wunused-result -Wno-unused-parameter -Wno-address
| -std=gnu99 -L/data/seismo82/avinash/Programs/myupc/opt
| -L/data/seismo82/avinash/Programs/myupc/opt/umalloc
| -lupcr-ibv-seq -lumalloc
| -L/data/seismo82/avinash/Programs/myupc/opt/gasnet/ibv-
| conduit -lgasnet-ibv-seq -libverbs -lpthread -lrt
| -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lgcc -lm
----------------------+---------------------------------------------------------
EDIT2: Here is the output of the command df -h /dev/shm:
[avinash@range jointinvsurf5_cajoint_compile]$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 126G 21M 126G 1% /dev/shm
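By default, Berkeley UPC uses kernel shared memory services to cross-map the UPC shared segments between co-located processes. For smp-conduit, this is the only mode of operation.
Assuming this is a Linux system with default configuration, the most likely explanation is that the POSIX shared memory space provided by the kernel is exhausted. You can confirm this by looking at the virtual file system where it resides. Here is an example from a system configured with at most 20G of shared memory: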
$ df -h /dev/shm /var/shm /run/shm
df: '/var/shm': No such file or directory
df: '/run/shm': No such file or directory
Filesystem Size Used Avail Use% Mounted on
tmpfs 20G 504K 20G 1% /dev/shm
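This value limits the total space available for shared memory segments on each node. The limit can usually be raised by an administrator adjusting kernel settings, although the details vary by distribution.
For details, see the 'System Settings for POSIX Shared Memory' section of https://gasnet.lbl.gov/dist-ex/README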
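To see how the /dev/shm size might translate into a per-thread ceiling, the arithmetic can be sketched in Python. This is only an illustration: the helper names `shm_capacity_bytes` and `per_thread_shm_limit_mb` are made up here, and the assumption that the tmpfs capacity is split evenly among co-located threads is inferred from the numbers in the question (a 126G tmpfs and a reported limit of 2515 MB per thread across 51 threads), not taken from the Berkeley UPC documentation.

```python
import shutil

MIB = 1024 * 1024


def shm_capacity_bytes(path="/dev/shm"):
    """Return the total capacity of the filesystem that holds `path`."""
    return shutil.disk_usage(path).total


def per_thread_shm_limit_mb(shm_total_bytes, threads):
    """Rough per-thread ceiling in MB, assuming an even split (an assumption)."""
    return shm_total_bytes // MIB // threads
```

On the machine in question one could call `per_thread_shm_limit_mb(shm_capacity_bytes(), 51)`. With a nominal 126 GiB tmpfs this gives 2529 MB per thread, close to the 2515 MB the runtime reported; the small gap is presumably due to rounding in `df -h` and runtime overhead.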
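Finally, note that even once the issue above is resolved, requesting a 255 GB shared heap on a system with 256 GB of physical DRAM (99.6%) is probably inadvisable. This leaves very little space for the non-shared portions of application memory (stack, static data, malloc heap) as well as the memory overhead of the kernel and daemons. Depending on your kernel settings, this could trigger an out-of-memory panic that starts killing processes. We generally recommend a safe rule-of-thumb limit of 85% of physical memory (assuming an otherwise idle system), and "proceed with caution" beyond that.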
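As a rough worked example of the 85% rule of thumb, the per-thread budget for the machine in the question can be computed from the MemTotal value shown in /proc/meminfo. The function name `safe_heap_per_thread_mb` is hypothetical, and the calculation assumes the whole 85% budget goes to the shared heap and is divided evenly across threads.

```python
def safe_heap_per_thread_mb(mem_total_kb, threads, percent=85):
    """Per-thread shared heap in MB that keeps the total under `percent` of RAM."""
    budget_kb = mem_total_kb * percent // 100  # share of physical memory to use
    return budget_kb // 1024 // threads        # kB -> MB, then split per thread


# MemTotal from the question (263378836 kB) with 51 UPC threads
print(safe_heap_per_thread_mb(263378836, 51))  # prints 4286
```

Under these assumptions a request such as -shared-heap=4GB for 51 threads stays within the rule of thumb, provided the POSIX shared memory limit on /dev/shm has also been raised enough to hold it.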