Can someone explain this valgrind output with open mpi?

I have an application that uses OpenMPI, and I run it on both Windows and Linux. The Windows version works fine, but on Linux the run fails with a memory allocation error. The problem appears with certain application parameters that require more computation. To rule out memory leaks, I checked the Linux version of the application with Valgrind and got some output. I then searched for information about that output and found some posts on Stack Overflow and GitHub (not enough reputation to attach links). After that, I updated OpenMPI to 2.0.2 and checked the app again, which produced new output. Is this an OpenMPI memory leak, or am I doing something wrong?

One entry from the output:

==16210== 4 bytes in 1 blocks are definitely lost in loss record 5 of 327
==16210==    at 0x4C2DBB6: malloc (vg_replace_malloc.c:299)
==16210==    by 0x5657A59: strdup (strdup.c:42)
==16210==    by 0x51128E6: opal_basename (in /home/vshmelev/OMPI_2.0.2/lib/libopen-pal.so.20.2.0)
==16210==    by 0x7DDECA9: ???
==16210==    by 0x7DDEDD4: ???
==16210==    by 0x6FBFF84: ???
==16210==    by 0x4E4EA9E: orte_init (in /home/vshmelev/OMPI_2.0.2/lib/libopen-rte.so.20.1.0)
==16210==    by 0x4041FD: orterun (orterun.c:818)
==16210==    by 0x4034E5: main (main.c:13)

Open MPI version: 2.0.2
Valgrind version: valgrind-3.12.0
Virtual machine: Ubuntu 16.04 LTS x64

When using MPICH instead, the Valgrind output is:

==87863== HEAP SUMMARY:
==87863==     in use at exit: 131,120 bytes in 2 blocks
==87863==   total heap usage: 2,577 allocs, 2,575 frees, 279,908 bytes allocated
==87863== 
==87863== 131,120 bytes in 2 blocks are still reachable in loss record 1 of 1
==87863==    at 0x4C2DBB6: malloc (vg_replace_malloc.c:299)
==87863==    by 0x425803: alloc_fwd_hash (sock.c:332)
==87863==    by 0x425803: HYDU_sock_forward_stdio (sock.c:376)
==87863==    by 0x432A99: HYDT_bscu_stdio_cb (bscu_cb.c:19)
==87863==    by 0x42D9BF: HYDT_dmxu_poll_wait_for_event (demux_poll.c:75)
==87863==    by 0x42889F: HYDT_bscu_wait_for_completion (bscu_wait.c:60)
==87863==    by 0x42863C: HYDT_bsci_wait_for_completion (bsci_wait.c:21)
==87863==    by 0x40B123: HYD_pmci_wait_for_completion (pmiserv_pmci.c:217)
==87863==    by 0x4035C5: main (mpiexec.c:343)
==87863== 
==87863== LEAK SUMMARY:
==87863==    definitely lost: 0 bytes in 0 blocks
==87863==    indirectly lost: 0 bytes in 0 blocks
==87863==      possibly lost: 0 bytes in 0 blocks
==87863==    still reachable: 131,120 bytes in 2 blocks
==87863==         suppressed: 0 bytes in 0 blocks
==87863== 
==87863== For counts of detected and suppressed errors, rerun with: -v
==87863== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 

The term "definitely lost" means that your program's main function (main.c:13, as far as I can see in the output) either leaks memory directly or calls other functions (orterun) that cause the leak. You have to fix those leaks or provide more code.

Take a look here first.

These outputs point to memory leaks in the MPI libraries rather than in your application code. You can safely ignore them.

More specifically, the leaks come from the launchers: Valgrind was attached to the launcher process itself (note `orterun` and `mpiexec` in the stack traces), so the reported blocks belong to it rather than to your MPI ranks. ORTE is the runtime environment of Open MPI, responsible for launching and managing MPI processes, while Hydra is MPICH's launcher and process manager.
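If the launcher's reports are too noisy, they can be silenced with a Valgrind suppressions file passed via `--suppressions=FILE` (Open MPI installs also ship a ready-made `openmpi-valgrind.supp` under `share/openmpi/`, path depending on the install prefix). A hand-written entry matching the trace quoted in the question might look like this (the suppression name on the first line is arbitrary):

```
{
   ompi_orterun_opal_basename
   Memcheck:Leak
   match-leak-kinds: definitely
   fun:malloc
   fun:strdup
   fun:opal_basename
   ...
   fun:orte_init
}
```

The `...` line is a frame wildcard that covers the unresolved `???` frames in the backtrace, and `match-leak-kinds: definitely` restricts the suppression to "definitely lost" records.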