Valgrind: non-zero exit code only if memory is definitely lost

Question

We use valgrind as part of our CI process. If there are any memory problems, valgrind must return a non-zero exit code so that the incident gets reported. This is how we run it:

valgrind --error-exitcode=1 --tool=memcheck --leak-check=full \
    --errors-for-leak-kinds=definite --show-leak-kinds=definite \
    --track-origins=yes ./some_cgo_application

(...)

==25182== HEAP SUMMARY:
==25182==     in use at exit: 2,416,970 bytes in 34,296 blocks
==25182==   total heap usage: 83,979 allocs, 49,684 frees, 5,168,335 bytes allocated
==25182== 
==25182== LEAK SUMMARY:
==25182==    definitely lost: 0 bytes in 0 blocks
==25182==    indirectly lost: 0 bytes in 0 blocks
==25182==      possibly lost: 3,024 bytes in 7 blocks
==25182==    still reachable: 2,413,946 bytes in 34,289 blocks
==25182==                       of which reachable via heuristic:
==25182==                         newarray           : 520 bytes in 1 blocks
==25182==         suppressed: 0 bytes in 0 blocks
==25182== Reachable blocks (those to which a pointer was found) are not shown.
==25182== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==25182== 
==25182== For counts of detected and suppressed errors, rerun with: -v
==25182== ERROR SUMMARY: 20 errors from 5 contexts (suppressed: 0 from 0)

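At the moment we are only interested in memory that is definitely lost. If no blocks are definitely lost, valgrind's exit code should be zero. However, it returns 1 despite the options --errors-for-leak-kinds=definite --show-leak-kinds=definite: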

echo $?
1

Are there any other options that would help achieve the desired result?

Answer 1

I suspect the exit status of 1 comes from the program itself. I can reproduce this with:

$ valgrind --error-exitcode=1 --tool=memcheck --leak-check=full \
  --errors-for-leak-kinds=definite --show-leak-kinds=definite \
  --track-origins=yes /bin/false

This does not look like something that can be changed in the current sources:

   case VgSrc_ExitProcess: /* the normal way out (Darwin) */
      /* Change the application return code to user's return code,
         if an error was found */
      if (VG_(clo_error_exitcode) > 0 
          && VG_(get_n_errs_found)() > 0) {
         VG_(client_exit)( VG_(clo_error_exitcode) );
      } else {
         /* otherwise, return the client's exit code, in the normal
            way. */
         VG_(client_exit)( VG_(threads)[tid].os_state.exitcode );
      }

This exitcode member is set by the sys_exit_group wrapper in coregrind/m_syswrap/syswrap-linux.c, and there is no way to adjust it.

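With that in mind, I think your best option (short of patching valgrind) is to select an exit status that differs from any exit status your program might use, and treat that value as the indicator of a valgrind error.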
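The suggestion above can be sketched as a small CI wrapper. This is only a sketch under stated assumptions: the value 99 is a hypothetical exit status that ./some_cgo_application is assumed never to return, and the function names classify_status and run_under_valgrind are invented for illustration.

```shell
#!/bin/sh
# Assumption: the application under test never exits with status 99 itself.
VALGRIND_ERROR_CODE=99

# Print a verdict for a given exit status (hypothetical helper).
classify_status() {
    if [ "$1" -eq "$VALGRIND_ERROR_CODE" ]; then
        echo "valgrind-error"
    else
        echo "program-exit-$1"
    fi
}

# Run a command under memcheck with the same options as in the question,
# but with the distinct error exit code, then report which side failed.
run_under_valgrind() {
    valgrind --error-exitcode="$VALGRIND_ERROR_CODE" --tool=memcheck \
        --leak-check=full \
        --errors-for-leak-kinds=definite --show-leak-kinds=definite \
        --track-origins=yes "$@"
    status=$?
    classify_status "$status"
    return "$status"
}
```

Calling run_under_valgrind ./some_cgo_application would then print valgrind-error only when valgrind itself reported errors, and program-exit-N when the status N came from the program.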

Answer 2

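Take a look at the --gen-suppressions option. With it, you can create a file that tells valgrind to suppress the possibly lost and still reachable errors for specific call stacks. Then rerun valgrind with --suppressions=filename. Now valgrind's return value ($?) will be zero. Here is an example: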

valgrind --gen-suppressions=all --log-file=suppressions.supp ./path/to/program
<open suppressions.supp, delete all lines that are not suppressions, save, and close>
valgrind --suppressions=suppressions.supp ./path/to/program
echo $?

You should see zero printed.

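This will not future-proof your code against new possibly lost and still reachable errors, but it will silence the current ones. If you want to future-proof your code, you could write a script that parses the --gen-suppressions output and generates a new suppressions file. The script can be written so that it only adds suppressions for possibly lost and still reachable errors, so you will still see the errors you care about.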
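Such a script could look roughly like the sketch below. It assumes that the generated suppressions appear in the log as brace-delimited blocks starting at the beginning of a line, and that leak suppressions carry a match-leak-kinds: line naming the leak kind. The function name filter_leak_suppressions is hypothetical.

```shell
#!/bin/sh
# Keep only suppression blocks for possibly lost or still reachable leaks.
# Reads the valgrind log from the given files, or from stdin if none are given.
filter_leak_suppressions() {
    awk '
        /^[{]/ { block = $0; inblock = 1; keep = 0; next }   # block starts
        inblock {
            block = block "\n" $0
            # Mark the block if it targets a leak kind we want to silence.
            if ($0 ~ /match-leak-kinds:.*(possible|reachable)/) keep = 1
            if ($0 ~ /^[}]/) {                               # block ends
                if (keep) print block
                inblock = 0
            }
        }
    ' "$@"
}
```

For example, filter_leak_suppressions suppressions.supp > filtered.supp would drop the ==PID== log lines and every suppression for other error kinds, so errors such as definite leaks would still be reported on the next run.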
