GCC 8.3 + Linux 上 boost::thread_specific_ptr 的 Valgrind 错误

Valgrind errors with boost::thread_specific_ptr on GCC 8.3 + Linux

当应用程序关闭时,Valgrind 报告这 3 个问题:

==70== Mismatched free() / delete / delete []
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870C89: check_free (dlerror.c:202)
==70==    by 0x4870C89: check_free (dlerror.c:186)
==70==    by 0x4870C89: free_key_mem (dlerror.c:221)
==70==    by 0x4870C89: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Address 0x4f6a570 is 0 bytes inside a block of size 312 alloc'd
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303D6D: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, 

[...]

==70== Invalid free() / delete / delete[] / realloc()
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870BB4: free_key_mem (dlerror.c:223)
==70==    by 0x4870BB4: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Address 0x4f6a570 is 0 bytes inside a block of size 312 free'd
==70==    at 0x483997B: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x4870C89: check_free (dlerror.c:202)
==70==    by 0x4870C89: check_free (dlerror.c:186)
==70==    by 0x4870C89: free_key_mem (dlerror.c:221)
==70==    by 0x4870C89: __dlerror_main_freeres (dlerror.c:239)
==70==    by 0x4B59711: __libc_freeres (in /usr/lib/x86_64-linux-gnu/libc-2.29.so)
==70==    by 0x482E19E: _vgnU_freeres (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so)
==70==    by 0x4A0A3A9: __run_exit_handlers (exit.c:132)
==70==    by 0x4A0A3D9: exit (exit.c:139)
==70==    by 0x49E9B71: (below main) (libc-start.c:342)
==70==  Block was alloc'd at
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303D6D: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x188841: boost::thread_specific_ptr<burningmime::setmatch::MatchState>::reset(burningmime::setmatch::MatchState*) (tss.hpp:105)

[...]

==70== 24 bytes in 1 blocks are definitely lost in loss record 1 of 2
==70==    at 0x4838DBF: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==70==    by 0x303F50: boost::detail::make_external_thread_data() (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x305424: boost::detail::add_new_tss_node(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*) (in /build-context/bin/debug/setmatch-tests)
==70==    by 0x3054ED: boost::detail::set_tss_data(void const*, boost::shared_ptr<boost::detail::tss_cleanup_function>, void*, bool) (in /build-context/bin/debug/setmatch-tests)

[...]

看起来 boost 在 dlerror 分配它自己的线程数据的同一个地方分配它的线程数据。 A quick search points to a (slightly different?) version of dlerror here

快速浏览一下 bosot 的代码,我觉得它只是 allocating the TSS block on the heap

这不是 GCC 7.3.0 + Ubuntu 18(相同的 Boost 版本)

的问题

有人对此有一些见解吗?

编辑:Maybe it's the double-free that was fixed in this commit? Still I don't see why Boost would be using that at all.

请检查您使用的所有工具的版本。这似乎存在一些版本兼容性问题。尝试使用 3.15.0 版本的 valgrind。

valgrind 的用法见here

也许你应该将 valgrind 版本升级到 3.15.0,它应该会有帮助。

我觉得here应该能帮到你。

如果我像这样修改 pthread_setspecific 调用周围的 glibc upstream test case(并用 g++ 编译它):

    void *ptr = new char;
    printf("Setting thread local to ptr.\n");
    if (pthread_setspecific(key, ptr) != 0) {
      perror("pthread_setspecific");
      exit(1);
    }
    delete ptr;

当 运行在修复之前针对 glibc 进行 运行 时出现此错误(在提交 5b06f538c5aee0389ed034f60d90a8884d6d54de,使用 glibc 构建树中的 ./testrun.sh --tool=valgrind /path/to/test):

==14143== Invalid read of size 8
==14143==    at 0x483B550: check_free (dlerror.c:188)
==14143==    by 0x483BA21: free_key_mem (dlerror.c:221)
==14143==    by 0x483BA21: __dlerror_main_freeres (dlerror.c:239)
==14143==    by 0x4D06AD1: __libc_freeres (in /home/fweimer/src/gnu/glibc/build/libc.so)
==14143==    by 0x48031DE: _vgnU_freeres (vg_preloaded.c:77)
==14143==    by 0x4BDD331: __run_exit_handlers (exit.c:132)
==14143==    by 0x4BDD3C9: exit (exit.c:139)
==14143==    by 0x4BC7E21: (below main) (libc-start.c:342)
==14143==  Address 0x4d750d8 is 23 bytes after a block of size 1 free'd
==14143==    at 0x480CEFC: operator delete(void*) (vg_replace_malloc.c:586)
==14143==    by 0x401344: main (t.c:93)
==14143==  Block was alloc'd at
==14143==    at 0x480BE86: operator new(unsigned long) (vg_replace_malloc.c:344)
==14143==    by 0x4012F4: main (t.c:87)
==14143== 
==14143== Invalid free() / delete / delete[] / realloc()
==14143==    at 0x480CA0C: free (vg_replace_malloc.c:540)
==14143==    by 0x483BA29: free_key_mem (dlerror.c:223)
==14143==    by 0x483BA29: __dlerror_main_freeres (dlerror.c:239)
==14143==    by 0x4D06AD1: __libc_freeres (in /home/fweimer/src/gnu/glibc/build/libc.so)
==14143==    by 0x48031DE: _vgnU_freeres (vg_preloaded.c:77)
==14143==    by 0x4BDD331: __run_exit_handlers (exit.c:132)
==14143==    by 0x4BDD3C9: exit (exit.c:139)
==14143==    by 0x4BC7E21: (below main) (libc-start.c:342)
==14143==  Address 0x4d750c0 is 0 bytes inside a block of size 1 free'd
==14143==    at 0x480CEFC: operator delete(void*) (vg_replace_malloc.c:586)
==14143==    by 0x401344: main (t.c:93)
==14143==  Block was alloc'd at
==14143==    at 0x480BE86: operator new(unsigned long) (vg_replace_malloc.c:344)
==14143==    by 0x4012F4: main (t.c:87)

除了 Boost 中 operator new 分配的嵌套之外,这与您得到的错误几乎相同。所以看起来这两个错误确实是一样的。

这是有道理的:由于 bug 24476libdl 使用未初始化的 pthread_key_t 值(之前没有调用 pthread_key_create )。对于数据段(其中 libdl 的内部密钥存储0,未初始化意味着零,当然,正如您从测试中的诊断输出中看到的那样,测试分配的密钥(在您的情况下是Boost)实际上是键 0:

key = 0

这个 libdl 代码相当复杂,我 posted a patch which moves dlerror into libc (来自 libdl)并且还避免使用 POSIX 线程线程本地存储。

总结一下:维护你使用的 glibc 版本的人需要向后移植 upstream fix into their source tree and release an update. We had to do this as well. 从好的方面来说,这个错误只发生在你 运行 你的应用程序在 valgrind 和类似工具下,因为在常规期间process shutdown, __libc_freeres 没有被调用:反正进程很快就会退出,内核会帮我们清理掉所有的资源。除非你在生产中使用 valgrind,否则这意味着你永远不会在那里遇到这个错误。当然,当你使用 valgrind 进行调试时,这仍然是一个恼人的问题。抱歉。