Tracing a segmentation fault in a third-party library: cv::ImageCodecInitializer destructor crashes

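We are developing a framework that directly uses mrpt-1.9, which in turn uses OpenCV 2.4. We are writing unit tests, and when a test exits (i.e. during cleanup) it segfaults with an OpenCV error in cv::String::deallocate().

What I have tried:

Running valgrind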

==26159== Conditional jump or move depends on uninitialised value(s)
==26159==    at 0x7DB7F5: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB0: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159== 
==26159== Invalid read of size 4
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  Address 0x1a is not stack'd, malloc'd or (recently) free'd
==26159== 
==26159== 
==26159== Process terminating with default action of signal 11 (SIGSEGV)
==26159==  Access not within mapped region at address 0x1A
==26159==    at 0x7DB7FB: cv::String::deallocate() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FB9: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0xAF9FF8: cv::BmpEncoder::~BmpEncoder() (in /home/alex/codez/robot_platform/build/test_slam)
==26159==    by 0x935AF65: cv::ImageCodecInitializer::~ImageCodecInitializer() (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x807A369: __cxa_finalize (cxa_finalize.c:56)
==26159==    by 0x9355B52: ??? (in /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4.9)
==26159==    by 0x4010DE6: _dl_fini (dl-fini.c:235)
==26159==    by 0x8079FF7: __run_exit_handlers (exit.c:82)
==26159==    by 0x807A044: exit (exit.c:104)
==26159==    by 0x8060836: (below main) (libc-start.c:325)
==26159==  If you believe this happened as a result of a stack
==26159==  overflow in your program's main thread (unlikely but
==26159==  possible), you can try to increase the size of the
==26159==  main thread stack using the --main-stacksize= flag.
==26159==  The main thread stack size used in this run was 8388608.
==26159== 
==26159== HEAP SUMMARY:
==26159==     in use at exit: 286,067 bytes in 1,147 blocks
==26159==   total heap usage: 7,469 allocs, 6,322 frees, 1,912,969 bytes allocated
==26159== 
==26159== LEAK SUMMARY:
==26159==    definitely lost: 0 bytes in 0 blocks
==26159==    indirectly lost: 0 bytes in 0 blocks
==26159==      possibly lost: 2,299 bytes in 27 blocks
==26159==    still reachable: 283,768 bytes in 1,120 blocks
==26159==                       of which reachable via heuristic:
==26159==                         newarray           : 1,536 bytes in 16 blocks
==26159==         suppressed: 0 bytes in 0 blocks
==26159== Rerun with --leak-check=full to see details of leaked memory
==26159== 
==26159== For counts of detected and suppressed errors, rerun with: -v
==26159== Use --track-origins=yes to see where uninitialised values come from
==26159== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

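As far as I can tell, either we are calling MRPT functions incorrectly, or there is a bug in MRPT itself.

Running it with gdb:

I have been trying to debug it in gdb, but all I get is a backtrace, with no indication of which part of our code is responsible. Since the crash seems to happen after main has exited, it is really confusing. Even worse, the class we construct (which does not actually do anything) contains no MRPT classes or objects, so my guess is that the problem is in the MRPT libraries rather than in our framework.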

Thread 1 "debug" received signal SIGSEGV, Segmentation fault.
0x00000000005b569b in cv::String::deallocate() ()
(gdb) bt
#0  0x00000000005b569b in cv::String::deallocate() ()
#1  0x000000000089969a in cv::BmpEncoder::~BmpEncoder() ()
#2  0x00000000008996d9 in cv::BmpEncoder::~BmpEncoder() [clone .localalias.25] ()
#3  0x00007ffff36a4f66 in cv::ImageCodecInitializer::~ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#4  0x00007ffff484136a in __cxa_finalize (d=0x7ffff38d1000) at cxa_finalize.c:56
#5  0x00007ffff369fb53 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#6  0x00007fffffffd8b0 in ?? ()
#7  0x00007ffff7de7de7 in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC

I set a breakpoint with break cv::ImageCodecInitializer::~ImageCodecInitializer and got:

Thread 1 "debug" hit Breakpoint 3, 0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
(gdb) bt
#0  0x0000000000888ad0 in cv::ImageCodecInitializer::~ImageCodecInitializer() ()
#1  0x00007ffff4840ff8 in __run_exit_handlers (status=0, listp=0x7ffff4bcb5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#2  0x00007ffff4841045 in __GI_exit (status=<optimised out>) at exit.c:104
#3  0x00007ffff4827837 in __libc_start_main (main=0x5a4536 <main()>, argc=1, argv=0x7fffffffd9d8, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7fffffffd9c8) at ../csu/libc-start.c:325
#4  0x00000000005a4469 in _start ()
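
The backtrace above reaches the destructor from __run_exit_handlers, which is why the crash shows up after main has returned: objects with static storage duration are destroyed during exit(). A minimal sketch of that lifetime follows. The names Tracer, event_log and run_scoped are hypothetical and are not part of OpenCV or MRPT.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log. It is a function-local static, so it is constructed
// on first use and outlives every Tracer that writes to it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a library-internal object such as
// cv::ImageCodecInitializer: it only records its own lifetime.
struct Tracer {
    std::string name;

    explicit Tracer(std::string n) : name(std::move(n)) {
        assert(!name.empty());
        event_log().push_back("ctor " + name);
        std::fprintf(stderr, "ctor %s\n", name.c_str());
    }

    ~Tracer() {
        event_log().push_back("dtor " + name);
        std::fprintf(stderr, "dtor %s\n", name.c_str());
    }
};

// Namespace-scope object: constructed before main() starts and destroyed by
// the exit handlers after main() has returned.
static Tracer global_tracer("global");

// Creates and destroys a block-scoped Tracer, then returns a copy of the log.
std::vector<std::string> run_scoped() {
    { Tracer scoped("scoped"); }
    return event_log();
}
```

In a program containing this block, "dtor global" is printed only after main() has returned, which is the same phase in which ~ImageCodecInitializer runs in the traces above.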

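Searching for opencv-2.4 debug symbols

The application is built with debug symbols, but the system does not seem to provide an opencv-2.4 package with debug symbols, so I keep getting "optimised out" warnings.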

libopencv-apps-dev - opencv_apps Robot OS package - development files
libopencv-apps0d - opencv_apps Robot OS package - runtime files
libopencv-calib3d2.4v5 - computer vision Camera Calibration library
libopencv-contrib-dev - development files for libopencv-contrib
libopencv-contrib2.4v5 - computer vision contrib library
libopencv-core2.4v5 - computer vision core library
libopencv-dev - development files for opencv
libopencv-features2d2.4v5 - computer vision Feature Detection and Descriptor Extraction library
libopencv-flann2.4v5 - computer vision Clustering and Search in Multi-Dimensional spaces library
libopencv-gpu-dev - development files for libopencv-gpu2.4v5
libopencv-gpu2.4v5 - computer vision GPU library
libopencv-highgui2.4v5 - computer vision High-level GUI and Media I/O library
libopencv-imgproc2.4v5 - computer vision Image Processing library
libopencv-legacy-dev - development files for libopencv-legacy
libopencv-legacy2.4v5 - computer vision legacy library
libopencv-ml2.4v5 - computer vision Machine Learning library
libopencv-objdetect2.4v5 - computer vision Object Detection library
libopencv-ocl-dev - development files for libopencv-ocl2.4v5
libopencv-ocl2.4v5 - computer vision OpenCL support library
libopencv-photo2.4v5 - computer vision computational photography library
libopencv-stitching2.4v5 - computer vision image stitching library
libopencv-superres2.4v5 - computer vision Super Resolution library
libopencv-ts2.4v5 - computer vision ts library
libopencv-video2.4v5 - computer vision Video analysis library
libopencv-videostab2.4v5 - computer vision video stabilization library
libopencv2.4-java - Java bindings for the computer vision library
libopencv2.4-jni - Java jni library for the computer vision library

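Searching for the actual offending function

I looked at a minimised debug executable that we built to try to pinpoint the issue, then searched it for the actual function: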

nm -Ca debug | grep "ImageCodecInitializer"
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000889290 W cv::ImageCodecInitializer::ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()
0000000000888ad0 W cv::ImageCodecInitializer::~ImageCodecInitializer()

Then I tried to find out what GDB has to say about these addresses:

(gdb) info line *0x0000000000889290
No line number information available for address 0x889290 <_ZN2cv21ImageCodecInitializerC2Ev>

But I could not get anywhere from there, so I used GDB to find out who constructs this object:

#0  0x00007ffff36a6240 in cv::ImageCodecInitializer::ImageCodecInitializer() () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#1  0x00007ffff369f8f6 in ?? () from /usr/lib/x86_64-linux-gnu/libopencv_highgui.so.2.4
#2  0x00007ffff7de76ba in call_init (l=<optimised out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd9d8, env=env@entry=0x7fffffffd9e8) at dl-init.c:72
#3  0x00007ffff7de77cb in call_init (env=0x7fffffffd9e8, argv=0x7fffffffd9d8, argc=1, l=<optimised out>) at dl-init.c:30
#4  _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffd9d8, env=0x7fffffffd9e8) at dl-init.c:120
#5  0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6  0x0000000000000001 in ?? ()
#7  0x00007fffffffdda0 in ?? ()
#8  0x0000000000000000 in ?? ()

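Optimised out again.

Searching for the library that uses the offending function

The function is in libopencv_highgui.so.2.4, so I guessed that one of the MRPT libraries uses it. I went looking for which of the MRPT libraries we link against pulls it in, and found it: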

readelf -d debug 

Dynamic section at offset 0x2b49bb0 contains 41 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libboost_system.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libboost_filesystem.so.1.58.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libmrpt-base.so.1.9]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libpng12.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libtiff.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libjasper.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libIlmImf-2_2.so.22]
 0x0000000000000001 (NEEDED)             Shared library: [libHalf.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

So I located the library:

sudo ldconfig -p | grep "libmrpt-base.so.1.9"
        libmrpt-base.so.1.9 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

Then:

readelf -d /usr/lib/x86_64-linux-gnu/libmrpt-base.so.1.9

Dynamic section at offset 0xa5aea8 contains 37 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcxsparse.so.3.1.4]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_baseu-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libwx_gtk2u_core-3.0.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_highgui.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libmrpt-base.so.1.9]

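I know this is the library causing the problem, because our own project uses opencv-3.3 and links it statically. Sadly, the repository we use has no debug symbols for MRPT either: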

libmrpt-base1.9 - Mobile Robot Programming Toolkit - base library
libmrpt-detectors1.9 - Mobile Robot Programming Toolkit - detectors library
libmrpt-graphs1.9 - Mobile Robot Programming Toolkit - graphs library
libmrpt-graphslam1.9 - Mobile Robot Programming Toolkit - graphslam library
libmrpt-gui1.9 - Mobile Robot Programming Toolkit - gui library
libmrpt-hmtslam1.9 - Mobile Robot Programming Toolkit - hmtslam library
libmrpt-hwdrivers1.9 - Mobile Robot Programming Toolkit - hwdrivers library
libmrpt-kinematics1.9 - Mobile Robot Programming Toolkit - kinematics library
libmrpt-maps1.9 - Mobile Robot Programming Toolkit - maps library
libmrpt-nav1.9 - Mobile Robot Programming Toolkit - nav library
libmrpt-obs1.9 - Mobile Robot Programming Toolkit - obs library
libmrpt-opengl1.9 - Mobile Robot Programming Toolkit - opengl library
libmrpt-slam1.9 - Mobile Robot Programming Toolkit - slam library
libmrpt-tfest1.9 - Mobile Robot Programming Toolkit - tfest library
libmrpt-topography1.9 - Mobile Robot Programming Toolkit - topography library
libmrpt-vision1.9 - Mobile Robot Programming Toolkit - vision library
libmrpt-comms1.9 - Mobile Robot Programming Toolkit - comms library

Even worse:

nm -C libmrpt-base.so
nm: libmrpt-base.so: no symbols

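This is where the journey ends.

What are my options?

Any help, hints or tips are much appreciated. If this question is too localized for SO standards, please leave a comment and I will update it.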

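My first guess is that you are hitting this because two opencv versions are in use at the same time... Try building mrpt from source, telling CMake to use the same opencv version as your main project.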
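
One way to check that guess from inside the running process is to list the shared objects that are actually mapped. The sketch below assumes Linux with glibc, where dl_iterate_phdr from <link.h> is available. The helper names collect_object, loaded_objects and matching are hypothetical. A statically linked OpenCV will not appear as a separate object, so seeing libopencv_highgui.so.2.4 in a binary built against opencv-3.3 would by itself suggest that two versions coexist.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (glibc, Linux-specific)

#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded ELF object. It appends the object's path
// to the std::vector<std::string> passed through `data`.
static int collect_object(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    names->emplace_back(info->dlpi_name ? info->dlpi_name : "");
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Paths of every object currently mapped into this process. The main
// executable is typically reported with an empty name.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}

// Subset of `names` whose path contains `needle`, e.g. "libopencv".
std::vector<std::string> matching(const std::vector<std::string>& names,
                                  const std::string& needle) {
    std::vector<std::string> hits;
    for (const auto& n : names) {
        if (n.find(needle) != std::string::npos) hits.push_back(n);
    }
    return hits;
}
```

Calling matching(loaded_objects(), "libopencv") early in main() of the test binary and printing the result would show which OpenCV shared libraries, if any, the dynamic loader brought in.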

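mrpt-base does not directly use anything from highgui (although... it links against it! This should be fixed, for sure), so I suspect this error has to do with the initialization of static variables in the opencv modules and the linker getting confused...

Cheers

Not really an answer, but comments are no good for formatting code. The latest opencv on github has the following source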

void cv::String::deallocate()
{
    int* data = (int*)cstr_;
    len_ = 0;
    cstr_ = 0;

    if(data && 1 == CV_XADD(data-1, -1))
    {
        cv::fastFree(data-1);
    }
}

(which is probably newer than your version).

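It looks like the string is stored with a reference count in the first 4 bytes of the allocation, followed by the nul-terminated string, with cstr_ pointing just past the count (hence data-1). The if condition checks that the pointer is not NULL and then atomically decrements the reference count. CV_XADD returns the value held before the addition, so the memory is freed when the count was 1, i.e. when it drops to 0.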
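
The layout described above can be sketched as follows. This is an illustration under assumptions, not OpenCV's implementation: Header, make_string, header_of, retain and release are hypothetical names, and std::atomic stands in for CV_XADD. One inference from the traces, not stated anywhere in the source: if cstr_ held a garbage value such as 0x1E, then data-1 would be 0x1A, which matches the 4-byte invalid read that valgrind reports.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header placed immediately before the character data.
struct Header {
    std::atomic<int> refcount{1};
};

// Allocates header + characters in one block and returns a pointer to the
// characters, mirroring how cstr_ points past the reference count.
char* make_string(const char* text) {
    const std::size_t len = std::strlen(text);
    void* raw = std::malloc(sizeof(Header) + len + 1);
    if (!raw) throw std::bad_alloc();
    Header* h = new (raw) Header;
    char* chars = reinterpret_cast<char*>(h + 1);
    std::memcpy(chars, text, len + 1);  // copies the terminating nul as well
    return chars;
}

// Steps back from the characters to the header, like `data - 1` above.
Header* header_of(char* chars) {
    return reinterpret_cast<Header*>(chars) - 1;
}

// Adds one owner.
void retain(char* chars) {
    assert(chars != nullptr);
    header_of(chars)->refcount.fetch_add(1);
}

// Drops one owner. Returns true if this call freed the block.
bool release(char* chars) {
    if (!chars) return false;
    Header* h = header_of(chars);
    // fetch_sub returns the value held before the decrement, so comparing
    // with 1 means "this was the last owner".
    if (h->refcount.fetch_sub(1) == 1) {
        h->~Header();
        std::free(h);
        return true;
    }
    return false;
}
```

With this layout, a release() on a pointer that was never produced by make_string() reads 4 bytes just below an arbitrary address, which is the kind of access the valgrind report shows.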