I am hitting a segmentation fault only when both using AVX and linking to other code
I am setting up a sparse linear system with Eigen as follows (slightly pseudocode):
Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>> solver;
Eigen::SparseMatrix<real_t> P(rows, cols);
P.setFromTriplets(triplet_list.begin(), triplet_list.end());
P.makeCompressed();
solver.compute(P);
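This code lives in a small library, which I compile with -mavx -mfma -O2. If I build a simple executable against this library, everything runs fine. If I instead link it into another library (whose C++ sources are built with the same compiler flags, but which also includes CUDA), I get a segmentation fault in Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>>::factorize. If I compile with -O0, the segmentation fault goes away.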
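I have not been able to isolate this into a minimal working example; I would appreciate suggestions on how to better characterize the problem, or ideas about what might be going wrong. Vectorization is not important for this particular solve, but I do need it elsewhere in the library, so simply dropping the AVX flags is not a good option.
Edit: adding some context, as requested.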
If I compile with -g and run in gdb, the exact crashing line is line 98 of Core/util/Memory.h:
   95  /** \internal Frees memory allocated with handmade_aligned_malloc */
   96  inline void handmade_aligned_free(void *ptr)
   97  {
  >98    if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
   99  }
with the stack trace:
#0 0x00007ffff12e94dc in free () from /lib64/libc.so.6
#1 0x00007fffe3dadb1f in Eigen::internal::handmade_aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:98
#2 Eigen::internal::aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:179
#3 Eigen::aligned_allocator<float>::deallocate (this=<optimized out>, p=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:763
#4 std::allocator_traits<Eigen::aligned_allocator<float> >::deallocate (__a=..., __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/alloc_traits.h:328
#5 std::_Vector_base<float, Eigen::aligned_allocator<float> >::_M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/stl_vector.h:180
#6 std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append (this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>, __n=<optimized out>) at include/c++/7.3.0/bits/vector.tcc:592
#7 0x00007fffe3dae688 in std::vector<float, Eigen::aligned_allocator<float> >::resize (__new_size=10, this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>) at include/c++/7.3.0/bits/stl_vector.h:692
If I run under valgrind, I see errors of the following form. However, the program no longer crashes (the same code still segfaults when run outside valgrind).
==16218== Invalid read of size 8
==16218== at 0x19049B16: handmade_aligned_free (Memory.h:98)
==16218== by 0x19049B16: aligned_free (Memory.h:179)
==16218== by 0x19049B16: deallocate (Memory.h:763)
==16218== by 0x19049B16: deallocate (alloc_traits.h:328)
==16218== by 0x19049B16: _M_deallocate (stl_vector.h:180)
==16218== by 0x19049B16: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Address 0x3e195558 is 8 bytes before a block of size 8 alloc'd
==16218== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==16218== by 0x123B7326: Eigen::internal::aligned_malloc(unsigned long) (in /gdn/centos7/0001/x3/prefixes/desmond-dependencies/2.14c7__dc4688ce01c7/lib/libminimax.so)
==16218== by 0x19049B73: allocate (Memory.h:758)
==16218== by 0x19049B73: allocate (alloc_traits.h:301)
==16218== by 0x19049B73: _M_allocate (stl_vector.h:172)
==16218== by 0x19049B73: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:571)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid free() / delete / delete[] / realloc()
==16218== at 0x4C2ACDD: free (vg_replace_malloc.c:530)
==16218== by 0x19049B1E: handmade_aligned_free (Memory.h:98)
==16218== by 0x19049B1E: aligned_free (Memory.h:179)
==16218== by 0x19049B1E: deallocate (Memory.h:763)
==16218== by 0x19049B1E: deallocate (alloc_traits.h:328)
==16218== by 0x19049B1E: _M_deallocate (stl_vector.h:180)
==16218== by 0x19049B1E: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid read of size 8
==16218== at 0x1905327B: handmade_aligned_free (Memory.h:98)
==16218== by 0x1905327B: aligned_free (Memory.h:179)
==16218== by 0x1905327B: conditional_aligned_free<true> (Memory.h:230)
==16218== by 0x1905327B: conditional_aligned_delete_auto<double, true> (Memory.h:416)
==16218== by 0x1905327B: ~DenseStorage (DenseStorage.h:542)
==16218== by 0x1905327B: ~PlainObjectBase (PlainObjectBase.h:98)
==16218== by 0x1905327B: ~Matrix (Matrix.h:178)
==16218== by 0x1905327B: Eigen::SparseQR<Eigen::SparseMatrix<double, 0, int>, Eigen::COLAMDOrdering<int> >::factorize(Eigen::SparseMatrix<double, 0, int> const&) (SparseQR.h:360)
==16218== by 0x19047A28: compute (SparseQR.h:118)
I am working on turning this into a minimal reproducible example.
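The problem you describe typically occurs when compilation units built with different memory-alignment options are linked together. By default, Eigen aligns memory to 16 bytes, unless AVX is enabled, in which case it aligns to 32 bytes (and to 64 bytes for AVX512, I believe).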
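To make the failure mode concrete, here is a minimal sketch of the "over-allocate and stash the original pointer" scheme that a free routine like the one at Memory.h:98 relies on. The names `sketch_aligned_malloc`, `sketch_aligned_free` and `is_aligned` are hypothetical, and the allocation side is an assumption written for illustration, not Eigen's actual code; only the body of the free function mirrors the line quoted in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Sketch (assumption, not Eigen's code): over-allocate by `alignment` bytes,
// round up to the next aligned address, and store the pointer returned by
// malloc in the slot just before the aligned address.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    // alignment must be a power of two and large enough to hold a pointer
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* original = std::malloc(size + alignment);
    if (!original) return nullptr;
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    // Next multiple of `alignment` strictly above `raw`; since malloc's result
    // is at least pointer-aligned, this leaves room for one void* in front.
    void* aligned = reinterpret_cast<void*>((raw & ~mask) + alignment);
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Same shape as the line quoted from Memory.h:98: read the stashed pointer
// located sizeof(void*) bytes before ptr and pass it to free().
inline void sketch_aligned_free(void* ptr) {
    if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

If one compilation unit allocates with plain malloc because it believes the default alignment is sufficient, while another frees with the stashed-pointer routine, the free side reads 8 bytes before the block and passes whatever it finds to free(). That would be consistent with the "8 bytes before a block of size 8" and "Invalid free()" reports in the valgrind output above, although the trace alone does not prove it.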
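Ideally, you should compile all compilation units for the same target architecture. If you only intend to run on your local machine, -march=native is the best choice (it also tunes for the local architecture).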
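If you need to compile some parts with AVX enabled and others without, you can manually override Eigen's memory alignment with -DEIGEN_MAX_ALIGN_BYTES=16 or -DEIGEN_MAX_ALIGN_BYTES=32 (for consistency, one of these should be added to all compilation units, even where it is redundant).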
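As a rough illustration of why the flags matter per compilation unit, the sketch below picks an alignment from the compiler's predefined macros, letting an explicit EIGEN_MAX_ALIGN_BYTES win when it is defined. The function name `sketch_max_align_bytes` is hypothetical, and the selection logic is a simplified assumption based on the defaults described above (16, 32 with AVX, 64 with AVX512), not a copy of Eigen's configuration headers.

```cpp
// Sketch (assumption): the alignment a compilation unit would end up with,
// based on the defaults described above. Not Eigen's actual logic.
constexpr int sketch_max_align_bytes() {
#if defined(EIGEN_MAX_ALIGN_BYTES)
    return EIGEN_MAX_ALIGN_BYTES;  // explicit override via -DEIGEN_MAX_ALIGN_BYTES=N
#elif defined(__AVX512F__)
    return 64;                     // compiled with AVX512 enabled
#elif defined(__AVX__)
    return 32;                     // compiled with -mavx (or -march= implying AVX)
#else
    return 16;                     // default without AVX
#endif
}
```

Printing or static_assert-ing this value in each library (for example `static_assert(sketch_max_align_bytes() == 32, "alignment mismatch");`) is one way to notice that two libraries were built with different assumptions before they are linked together.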