C++/Debugging (g++ on AIX) Recursive Quicksort Causing Segmentation Faults

I have a program that needs to sort a large number of large numeric distributions. To cut the time this takes, I am trying to multithread it.

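I wrote a small, simple abstraction of my program to try to isolate the problem. I believe I am either running into a stack overflow or hitting an OS stack limit, because my test program reproduces the segmentation fault under the conditions noted in its comments (all values identical, and the sort run via threads):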

#include <boost/bind.hpp>           // for boost::bind
#include <boost/thread/thread.hpp>
#include <iostream>                 // for std::clog, std::cout
#include <vector>
#include <stdlib.h> // for rand()

void swapvals(double *distribution, const size_t &d1, const size_t &d2)
{
    double temp = 0;
    temp = distribution[d2];
    distribution[d2] = distribution[d1];
    distribution[d1] = temp;
    //std::swap(distribution[d1], distribution[d2]);

}

size_t partition(double *distribution,  size_t left, size_t right)
{
        const double pivot = distribution[right];

        while (left < right) {

                while ((left < right) && distribution[left] <= pivot)
                        left++;

                while ((left < right) && distribution[right] > pivot)
                        right--;

                if (left < right)
                {
                        swapvals(distribution, left, right);
                }
        }
        return right;
}

void quickSort(double *distribution, const size_t left, const size_t right)
{
        if (left >= right) {
                return;
        }
        size_t part = partition(distribution, left, right);
        quickSort(distribution, left, part - 1);
        quickSort(distribution, part + 1, right);
}
void processDistribution(double *distributions, const size_t distribution_size)
{

       std::clog << "beginning qsorting." << std::endl;
       quickSort(distributions, 0, distribution_size - 1);
       std::clog << "done qsorting." << std::endl;

}

int main(int argc, char* argv[])
{
    size_t distribution_size = 65000;
    size_t num_distributions = 10;

    std::vector<double *> distributions;

    // Create num_distributions distributions.
    for (size_t i = 0; i < num_distributions; i++)
    {
        double * new_dist = new double[distribution_size];
        for (size_t k = 0; k < distribution_size; k++)
        {
            // Works when I have actual numbers in the distributions.
            // Seg faults when all the numbers are the same.
            new_dist[k] =1;
            //new_dist[k] = rand() % 1000 + 1; // uncomment this, and it works.
        }

        distributions.push_back(new_dist);
    }

    // Submit each distribution to a quicksort thread.
    boost::thread_group threads;
    for (std::vector<double *>::const_iterator it=distributions.begin(); it != distributions.end(); ++it)
    {
         // It works when I run processDistribution directly. Segfaults when I run it via threads.
         //processDistribution(*it, distribution_size);
         threads.create_thread(boost::bind(&processDistribution, *it, distribution_size)); 
    }
    threads.join_all();

    // Show the results of the sort for all the distributions.
    for (std::vector<double *>::const_iterator it=distributions.begin(); it != distributions.end(); ++it)
    {
        for (size_t i = 0; i < distribution_size; i++)
        {
            // print first and last 20 results.
            if (i < 20 || i > (distribution_size - 20))
                std::cout << (*it)[i] << ",";
        }
        std::cout << std::endl;
    }

}

GDB analysis of the core file yields:

Error in re-setting breakpoint -1: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -1: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -2: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Error in re-setting breakpoint -3: aix-thread: ptrace (52, 18220265) returned -1 (errno = 3 The process does not exist.)
Core was generated by `testthreads'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18

warning: Source file is more recent than executable.
18
(gdb) bt 7
#0  0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18
#1  0x0000000100005834 in quickSort (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:42
#2  0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63627) at testthreads.cpp:43
#3  0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63628) at testthreads.cpp:43
#4  0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63629) at testthreads.cpp:43
#5  0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63630) at testthreads.cpp:43
#6  0x0000000100005850 in quickSort (distribution=0x1101d1430, left=0, right=63631) at testthreads.cpp:43
(More stack frames follow...)
(gdb) frame 0
#0  0x00000001000056bc in partition (distribution=0x1101d1430, left=0, right=63626) at testthreads.cpp:18
18
(gdb) info locals
pivot = 1
(gdb) info args
distribution = 0x1101d1430
left = 0
right = 63626
(gdb)

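In addition, my real program handles many more threads and distributions. GDB inspection there often shows stranger stack traces that look like memory corruption (note how partition passes mid = 12119 as d1, yet inside the swapvals frame d1 reads 4568618016):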

(gdb) bt 3
#0  0x00000001002aa0b8 in ScenRankReplacer<double>::swapvals (this=0xfffffffffffdfc8, distribution=..., d1=@0x1104c8178: 4568618016, d2=@0x1104c8140: 4568416720, ranking_values=0x1104c81d0,
    r1=@0x1104c8170: 1152921504606838728, r2=@0x1002a16c8: 6917529029728344952) at ScenRankReplacer.h:96
#1  0x00000001002a7120 in ScenRankReplacer<double>::partition (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:122
#2  0x00000001002a16c8 in ScenRankReplacer<double>::quickSort (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:91
(More stack frames follow...)
(gdb) frame 1
#1  0x00000001002a7120 in ScenRankReplacer<double>::partition (this=0xfffffffffffdfc8, distribution=..., ranking_values=0x11069ae50, left=1, right=24237) at ScenRankReplacer.h:122
122             swapvals(distribution, mid, left, ranking_values, mid - 1, left - 1);
(gdb) p mid
 = 12119
(gdb) p left
 = 1

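So... my questions:

  1. Am I right? Am I hitting a stack limit?
  2. How exactly can I determine that this is the case (other than the inferences I made above)? Is there an easy way to detect this? GDB clues or something?
  3. Why does threading matter? Do all threads share the same stack limit?
  4. Most importantly: how do I make this work?! Is recursive quicksort on massive datasets simply not feasible?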

The error occurs at optimization level O2. Thread model: aix; gcc version 4.8.3 (GCC).

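This looks like it could be related to stack space. Threading matters because, although every thread has its own stack, those stacks all share the same pool of memory. A stack typically grows as needed until it runs into memory that is already in use, which in this case could be another thread's stack. A single-threaded program does not have this problem and can keep growing its stack. (Also, with multiple threads you are running several sorts at once, which requires more stack space overall.)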

One way to fix this is to remove the recursion and replace it with loops and local storage. Something like this (uncompiled, untested) code:

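void quickSort(double *distribution, size_t left, size_t right) {
    // needs <vector> and <utility>; "> >" keeps this valid before C++11
    std::vector<std::pair<size_t, size_t> > ranges;
    for (;;) {
        for (;;) {
            // stop once the range holds at most one element
            if (left >= right)
                break;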
            size_t part = partition(distribution, left, right);

            // save range for later to replace the second recursive call
            ranges.push_back(std::make_pair(part + 1, right));

            // set right == part - 1, then loop, to replace the first recursive call
            right = part - 1;
        }
        if (ranges.empty())
            break;

        // Take top off of ranges for the next loop, replacing the second recursive call
        left = ranges.back().first;
        right = ranges.back().second;
        ranges.pop_back();
    }
}
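
A related option keeps the recursion but bounds its depth: recurse only into the smaller side of each split and loop on the larger one. The sketch below is only an illustration of that idea, not code from the question. The names lomutoPartition and quickSortBounded are made up here, and a standard Lomuto partition is swapped in for the question's partition so the block stands on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Lomuto partition over the closed range [left, right]; returns the final
// index of the pivot. Stand-in for the question's partition (assumption).
std::size_t lomutoPartition(double *data, std::size_t left, std::size_t right)
{
    const double pivot = data[right];
    std::size_t store = left;
    for (std::size_t i = left; i < right; ++i) {
        if (data[i] < pivot)
            std::swap(data[i], data[store++]);
    }
    std::swap(data[store], data[right]);
    return store;
}

// Recurses into the smaller side only and loops on the larger side, so the
// call depth stays around log2(n) even for degenerate splits.
void quickSortBounded(double *data, std::size_t left, std::size_t right)
{
    assert(data != NULL);
    while (left < right) {
        const std::size_t part = lomutoPartition(data, left, right);
        if (part - left < right - part) {
            if (part > left)                  // guard against size_t underflow
                quickSortBounded(data, left, part - 1);
            left = part + 1;                  // continue with the larger right side
        } else {
            quickSortBounded(data, part + 1, right);
            right = part - 1;                 // part > left here, so no underflow
        }
    }
}
```

With all-identical input every split is maximally lopsided, but the lopsided side is handled by the loop rather than by a nested call, so the stack stays shallow. The running time for that input is still quadratic with this partition.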

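So after some more effort, I found the answers to my questions.

  1. Am I right? Am I hitting a stack limit? AND

  2. How exactly can I determine that this is the case (other than the inferences I made above)? Is there an easy way to detect this? GDB clues or something?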

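A: Yes. The program overflows the stack. I could not find a direct way to confirm this on AIX. However, when I put the code into Visual Studio 2015 on Windows and ran it, the program crashed with an explicit "Stack Overflow" error.

I wish there were a way to get an explicit 'Stack Overflow' error on AIX, similar to the VS result. I couldn't find one. Even compiling with -fstack-check didn't give me a storage error :(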
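
One indirect check is to measure how deep the recursion goes. In the core dump from the question, `right` falls by one per frame, from 64999 (distribution_size - 1) to 63626, which suggests roughly 1,370 nested calls were enough to exhaust that thread's stack. The sketch below reuses the question's partition scheme and adds a depth counter; partitionQ, quickSortCounted and maxDepthAllEqual are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Same partition scheme as in the question.
static std::size_t partitionQ(double *d, std::size_t left, std::size_t right)
{
    const double pivot = d[right];
    while (left < right) {
        while (left < right && d[left] <= pivot) ++left;
        while (left < right && d[right] > pivot) --right;
        if (left < right) std::swap(d[left], d[right]);
    }
    return right;
}

// Recursive quicksort from the question, instrumented with a depth counter.
static void quickSortCounted(double *d, std::size_t left, std::size_t right,
                             std::size_t depth, std::size_t &maxDepth)
{
    if (depth > maxDepth) maxDepth = depth;
    if (left >= right) return;
    const std::size_t part = partitionQ(d, left, right);
    quickSortCounted(d, left, part - 1, depth + 1, maxDepth);
    quickSortCounted(d, part + 1, right, depth + 1, maxDepth);
}

// Returns the deepest call nesting reached when sorting n identical values.
std::size_t maxDepthAllEqual(std::size_t n)
{
    assert(n > 0);
    std::vector<double> values(n, 1.0);
    std::size_t maxDepth = 0;
    quickSortCounted(&values[0], 0, n - 1, 1, maxDepth);
    return maxDepth;
}
```

For identical values the partition always returns `right`, so each call shrinks the range by a single element and the nesting depth equals the element count. Keep n small when trying this out, since the point is that large n overflows the stack.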

  3. Why does threading matter? Do all threads share the same stack limit?

A: The default stack size for threads on AIX is surprisingly small!

From this IBM developerworks blog post:

For a 32-bit compiled application on AIX, the default pthread stacksize is 96 KB; and for a 64-bit compiled application on AIX,

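  4. Most importantly: how do I make this work?! Is recursive quicksort on massive datasets simply not feasible?

I can only think of two ways:

A1: The first is to increase the stack size.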

From the IBM Debugging Guidelines for Threads:

The minimum stack size for a thread is 96KB. It is also the default stack size. This number can be retrieved at compilation time using the PTHREAD_STACK_MIN symbolic constant defined in the pthread.h header file.

Note that the maximum stack size is 256MB, the size of a segment. This limit is indicated by the PTHREAD_STACK_MAX symbolic constant in the pthread.h header file.

So the stack size can be raised to a maximum of 256MB, which is quite a lot.
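
For reference, a minimal sketch of requesting a larger stack with plain POSIX threads, assuming pthread_attr_setstacksize behaves on AIX as POSIX describes. The helper runWithStackSize and the placeholder worker markDone are hypothetical names for this illustration; a real worker would call the sort instead.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Hypothetical worker: sets the flag it is given so the caller can see it ran.
void *markDone(void *arg)
{
    *static_cast<bool *>(arg) = true;
    return NULL;
}

// Runs fn(arg) on a new thread with the requested stack size and joins it.
// Returns 0 on success, otherwise the pthread error code.
int runWithStackSize(void *(*fn)(void *), void *arg, std::size_t stackBytes)
{
    assert(fn != NULL);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    // Fails with an error code if stackBytes is outside the allowed limits.
    rc = pthread_attr_setstacksize(&attr, stackBytes);
    pthread_t tid;
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

A call such as runWithStackSize(markDone, &flag, 64 * 1024 * 1024) asks for a 64MB stack. As far as I know, Boost.Thread offers the same knob through boost::thread::attributes::set_stack_size, with the resulting thread added to the group via add_thread, but I have not tried that on AIX.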

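A2: The other way is simply to avoid the potentially unbounded recursion. My datasets are quite large. Probably not large enough to use up 256MB of stack, but rewriting the quicksort function iteratively is fairly simple.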

void quickSort_iter(double *distribution, size_t left, size_t right)
{
        if (left >= right)
                return;

        std::stack<std::pair<size_t, size_t> > partition_stack;
        partition_stack.push(std::pair<size_t, size_t>(left, right));

        while (!partition_stack.empty())
        {

                left = partition_stack.top().first;
                right = partition_stack.top().second;
                partition_stack.pop();

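                // partition() returns the split index, not the pivot value
                size_t pivot = partition(distribution, left, right);

                // only push sub-ranges that still hold at least two elements
                if (pivot > left + 1)
                        partition_stack.push(std::pair<size_t, size_t>(left, pivot - 1));

                if (pivot + 1 < right)
                        partition_stack.push(std::pair<size_t, size_t>(pivot + 1, right));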
        }
}

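std::stack is created with the default std::allocator, so internally it uses heap allocation to store the stack of sort partitions and therefore does not run up against the thread's stack limit.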
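
Since the crash only shows up when all values are identical, it may also be worth changing the partition itself. The sketch below is an assumption-laden illustration rather than what I ran in production: it combines the heap-backed range stack with a three-way (Dutch national flag) partition, so values equal to the pivot are placed once and never revisited. The name quickSort3Way and the half-open [lo, hi) ranges are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stack>
#include <utility>

// Sorts data[0..n) ascending using an explicit heap-backed stack of
// half-open ranges [lo, hi) and a three-way partition.
void quickSort3Way(double *data, std::size_t n)
{
    assert(data != NULL || n == 0);
    std::stack<std::pair<std::size_t, std::size_t> > ranges;
    ranges.push(std::make_pair(std::size_t(0), n));

    while (!ranges.empty()) {
        const std::size_t lo = ranges.top().first;
        const std::size_t hi = ranges.top().second;
        ranges.pop();
        if (hi - lo < 2)
            continue;                 // zero or one element: nothing to do

        const double pivot = data[lo + (hi - lo) / 2];
        std::size_t lt = lo;          // data[lo, lt) <  pivot
        std::size_t i = lo;           // data[lt, i)  == pivot
        std::size_t gt = hi;          // data[gt, hi) >  pivot
        while (i < gt) {
            if (data[i] < pivot)
                std::swap(data[lt++], data[i++]);
            else if (data[i] > pivot)
                std::swap(data[i], data[--gt]);
            else
                ++i;
        }
        // Values equal to the pivot are in place; only the outer parts remain.
        ranges.push(std::make_pair(lo, lt));
        ranges.push(std::make_pair(gt, hi));
    }
}
```

For a distribution of identical values, the first pass classifies every element as equal to the pivot and both remaining ranges are empty, so the sort finishes after a single linear scan.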