Using recursive radix sort in parallel bucket sort
I'm trying to write a fast algorithm to sort a vector containing a large number of integers, for example:
159 14 5 97 6 54
So far, my program splits the vector into small buckets by most significant digit (MSD), like this:
bucket[1]:159 14
bucket[5]:5 54
bucket[6]:6
bucket[9]:97
Now I need to sort each bucket with a recursive radix sort, in most-significant-digit order:
bucket[1]:14 159
bucket[5]:5 54
bucket[6]:6
bucket[9]:97
Here is the recursive radix sort code I found online:
// Sort 'size' number of integers starting at 'input' according to the 'digit'th digit
// For the parameter 'digit', 0 denotes the least significant digit and increases as significance does
void radixSort(int* input, int size, int digit){
if (size == 0)
return;
int[10] buckets; // assuming decimal numbers
// Sort the array in place while keeping track of bucket starting indices.
// If bucket[i] is meant to be empty (no numbers with i at the specified digit),
// then let bucket[i+1] = bucket[i]
for (int i = 0; i < 10; ++i)
{
radixSort(input + buckets[i], buckets[i+1] - buckets[i], digit+1);
}
}
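I don't know how to implement this in my code, and I'm not sure what buckets[] does in the code above. Can anyone explain what changes I should make? This is the multithreaded code I'm using; it performs poorly because I'm not using recursion.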
void sort(unsigned int numCores, std::vector<unsigned int> numbersToSort){
// ******************Stage 1****************
// Use multiple threads to separate numbers into buckets by their most significant digit
auto smallbuckets = std::vector<std::shared_ptr<std::vector<std::vector<unsigned int>>>>();
std::mutex mutex;
unsigned int workload = numbersToSort.size() / numCores;
std::function<void(unsigned int, unsigned int, unsigned int)> put_small_buckets;
put_small_buckets = [this, &smallbuckets, &mutex]
(unsigned int id, unsigned int start, unsigned int end) {
auto buckets = std::make_shared<std::vector<std::vector<unsigned int>>>(std::vector<std::vector<unsigned int>>());
for (int j = 0; j < 10; ++j) {
buckets->push_back(std::vector<unsigned int>());
}
for (unsigned int i = start; i < end; ++i) {
unsigned int a = numbersToSort[i];
std::string tmp = std::to_string(a);
char c = tmp.at(0);
int ia = c - '0';
(*buckets)[ia].push_back(numbersToSort[i]);
}
std::lock_guard<std::mutex> lock(mutex);
smallbuckets.push_back(buckets);
};
// create a container of threads
std::vector<std::shared_ptr<std::thread>> containerOfThreads;
// create threads and add them to the container.
for (unsigned int i = 0; i < numCores; ++i) {
// start the thread.
unsigned int start = workload * i;
unsigned int end = workload * (i + 1);
if(i == numCores - 1) end = this->numbersToSort.size() ;
containerOfThreads.push_back(std::make_shared<std::thread>(put_small_buckets, i, start, end));
}
// join all the threads back together.
for (auto t : containerOfThreads) t->join();
numbersToSort.clear();
// ******************Stage 2****************
// Put small multithreaded buckets back to the bucket of radix(10)
auto bigbuckets = std::vector<std::shared_ptr<std::vector<unsigned int>>>();
for (int j = 0; j < 10; ++j) {
bigbuckets.push_back(std::make_shared<std::vector<unsigned int>>(std::vector<unsigned int>()));
}
int current_index = 10;
std::function<void()> collect;
collect = [this, &smallbuckets, &current_index, &mutex, &collect, &bigbuckets] () {
mutex.lock();
int index = --current_index;
mutex.unlock();
if (index < 0) return;
auto mybucket = bigbuckets[index];
for (auto i = smallbuckets.begin(); i != smallbuckets.end(); ++i) {
mybucket->insert(mybucket->end(), (*(*i))[index].begin(), (*(*i))[index].end());
}
collect();
};
// create a container of threads
containerOfThreads.clear();
// create threads and add them to the container.
for (unsigned int i = 0; i < numCores; ++i) {
containerOfThreads.push_back(std::make_shared<std::thread>(collect));
}
// join all the threads back together.
for (auto t : containerOfThreads) t->join();
// ******************Stage 3****************
// Sort big buckets
for (int j = 0; j < 10; ++j) {
bigbuckets.push_back(std::make_shared<std::vector<unsigned int>>(std::vector<unsigned int>()));
}
std::function<void(unsigned int, unsigned int)> sort_big_buckets;
sort_big_buckets = [this, &bigbuckets, &mutex]
(unsigned int start, unsigned int end) {
unsigned int curr = start;
while(curr < end){
auto mybucket = bigbuckets[curr];
std::sort(mybucket->begin(),mybucket->end(), [](const unsigned int& x, const unsigned int& y){
std::string tmp1 = std::to_string(x);
std::string tmp2 = std::to_string(y);
return lexicographical_compare(tmp1.begin(), tmp1.end(), tmp2.begin(), tmp2.end());
//return aLessB(x,y,0);
} );
++curr;
}
};
// create a container of threads
containerOfThreads.clear();
workload = 10 / numCores;
// create threads and add them to the container.
for (unsigned int i = 0; i < numCores; ++i) {
// start the thread.
unsigned int start = workload * i;
unsigned int end = workload * (i + 1);
if(i == numCores - 1) end = 10 ;
containerOfThreads.push_back(std::make_shared<std::thread>(sort_big_buckets, start, end));
}
// join all the threads back together.
for (auto t : containerOfThreads) t->join();
// put all elements back to numbersToSort
for (auto i = bigbuckets.begin(); i != bigbuckets.end(); ++i) {
numbersToSort.insert(numbersToSort.end(), (*i)->begin(), (*i)->end());
}
}
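To be honest, buckets[] is not needed. The idea is to keep the indices where each bucket starts, but since the buckets are processed one after another in the same order, a couple of extra variables can replace the array.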
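For readers wondering what such an array of bucket start indices looks like in practice, here is a minimal sketch. It is not the code from the question: the names `digitAt` and `msdRadixSort` are made up for illustration, it uses a temporary copy rather than sorting in place, and it sorts by numeric value (shorter numbers behave as if padded with leading zeros), which differs from the string order discussed in the rest of this answer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: the decimal digit of `value` selected by `divisor`
// (divisor == 100 selects the hundreds digit, and so on).
inline unsigned digitAt(unsigned value, unsigned divisor) {
    return (value / divisor) % 10;
}

// Sorts data[first, last) by the digit selected by `divisor`, then recurses
// into each bucket with the next less significant digit.
void msdRadixSort(std::vector<unsigned>& data, std::size_t first,
                  std::size_t last, unsigned divisor) {
    if (last - first < 2 || divisor == 0)
        return;

    // Counting pass: after the prefix sum, starts[d] is the index where
    // bucket d begins and starts[d + 1] is where it ends. An empty bucket
    // simply has starts[d + 1] == starts[d].
    std::array<std::size_t, 11> starts{};
    for (std::size_t i = first; i < last; ++i)
        ++starts[digitAt(data[i], divisor) + 1];
    starts[0] = first;
    for (int d = 1; d <= 10; ++d)
        starts[d] += starts[d - 1];

    // Distribution pass: copy the range and write each value to the next
    // free slot of its bucket.
    std::vector<unsigned> tmp(data.begin() + static_cast<std::ptrdiff_t>(first),
                              data.begin() + static_cast<std::ptrdiff_t>(last));
    std::array<std::size_t, 10> next;
    std::copy(starts.begin(), starts.begin() + 10, next.begin());
    for (unsigned v : tmp)
        data[next[digitAt(v, divisor)]++] = v;

    // Each bucket is now a contiguous sub-range that can be sorted on its own.
    for (int d = 0; d < 10; ++d)
        msdRadixSort(data, starts[d], starts[d + 1], divisor / 10);
}
```

Assuming 32-bit `unsigned`, a call such as `msdRadixSort(v, 0, v.size(), 1000000000u)` starts at the highest possible decimal digit. Because every bucket is a disjoint sub-range, the ten top-level recursive calls are also natural units of work to hand to separate threads.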
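As I said, you should convert the numbers to strings and sort the strings. That way each bucketing pass checks a single character per element instead of going through the whole create string -> compare -> destroy string cycle on every comparison. At the end, you have to convert the strings back to numbers.
The part of the code you asked about could look like this: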
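#include <cstddef>
#include <string>
#include <utility>
#include <vector>

void radixSort(std::vector<std::string>::iterator begin, std::vector<std::string>::iterator end, int digit){
    if (begin == end)
        return;
    // first skip short numbers: they have no character at position 'digit'
    // and therefore sort before everything else in this range
    auto e = begin;
    for (auto p = begin; p != end; ++p)
        if (p->size() <= static_cast<std::size_t>(digit))
        {
            if (p != e)
                std::swap(*p, *e);
            ++e;
        }
    if (e == end)
        return;
    // then move the strings with character d at 'digit' to the front of the
    // unsorted part; s and e play the role of buckets[d] and buckets[d+1]
    for (char d = '0'; d <= '9'; ++d)
    {
        auto s = e;
        for (auto p = e; p != end; ++p)
            if (p->at(digit) == d)
            {
                if (p != e)
                    std::swap(*p, *e);
                ++e;
            }
        radixSort(s, e, digit+1);
    }
}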
To sort a vector of strings, you can call it like this:
radixSort(v.begin(), v.end(), 0);
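The conversion steps around that call (numbers to strings, sort, strings back to numbers) could be sketched as follows. To keep the block self-contained it includes a compact stand-in for the recursive routine, written with `std::partition`; the names `radixSortStrings` and `sortByDigits` are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Iter = std::vector<std::string>::iterator;

// Compact stand-in for the recursive string radix sort, using std::partition
// instead of hand-written swap loops.
void radixSortStrings(Iter begin, Iter end, std::size_t digit) {
    if (end - begin < 2)
        return;
    // Strings with no character at position `digit` go first.
    Iter e = std::partition(begin, end, [digit](const std::string& s) {
        return s.size() <= digit;
    });
    for (char d = '0'; d <= '9' && e != end; ++d) {
        Iter s = e;
        // Move every string with character d at `digit` to the front.
        e = std::partition(s, end, [digit, d](const std::string& str) {
            return str[digit] == d;
        });
        radixSortStrings(s, e, digit + 1);
    }
}

// Hypothetical wrapper: convert once, sort the strings, convert back once.
std::vector<unsigned> sortByDigits(const std::vector<unsigned>& numbers) {
    std::vector<std::string> strings;
    strings.reserve(numbers.size());
    for (unsigned n : numbers)
        strings.push_back(std::to_string(n));

    radixSortStrings(strings.begin(), strings.end(), 0);

    std::vector<unsigned> result;
    result.reserve(strings.size());
    for (const std::string& s : strings)
        result.push_back(static_cast<unsigned>(std::stoul(s)));
    return result;
}
```

For the sample input `159 14 5 97 6 54` this yields `14 159 5 54 6 97`, the same digit-by-digit order that the `lexicographical_compare` comparator in the question produces, but each number is converted to a string only once.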