基数排序算法详解
Explanation on Radix Sort Algorithm
我是 progamming.I 的新手,我正在寻找 C++ 中的基数排序实现,我发现了这个
代码在这里。
void countSort(string a[], int size, size_t k)
{
string *b = NULL; int *c = NULL;
b = new string[size];
c = new int[257];
for (int i = 0; i <257; i++){
c[i] = 0;
}
for (int j = 0; j <size; j++){
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
//a[j] is a string
}
for (int f = 1; f <257; f++){
c[f] += c[f - 1];
}
for (int r = size - 1; r >= 0; r--){
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
}
for (int l = 0; l < size; l++){
a[l] = b[l];
}
// avold memory leak
delete[] b;
delete[] c;
}
void radixSort(string b[], int r)
{
size_t max = getMax(b, r);
for (size_t digit = max; digit > 0; digit--){
countSort(b, r, digit - 1);
}
}
所以我的问题是这些行的作用:
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
那是 MSD 还是 LSD 基数排序?
谢谢。
这是一个不必要的紧凑示例,因此难以阅读代码。
为了分析它,将它分开一点会有所帮助:
// what a mess...
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
首先取出c
s订阅的参数:
// determine index for c
const int iC = k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0;
// post-increment c (as it is it could become a pre-increment as well)
c[iC]++;
指数计算包含一个条件:
// determine index for c
const int iC
// check whether k is (not) exceeding the size of a
= k < a[j].size()
// then
? (int)(unsigned char)a[j][k] + 1
// else
: 0;
数组 a
是一个 std::string
的数组,其中 std::string
本身包含一个 char
的数组。因此,a[j][k]
导致单个 char
。 char
可能是有符号的或无符号的——这留给了编译器。因此,(unsigned char)a[j][k]
不会更改 char
的位,而是将它们解释为无符号数。然后 (int)(unsigned char)a[j][k]
将其提升为 int
.
请注意,如果当前编译器已签署 char
,这可能与 (int)a[j][k]
不同,因为在这种情况下,值的可能符号将被保留。 (这叫做 sign extension。)所以,整个事情只是负责将当前字符转换为(正)索引并最终加 1。
实际上,我打算把剩下的作为 reader 的练习,但后来我看到了这个:
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
像上面那样分开,结果是:
const int iC = k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0;
const int iB = c[iC - 1]; // What?
b[iB] = a[r];
考虑到 iC
可能导致 0(虽然我没有检查整个代码是否可能),iC - 1
可能导致 -1
。因此,c[-1]
将被访问。
这可能是正确的,例如c
指向更大数组但不在数组开头的指针。因此,负索引将访问有效存储。这似乎不是这里的情况:
c = new int[257];
而且我看不到 c
的任何其他分配。
这一切看起来都不太可信。充其量,条件是 over-pessimistic 并且永远不会分配 0。
我很确定我可以证明不太紧凑的代码可以提高可读性,如果不能帮助更容易地发现其中可能存在的问题的话。
那么,non-compact代码是不是变慢了?
根据我的经验,现代编译器并没有以其惊人的优化能力。
我曾经读过一篇关于优化和 Static single assignment form 的文章。
当我调试我的 C++ 代码(它肯定不包含任何名为 $$
的变量)时,我也会不时地看到 Visual Studios 调试器中所有有趣的 $$
变量观察 window。
所以,我相信编译器也会在内部做类似的事情。 – 明确地这样做以提高可读性对性能的影响应该不会最小。
如果我真的有疑问,我仍然可以检查汇编程序输出。
(例如,Compiler Explorer 是个好地方。)
顺便说一句。 c = new int[257];
?
为什么不int c[257];
?
257 int
值没有那么多,我不敢立即超过堆栈大小。
更不用说,数组,尤其是用 new
分配的数组是非常糟糕的 C++ 风格,要求 U.B.. As if std::vector 还没有发明……
当我还是学生的时候,我不知何故错过了关于基数排序的课程(虽然我必须承认我在日常业务中还没有错过这些知识)。
因此,出于好奇,我查看了维基百科和 re-implemented 那里的描述。
这是为了提供(希望更好)替代 OP 在问题中发现和暴露的内容。
因此,我实现了
- 根据 en.wikipedia.org: Radix sort – History
上的描述,一种天真的方法
- 然后 OP 展示了我在 de.wikipedia.org: Countingsort – Algorithmus.
上找到的方法(使用计数排序)
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
/* helper to find max. length in data strings
*/
size_t maxLength(const std::vector<std::string> &data)
{
size_t lenMax = 0;
for (const std::string &value : data) {
if (lenMax < value.size()) lenMax = value.size();
}
return lenMax;
}
/* a naive implementation of radix sort
* like described in https://en.wikipedia.org/wiki/Radix_sort
*/
void radixSort(std::vector<std::string> &data)
{
/* A char has 8 bits - which encode (unsigned) the numbers of [0, 255].
* Hence, 256 buckets are used for sorting.
*/
std::vector<std::string> buckets[256];
// determine max. length of input data:
const size_t len = maxLength(data);
/* iterate over data for according to max. length
*/
for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
// sort data into buckets according to the current "digit":
for (std::string &value : data) {
/* digits after end of string are considered as '[=16=]'
* because 0 is the usual end-marker of C strings
* and the least possible value of an unsigned char.
* This shall ensure that an string goes before a longer
* string with same prefix.
*/
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
// move current string into the corresponding bucket
buckets[digit].push_back(std::move(value));
}
// store buckets back into data (preserving current order)
data.clear();
for (std::vector<std::string> &bucket : buckets) {
// append bucket to the data
data.insert(data.end(),
std::make_move_iterator(bucket.begin()),
std::make_move_iterator(bucket.end()));
bucket.clear();
}
}
}
/* counting sort as helper for the not so naive radix sort
*/
void countSort(std::vector<std::string> &data, size_t i)
{
/* There are 256 possible values for an unsigned char
* (which may have a value in [0, 255]).
*/
size_t counts[256] = { 0 }; // initialize all counters with 0.
// count how often a certain charater appears at the place i
for (const std::string &value : data) {
/* digits after end of string are considered as '[=16=]'
* because 0 is the usual end-marker of C strings
* and the least possible value of an unsigned char.
* This shall ensure that an string goes before a longer
* string with same prefix.
*/
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
// count the resp. bucket counter
++counts[digit];
}
// turn counts of digits into offsets in data
size_t total = 0;
for (size_t &count : counts) {
#if 0 // could be compact (and, maybe, confusing):
total = count += total; // as C++ assignment is right-associative
#else // but is the same as:
count += total; // add previous total sum to count
total = count; // remember new total
#endif // 0
}
// an auxiliary buffer to sort the input data into.
std::vector<std::string> buffer(data.size());
/* Move input into aux. buffer
* while using the bucket offsets (the former counts)
* for addressing of new positions.
* This is done backwards intentionally as the offsets
* are decremented from end to begin of partitions.
*/
for (size_t j = data.size(); j--;) { // j-- -> check for 0 and post-decrement
std::string &value = data[j];
// see comment for digit above...
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
/* decrement offset and use as index
* Arrays (and vectors) in C++ are 0-based.
* Hence, this is adjusted respectively (compared to the source of algorithm).
*/
const size_t k = --counts[digit];
// move input element into auxiliary buffer at the determined offset
buffer[k] = std::move(value);
}
/* That's it.
* Move aux. buffer back into data.
*/
data = std::move(buffer);
}
/* radix sort using count sort internally
*/
void radixCountSort(std::vector<std::string> &data)
{
// determine max. length of input data:
const size_t len = maxLength(data);
/* iterate over data according to max. length
*/
for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
countSort(data, i);
}
}
/* output of vector with strings
*/
std::ostream& operator<<(std::ostream &out, const std::vector<std::string> &data)
{
const char *sep = " ";
for (const std::string &value : data) {
out << sep << '"' << value << '"';
sep = ", ";
}
return out;
}
/* do a test for certain data
*/
void test(const std::vector<std::string> &data)
{
std::cout << "Data: {" << data << " }\n";
std::vector<std::string> data1 = data;
radixSort(data1);
std::cout << "Radix Sorted: {" << data1 << " }\n";
std::vector<std::string> data2 = data;
radixCountSort(data2);
std::cout << "Radix Count Sorted: {" << data2 << " }\n";
}
/* helper to turn a text into a vector of strings
* (by separating at white spaces)
*/
std::vector<std::string> tokenize(const char *text)
{
std::istringstream in(text);
std::vector<std::string> tokens;
for (std::string token; in >> token;) tokens.push_back(token);
return tokens;
}
/* main program
*/
int main()
{
// do some tests:
test({ "Hi", "He", "Hello", "World", "Wide", "Web" });
test({ });
test(
tokenize(
"Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.\n"
"Radix sorting algorithms came into common use as a way to sort punched cards as early as 1923.\n"
"The first memory-efficient computer algorithm was developed in 1954 at MIT by Harold H. Seward.\n"
"Computerized radix sorts had previously been dismissed as impractical "
"because of the perceived need for variable allocation of buckets of unknown size.\n"
"Seward's innovation was to use a linear scan to determine the required bucket sizes and offsets beforehand, "
"allowing for a single static allocation of auxiliary memory.\n"
"The linear scan is closely related to Seward's other algorithm - counting sort."));
}
输出:
Data: { "Hi", "He", "Hello", "World", "Wide", "Web" }
Radix Sorted: { "He", "Hello", "Hi", "Web", "Wide", "World" }
Radix Count Sorted: { "He", "Hello", "Hi", "Web", "Wide", "World" }
Data: { }
Radix Sorted: { }
Radix Count Sorted: { }
Data: { "Radix", "sort", "dates", "back", "as", "far", "as", "1887", "to", "the", "work", "of", "Herman", "Hollerith", "on", "tabulating", "machines.", "Radix", "sorting", "algorithms", "came", "into", "common", "use", "as", "a", "way", "to", "sort", "punched", "cards", "as", "early", "as", "1923.", "The", "first", "memory-efficient", "computer", "algorithm", "was", "developed", "in", "1954", "at", "MIT", "by", "Harold", "H.", "Seward.", "Computerized", "radix", "sorts", "had", "previously", "been", "dismissed", "as", "impractical", "because", "of", "the", "perceived", "need", "for", "variable", "allocation", "of", "buckets", "of", "unknown", "size.", "Seward's", "innovation", "was", "to", "use", "a", "linear", "scan", "to", "determine", "the", "required", "bucket", "sizes", "and", "offsets", "beforehand,", "allowing", "for", "a", "single", "static", "allocation", "of", "auxiliary", "memory.", "The", "linear", "scan", "is", "closely", "related", "to", "Seward's", "other", "algorithm", "-", "counting", "sort." }
Radix Sorted: { "-", "1887", "1923.", "1954", "Computerized", "H.", "Harold", "Herman", "Hollerith", "MIT", "Radix", "Radix", "Seward's", "Seward's", "Seward.", "The", "The", "a", "a", "a", "algorithm", "algorithm", "algorithms", "allocation", "allocation", "allowing", "and", "as", "as", "as", "as", "as", "as", "at", "auxiliary", "back", "because", "been", "beforehand,", "bucket", "buckets", "by", "came", "cards", "closely", "common", "computer", "counting", "dates", "determine", "developed", "dismissed", "early", "far", "first", "for", "for", "had", "impractical", "in", "innovation", "into", "is", "linear", "linear", "machines.", "memory-efficient", "memory.", "need", "of", "of", "of", "of", "of", "offsets", "on", "other", "perceived", "previously", "punched", "radix", "related", "required", "scan", "scan", "single", "size.", "sizes", "sort", "sort", "sort.", "sorting", "sorts", "static", "tabulating", "the", "the", "the", "to", "to", "to", "to", "to", "unknown", "use", "use", "variable", "was", "was", "way", "work" }
Radix Count Sorted: { "-", "1887", "1923.", "1954", "Computerized", "H.", "Harold", "Herman", "Hollerith", "MIT", "Radix", "Radix", "Seward's", "Seward's", "Seward.", "The", "The", "a", "a", "a", "algorithm", "algorithm", "algorithms", "allocation", "allocation", "allowing", "and", "as", "as", "as", "as", "as", "as", "at", "auxiliary", "back", "because", "been", "beforehand,", "bucket", "buckets", "by", "came", "cards", "closely", "common", "computer", "counting", "dates", "determine", "developed", "dismissed", "early", "far", "first", "for", "for", "had", "impractical", "in", "innovation", "into", "is", "linear", "linear", "machines.", "memory-efficient", "memory.", "need", "of", "of", "of", "of", "of", "offsets", "on", "other", "perceived", "previously", "punched", "radix", "related", "required", "scan", "scan", "single", "size.", "sizes", "sort", "sort", "sort.", "sorting", "sorts", "static", "tabulating", "the", "the", "the", "to", "to", "to", "to", "to", "unknown", "use", "use", "variable", "was", "was", "way", "work" }
请注意,字符串是根据字符的数值进行排序的。
相反,如果要进行 English-dictionary 排序,则必须修改数字到桶的映射。因此,字符值的顺序可能会改变,并将相应的大写和小写字符映射到同一个桶。
频繁复制字符串(或其他容器)是 space 和 耗时之类的事情,我最多会在生产代码中阻止。
move semantics 是降低 CPU 压力的一种选择,同时保持代码非常干净并且与后面的算法具有可比性。
这是我在示例代码中试图考虑的(据我所知)。
我是 progamming.I 的新手,我正在寻找 C++ 中的基数排序实现,我发现了这个 代码在这里。
void countSort(string a[], int size, size_t k)
{
string *b = NULL; int *c = NULL;
b = new string[size];
c = new int[257];
for (int i = 0; i <257; i++){
c[i] = 0;
}
for (int j = 0; j <size; j++){
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
//a[j] is a string
}
for (int f = 1; f <257; f++){
c[f] += c[f - 1];
}
for (int r = size - 1; r >= 0; r--){
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
}
for (int l = 0; l < size; l++){
a[l] = b[l];
}
// avold memory leak
delete[] b;
delete[] c;
}
void radixSort(string b[], int r)
{
size_t max = getMax(b, r);
for (size_t digit = max; digit > 0; digit--){
countSort(b, r, digit - 1);
}
}
所以我的问题是这些行的作用:
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0]--;
那是 MSD 还是 LSD 基数排序?
谢谢。
这是一个不必要的紧凑示例,因此难以阅读代码。
为了分析它,将它分开一点会有所帮助:
// what a mess...
c[k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0]++;
首先取出c
s订阅的参数:
// determine index for c
const int iC = k < a[j].size() ? (int)(unsigned char)a[j][k] + 1 : 0;
// post-increment c (as it is it could become a pre-increment as well)
c[iC]++;
指数计算包含一个条件:
// determine index for c
const int iC
// check whether k is (not) exceeding the size of a
= k < a[j].size()
// then
? (int)(unsigned char)a[j][k] + 1
// else
: 0;
数组 a
是一个 std::string
的数组,其中 std::string
本身包含一个 char
的数组。因此,a[j][k]
导致单个 char
。 char
可能是有符号的或无符号的——这留给了编译器。因此,(unsigned char)a[j][k]
不会更改 char
的位,而是将它们解释为无符号数。然后 (int)(unsigned char)a[j][k]
将其提升为 int
.
请注意,如果当前编译器已签署 char
,这可能与 (int)a[j][k]
不同,因为在这种情况下,值的可能符号将被保留。 (这叫做 sign extension。)所以,整个事情只是负责将当前字符转换为(正)索引并最终加 1。
实际上,我打算把剩下的作为 reader 的练习,但后来我看到了这个:
b[c[k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0] - 1] = a[r];
像上面那样分开,结果是:
const int iC = k < a[r].size() ? (int)(unsigned char)a[r][k] + 1 : 0;
const int iB = c[iC - 1]; // What?
b[iB] = a[r];
考虑到 iC
可能导致 0(虽然我没有检查整个代码是否可能),iC - 1
可能导致 -1
。因此,c[-1]
将被访问。
这可能是正确的,例如c
指向更大数组但不在数组开头的指针。因此,负索引将访问有效存储。这似乎不是这里的情况:
c = new int[257];
而且我看不到 c
的任何其他分配。
这一切看起来都不太可信。充其量,条件是 over-pessimistic 并且永远不会分配 0。
我很确定我可以证明不太紧凑的代码可以提高可读性,如果不能帮助更容易地发现其中可能存在的问题的话。
那么,non-compact代码是不是变慢了? 根据我的经验,现代编译器并没有以其惊人的优化能力。
我曾经读过一篇关于优化和 Static single assignment form 的文章。
当我调试我的 C++ 代码(它肯定不包含任何名为 $$
的变量)时,我也会不时地看到 Visual Studios 调试器中所有有趣的 $$
变量观察 window。
所以,我相信编译器也会在内部做类似的事情。 – 明确地这样做以提高可读性对性能的影响应该不会最小。
如果我真的有疑问,我仍然可以检查汇编程序输出。 (例如,Compiler Explorer 是个好地方。)
顺便说一句。 c = new int[257];
?
为什么不int c[257];
?
257 int
值没有那么多,我不敢立即超过堆栈大小。
更不用说,数组,尤其是用 new
分配的数组是非常糟糕的 C++ 风格,要求 U.B.. As if std::vector 还没有发明……
当我还是学生的时候,我不知何故错过了关于基数排序的课程(虽然我必须承认我在日常业务中还没有错过这些知识)。 因此,出于好奇,我查看了维基百科和 re-implemented 那里的描述。 这是为了提供(希望更好)替代 OP 在问题中发现和暴露的内容。
因此,我实现了
- 根据 en.wikipedia.org: Radix sort – History 上的描述,一种天真的方法
- 然后 OP 展示了我在 de.wikipedia.org: Countingsort – Algorithmus. 上找到的方法(使用计数排序)
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
/* helper to find max. length in data strings
*/
size_t maxLength(const std::vector<std::string> &data)
{
size_t lenMax = 0;
for (const std::string &value : data) {
if (lenMax < value.size()) lenMax = value.size();
}
return lenMax;
}
/* a naive implementation of radix sort
* like described in https://en.wikipedia.org/wiki/Radix_sort
*/
void radixSort(std::vector<std::string> &data)
{
/* A char has 8 bits - which encode (unsigned) the numbers of [0, 255].
* Hence, 256 buckets are used for sorting.
*/
std::vector<std::string> buckets[256];
// determine max. length of input data:
const size_t len = maxLength(data);
/* iterate over data for according to max. length
*/
for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
// sort data into buckets according to the current "digit":
for (std::string &value : data) {
/* digits after end of string are considered as '[=16=]'
* because 0 is the usual end-marker of C strings
* and the least possible value of an unsigned char.
* This shall ensure that an string goes before a longer
* string with same prefix.
*/
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
// move current string into the corresponding bucket
buckets[digit].push_back(std::move(value));
}
// store buckets back into data (preserving current order)
data.clear();
for (std::vector<std::string> &bucket : buckets) {
// append bucket to the data
data.insert(data.end(),
std::make_move_iterator(bucket.begin()),
std::make_move_iterator(bucket.end()));
bucket.clear();
}
}
}
/* counting sort as helper for the not so naive radix sort
*/
void countSort(std::vector<std::string> &data, size_t i)
{
/* There are 256 possible values for an unsigned char
* (which may have a value in [0, 255]).
*/
size_t counts[256] = { 0 }; // initialize all counters with 0.
// count how often a certain charater appears at the place i
for (const std::string &value : data) {
/* digits after end of string are considered as '[=16=]'
* because 0 is the usual end-marker of C strings
* and the least possible value of an unsigned char.
* This shall ensure that an string goes before a longer
* string with same prefix.
*/
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
// count the resp. bucket counter
++counts[digit];
}
// turn counts of digits into offsets in data
size_t total = 0;
for (size_t &count : counts) {
#if 0 // could be compact (and, maybe, confusing):
total = count += total; // as C++ assignment is right-associative
#else // but is the same as:
count += total; // add previous total sum to count
total = count; // remember new total
#endif // 0
}
// an auxiliary buffer to sort the input data into.
std::vector<std::string> buffer(data.size());
/* Move input into aux. buffer
* while using the bucket offsets (the former counts)
* for addressing of new positions.
* This is done backwards intentionally as the offsets
* are decremented from end to begin of partitions.
*/
for (size_t j = data.size(); j--;) { // j-- -> check for 0 and post-decrement
std::string &value = data[j];
// see comment for digit above...
const unsigned char digit = i < value.size() ? value[i] : '[=16=]';
/* decrement offset and use as index
* Arrays (and vectors) in C++ are 0-based.
* Hence, this is adjusted respectively (compared to the source of algorithm).
*/
const size_t k = --counts[digit];
// move input element into auxiliary buffer at the determined offset
buffer[k] = std::move(value);
}
/* That's it.
* Move aux. buffer back into data.
*/
data = std::move(buffer);
}
/* radix sort using count sort internally
*/
void radixCountSort(std::vector<std::string> &data)
{
// determine max. length of input data:
const size_t len = maxLength(data);
/* iterate over data according to max. length
*/
for (size_t i = len; i--;) { // i-- -> check for 0 and post-decrement
countSort(data, i);
}
}
/* output of vector with strings
*/
std::ostream& operator<<(std::ostream &out, const std::vector<std::string> &data)
{
const char *sep = " ";
for (const std::string &value : data) {
out << sep << '"' << value << '"';
sep = ", ";
}
return out;
}
/* do a test for certain data
*/
void test(const std::vector<std::string> &data)
{
std::cout << "Data: {" << data << " }\n";
std::vector<std::string> data1 = data;
radixSort(data1);
std::cout << "Radix Sorted: {" << data1 << " }\n";
std::vector<std::string> data2 = data;
radixCountSort(data2);
std::cout << "Radix Count Sorted: {" << data2 << " }\n";
}
/* helper to turn a text into a vector of strings
* (by separating at white spaces)
*/
std::vector<std::string> tokenize(const char *text)
{
std::istringstream in(text);
std::vector<std::string> tokens;
for (std::string token; in >> token;) tokens.push_back(token);
return tokens;
}
/* main program
*/
int main()
{
// do some tests:
test({ "Hi", "He", "Hello", "World", "Wide", "Web" });
test({ });
test(
tokenize(
"Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.\n"
"Radix sorting algorithms came into common use as a way to sort punched cards as early as 1923.\n"
"The first memory-efficient computer algorithm was developed in 1954 at MIT by Harold H. Seward.\n"
"Computerized radix sorts had previously been dismissed as impractical "
"because of the perceived need for variable allocation of buckets of unknown size.\n"
"Seward's innovation was to use a linear scan to determine the required bucket sizes and offsets beforehand, "
"allowing for a single static allocation of auxiliary memory.\n"
"The linear scan is closely related to Seward's other algorithm - counting sort."));
}
输出:
Data: { "Hi", "He", "Hello", "World", "Wide", "Web" }
Radix Sorted: { "He", "Hello", "Hi", "Web", "Wide", "World" }
Radix Count Sorted: { "He", "Hello", "Hi", "Web", "Wide", "World" }
Data: { }
Radix Sorted: { }
Radix Count Sorted: { }
Data: { "Radix", "sort", "dates", "back", "as", "far", "as", "1887", "to", "the", "work", "of", "Herman", "Hollerith", "on", "tabulating", "machines.", "Radix", "sorting", "algorithms", "came", "into", "common", "use", "as", "a", "way", "to", "sort", "punched", "cards", "as", "early", "as", "1923.", "The", "first", "memory-efficient", "computer", "algorithm", "was", "developed", "in", "1954", "at", "MIT", "by", "Harold", "H.", "Seward.", "Computerized", "radix", "sorts", "had", "previously", "been", "dismissed", "as", "impractical", "because", "of", "the", "perceived", "need", "for", "variable", "allocation", "of", "buckets", "of", "unknown", "size.", "Seward's", "innovation", "was", "to", "use", "a", "linear", "scan", "to", "determine", "the", "required", "bucket", "sizes", "and", "offsets", "beforehand,", "allowing", "for", "a", "single", "static", "allocation", "of", "auxiliary", "memory.", "The", "linear", "scan", "is", "closely", "related", "to", "Seward's", "other", "algorithm", "-", "counting", "sort." }
Radix Sorted: { "-", "1887", "1923.", "1954", "Computerized", "H.", "Harold", "Herman", "Hollerith", "MIT", "Radix", "Radix", "Seward's", "Seward's", "Seward.", "The", "The", "a", "a", "a", "algorithm", "algorithm", "algorithms", "allocation", "allocation", "allowing", "and", "as", "as", "as", "as", "as", "as", "at", "auxiliary", "back", "because", "been", "beforehand,", "bucket", "buckets", "by", "came", "cards", "closely", "common", "computer", "counting", "dates", "determine", "developed", "dismissed", "early", "far", "first", "for", "for", "had", "impractical", "in", "innovation", "into", "is", "linear", "linear", "machines.", "memory-efficient", "memory.", "need", "of", "of", "of", "of", "of", "offsets", "on", "other", "perceived", "previously", "punched", "radix", "related", "required", "scan", "scan", "single", "size.", "sizes", "sort", "sort", "sort.", "sorting", "sorts", "static", "tabulating", "the", "the", "the", "to", "to", "to", "to", "to", "unknown", "use", "use", "variable", "was", "was", "way", "work" }
Radix Count Sorted: { "-", "1887", "1923.", "1954", "Computerized", "H.", "Harold", "Herman", "Hollerith", "MIT", "Radix", "Radix", "Seward's", "Seward's", "Seward.", "The", "The", "a", "a", "a", "algorithm", "algorithm", "algorithms", "allocation", "allocation", "allowing", "and", "as", "as", "as", "as", "as", "as", "at", "auxiliary", "back", "because", "been", "beforehand,", "bucket", "buckets", "by", "came", "cards", "closely", "common", "computer", "counting", "dates", "determine", "developed", "dismissed", "early", "far", "first", "for", "for", "had", "impractical", "in", "innovation", "into", "is", "linear", "linear", "machines.", "memory-efficient", "memory.", "need", "of", "of", "of", "of", "of", "offsets", "on", "other", "perceived", "previously", "punched", "radix", "related", "required", "scan", "scan", "single", "size.", "sizes", "sort", "sort", "sort.", "sorting", "sorts", "static", "tabulating", "the", "the", "the", "to", "to", "to", "to", "to", "unknown", "use", "use", "variable", "was", "was", "way", "work" }
请注意,字符串是根据字符的数值进行排序的。 相反,如果要进行 English-dictionary 排序,则必须修改数字到桶的映射。因此,字符值的顺序可能会改变,并将相应的大写和小写字符映射到同一个桶。
频繁复制字符串(或其他容器)是 space 和 耗时之类的事情,我最多会在生产代码中阻止。 move semantics 是降低 CPU 压力的一种选择,同时保持代码非常干净并且与后面的算法具有可比性。 这是我在示例代码中试图考虑的(据我所知)。