Median of Medians algorithm misunderstanding?
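What I understand already
I know that the median of medians algorithm (which I will denote as MoM) is an O(N) algorithm with a high constant factor. It finds the medians of groups of k elements (usually 5) and uses those medians as the input set for the next iteration, which finds the median of the medians. The pivot found this way is guaranteed to fall between the 3n/10-th and 7n/10-th smallest elements of the original set, where n is the number of elements in the set.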
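The O(N) claim can be checked with the usual recurrence. The sketch below assumes groups of 5, writes T(n) for the worst-case running time on n elements, and uses c for the per-element cost of grouping and partitioning (both symbols are introduced here for illustration); floors, ceilings, and small additive terms are ignored.

```latex
% Half of the n/5 group medians are <= the pivot, and each of those groups
% contributes 3 elements <= the pivot, so at least 3n/10 elements are <= it
% (symmetrically, at least 3n/10 are >= it). The recursive select therefore
% sees at most 7n/10 elements.
\begin{align*}
T(n) &\le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + c\,n \\
\text{Guess } T(n) \le a\,n:\quad
T(n) &\le \tfrac{a\,n}{5} + \tfrac{7a\,n}{10} + c\,n
      = \left(\tfrac{9a}{10} + c\right) n \le a\,n
      \quad\text{whenever } a \ge 10c .
\end{align*}
```

The requirement a >= 10c is one way to see where the high constant factor comes from.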
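I keep getting a segmentation fault when I run the code I wrote for MoM, but I'm not sure why. I debugged it and believe the problem lies in the fact that I call medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);. However, I thought this was logically sound, since we are supposed to find the median recursively by having the function call itself. Perhaps my base case is incorrect? In a tutorial by YogiBearian on YouTube (a Stanford professor, link: https://www.youtube.com/watch?v=YU1HfMiJzwg), he did not mention any extra base case to handle the O(N/5) recursive step.
Complete code
Note: As suggested, I have added a base case and switched to the .at() function for vector access.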
static const int GROUP_SIZE = 5;

/* Helper function for m of m. This function divides the array into chunks of 5
 * and finds the median of each group and puts it into a vector to return.
 * The last group will be sorted and the median will be found despite its uneven size.
 */
vector<int> findMedians(vector<int>& vec, int start, int end){
    vector<int> medians;
    for(int i = start; i <= end; i+= GROUP_SIZE){
        std::sort(vec.begin()+i, min(vec.begin()+i+GROUP_SIZE, vec.end()));
        medians.push_back(vec.at(min(i + (GROUP_SIZE/2), (i + end)/2)));
    }
    return medians;
}

/* Job is to partition the array into chunks of 5(subject to change via const)
 * And then find the median of them. Do this recursively using select as well.
 */
int medianOfMedian(vector<int>& vec, int start, int end, int k){
    /* Acquire the medians of the 5-groups */
    vector<int> medians = findMedians(vec, start, end);
    /* Find the median of this */
    int pivotVal;
    if(medians.size() == 1)
        pivotVal = medians.at(0);
    else
        pivotVal = medianOfMedian(medians, 0, medians.size()-1, medians.size()/2);
    /* Stealing a page from select() ... */
    int pivot = partitionHelper(vec, pivotVal, start, end);
    cout << "After pivoting with the value " << pivot << " we get : " << endl;
    for(int i = start; i < end; i++){
        cout << vec.at(i) << ", ";
    }
    cout << "\n\n" << endl;
    usleep(10000);
    int length = pivot - start + 1;
    if(k < length){
        return medianOfMedian(vec, k, start, pivot-1);
    }
    else if(k == length){
        return vec[k];
    }
    else{
        return medianOfMedian(vec, k-length, pivot+1, end);
    }
}
Some extra functions to aid unit testing
Here are some unit tests that I wrote for these two functions. I hope they help.
vector<int> initialize(int size, int mod){
    /* Fill the vector directly; a variable-length array is not standard C++. */
    vector<int> vec(size);
    for(int i = 0; i < size; i++){
        vec[i] = rand() % mod;
    }
    return vec;
}

/* Unit test for findMedians */
void testFindMedians(){
    const int SIZE = 36;
    const int MOD = 20;
    vector<int> vec = initialize(SIZE, MOD);
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;
    vector<int> medians = findMedians(vec, 0, SIZE-1);
    cout << "The 5-sorted version: " << endl;
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;
    cout << "The medians extracted: " << endl;
    for(int i = 0; i < medians.size(); i++){
        cout << medians[i] << ", ";
    }
    cout << "\n\n" << endl;
}

/* Unit test for medianOfMedian */
void testMedianOfMedian(){
    const int SIZE = 30;
    const int MOD = 70;
    vector<int> vec = initialize(SIZE, MOD);
    cout << "Given array : " << endl;
    for(int i = 0; i < SIZE; i++){
        cout << vec[i] << ", ";
    }
    cout << "\n\n" << endl;
    int median = medianOfMedian(vec, 0, vec.size()-1, vec.size()/2);
    cout << "\n\nThe median is : " << median << endl;
    cout << "As opposed to sorting and then showing the median... : " << endl;
    std::sort(vec.begin(), vec.end());
    cout << "sorted array : " << endl;
    for(int i = 0; i < SIZE; i++){
        if(i == SIZE/2)
            cout << "**";
        cout << vec[i] << ", ";
    }
    cout << "Median : " << vec[SIZE/2] << endl;
}
Extra section about the output I'm getting
Given array :
7, 49, 23, 48, 20, 62, 44, 8, 43, 29, 20, 65, 42, 62, 7, 33, 37, 39, 60, 52, 53, 19, 29, 7, 50, 3, 69, 58, 56, 65,
After pivoting with the value 5 we get :
23, 29, 39, 42, 43,
After pivoting with the value 0 we get :
39,
Segmentation Fault: 11
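Everything seems fine until the segmentation fault. I am confident that my partition function works as well (it was one of my implementations for a leetcode problem).
Disclaimer: This is not a homework problem, but rather my own curiosity about the algorithm after I used quickSelect in a leetcode problem set.
Please let me know if my question needs more elaboration to be an MVCE, thanks!
EDIT: I figured out that the recursive partition scheme in my code is wrong. As Pradhan pointed out, I somehow end up with empty vectors, which makes start and end 0 and -1 respectively, and that leads to the segmentation fault from an endless chain of recursive calls. I am still trying to figure this part out.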
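MoM always calls itself (to compute pivot), and therefore exhibits infinite recursion. This violates the "prime directive" of recursive algorithms: at some point, the problem must be "small" enough that no recursive call is needed.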
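To make the point concrete, here is a minimal sketch of such a base case. It assumes a half-open range [start, end) and an absolute index k; the helper name selectSmall and the cutoff of 5 are illustrative assumptions, not taken from the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: solves ranges that are small enough to handle directly.
// Returns true and writes the answer to `out` when [start, end) holds at most
// `cutoff` elements; returns false when the caller still has to recurse.
bool selectSmall(std::vector<int>& vec, int start, int end, int k,
                 int& out, int cutoff = 5) {
    if (end - start > cutoff) return false;  // too large: recursion still needed
    assert(start <= k && k < end);           // k is an absolute index; range must be non-empty
    std::sort(vec.begin() + start, vec.begin() + end);
    out = vec.at(k);
    return true;
}
```

A recursive select would call this first and only compute group medians when it returns false, so every chain of calls eventually bottoms out.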
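Correct implementation
With the help of Scott's hint, I was able to implement the median of medians algorithm correctly. I fixed it and realized that my main idea was right, but there were a couple of errors:
My base case should be for subvectors of size <= 5.
There were some subtleties about whether the last index (the variable end, in this case) should be treated as inclusive or as an exclusive upper bound. In the implementation below, I treat it as an exclusive upper bound.
The implementation is below. I have also accepted Scott's answer. Thank you, Scott!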
/* In case someone wants to pass in the pivValue, I broke partition into 2 pieces.
 */
int pivot(vector<int>& vec, int pivot, int start, int end){
    /* Now we need to go into the array with a starting left and right value. */
    int left = start, right = end-1;
    while(left < right){
        /* Increase the left and the right values until inappropriate value comes */
        while(vec.at(left) < pivot && left <= right) left++;
        while(vec.at(right) > pivot && right >= left) right--;
        /* In case of duplicate values, we must take care of this special case. */
        if(left >= right) break;
        else if(vec.at(left) == vec.at(right)){ left++; continue; }
        /* Do the normal swapping */
        int temp = vec.at(left);
        vec.at(left) = vec.at(right);
        vec.at(right) = temp;
    }
    return right;
}
/* Returns the element that would sit at index k if vec[start, end) were sorted.
 * k is an absolute index into vec, and end is exclusive.
 */
int MoM(vector<int>& vec, int k, int start, int end){
    /* Start by base case: Sort if less than 10 size
     * E.x.: Size = 9, 9 - 0 = 9.
     */
    if(end-start < 10){
        sort(vec.begin()+start, vec.begin()+end);
        return vec.at(k);
    }
    vector<int> medians;
    /* Now sort every consecutive 5 */
    for(int i = start; i < end; i+=5){
        if(end - i < 10){
            sort(vec.begin()+i, vec.begin()+end);
            medians.push_back(vec.at((i+end)/2));
        }
        else{
            sort(vec.begin()+i, vec.begin()+i+5);
            medians.push_back(vec.at(i+2));
        }
    }
    int median = MoM(medians, medians.size()/2, 0, medians.size());
    /* use the median to pivot around */
    int piv = pivot(vec, median, start, end);
    /* piv is the absolute index where the pivot value ended up, so k is compared
     * against piv directly; mixing in a relative length would shift k in the
     * right-hand call while the base case still reads vec.at(k) as absolute.
     */
    if(k < piv){
        return MoM(vec, k, start, piv);
    }
    else if(k > piv){
        return MoM(vec, k, piv+1, end);
    }
    else
        return vec.at(piv);
}
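For comparison, here is a compact sketch of the same idea that leans on the standard library. It keeps k as an absolute index and treats end as exclusive, but it swaps the hand-written partition for two std::partition calls (a three-way split), which is a different technique from the pivot() function above. The name momSelect and the cutoff of 5 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: returns the element that would sit at absolute index k if
// vec[start, end) were sorted. Requires start <= k < end. Reorders vec.
int momSelect(std::vector<int>& vec, int k, int start, int end) {
    assert(start <= k && k < end);

    // Base case: small ranges are sorted directly, which ends the recursion.
    if (end - start <= 5) {
        std::sort(vec.begin() + start, vec.begin() + end);
        return vec.at(k);
    }

    // Collect the median of each group of up to 5 elements.
    std::vector<int> medians;
    for (int i = start; i < end; i += 5) {
        int groupEnd = std::min(i + 5, end);
        std::sort(vec.begin() + i, vec.begin() + groupEnd);
        medians.push_back(vec.at(i + (groupEnd - i) / 2));
    }

    // Recurse on the medians, a strictly smaller problem (about n/5 elements).
    int n = static_cast<int>(medians.size());
    int pivotVal = momSelect(medians, n / 2, 0, n);

    // Three-way split: [start, lt) < pivot, [lt, gt) == pivot, [gt, end) > pivot.
    auto last = vec.begin() + end;
    auto midLt = std::partition(vec.begin() + start, last,
                                [pivotVal](int x) { return x < pivotVal; });
    auto midGt = std::partition(midLt, last,
                                [pivotVal](int x) { return x == pivotVal; });
    int lt = static_cast<int>(midLt - vec.begin());
    int gt = static_cast<int>(midGt - vec.begin());

    if (k < lt) return momSelect(vec, k, start, lt);
    if (k >= gt) return momSelect(vec, k, gt, end);
    return pivotVal;  // k falls inside the block of elements equal to the pivot
}
```

Calling momSelect(vec, vec.size()/2, 0, vec.size()) gives the same element as sorting and reading vec[vec.size()/2]. Because the pivot value always occurs in the range, the middle block is never empty, so both recursive calls work on strictly smaller ranges even when the input contains duplicates.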