Why should mid-value be used instead of mid-value - 1 for a recursive implementation of a binary search?

Question

Background

An interactive book presents this function as an example of binary search.

void GuessNumber(int lowVal, int highVal) {
   int midVal;            // Midpoint of low and high value
   char userAnswer;       // User response
   
   midVal = (highVal + lowVal) / 2;
   
   // Prompt user for input
   cout << "Is it " << midVal << "? (l/h/y): ";
   cin >> userAnswer;
   
   if( (userAnswer != 'l') && (userAnswer != 'h') ) { // Base case: found number
      cout << "Thank you!" << endl;                   
   }
   else {                                             // Recursive case: split into lower OR upper half
      if (userAnswer == 'l') {                        // Guess in lower half
         GuessNumber(lowVal, midVal);                 // Recursive call
      }
      else {                                          // Guess in upper half
         GuessNumber(midVal + 1, highVal);            // Recursive call
      }
   }
}

After giving the algorithm, they explain how the mid-value is computed for the recursive calls.

Because midVal has already been checked, it need not be part of the new window, so midVal + 1 rather than midVal is used for the window's new low side, or midVal - 1 for the window's new high side. But the midVal - 1 can have the drawback of a non-intuitive base case (i.e., midVal < lowVal, because if the current window is say 4..5, midVal is 4, so the new window would be 4..4-1, or 4..3). rangeSize == 1 is likely more intuitive, and thus the algorithm uses midVal rather than midVal - 1. However, the algorithm uses midVal + 1 when searching higher, due to integer rounding. In particular, for window 99..100, midVal is 99 ((99 + 100) / 2 = 99.5, rounded to 99 due to truncation of the fraction in integer division). So the next window would again be 99..100, and the algorithm would repeat with this window forever. midVal + 1 prevents the problem, and doesn't miss any numbers because midVal was checked and thus need not be part of the window.
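The 99..100 case in the quoted explanation can be replayed without user input. The following is a minimal sketch, not code from the book: NextWindowHigher and its skipMid flag are hypothetical names introduced here to compare the two choices for the new low side after a "higher" answer.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not from the book): returns the window that follows
// lowVal..highVal after a "higher" answer.
// skipMid == true  -> new low side is midVal + 1 (the book's choice)
// skipMid == false -> new low side is midVal (the variant the book warns about)
std::pair<int, int> NextWindowHigher(int lowVal, int highVal, bool skipMid) {
    assert(lowVal <= highVal);            // Assumed precondition: non-empty window
    int midVal = (highVal + lowVal) / 2;  // Integer division truncates the fraction
    return {skipMid ? midVal + 1 : midVal, highVal};
}
```

For the window 99..100, NextWindowHigher(99, 100, false) yields 99..100 again, so the window never shrinks, while NextWindowHigher(99, 100, true) yields 100..100.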

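Reasoning

I understand why, when the mid-value is used as the lower limit, the function is called as GuessNumber(midVal + 1, highVal). The explanation given with the limits 99 and 100 is very clear. However, I do not understand why, when the mid-value is used as the upper limit, the function is called as GuessNumber(lowVal, midVal) rather than GuessNumber(lowVal, midVal - 1).

The algorithm is missing the case in which the value being searched for is not in the range. However, they do seem to make that assumption (as a precondition). Because of that, the example they give with 4 and 5 makes little sense.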

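Test case: the number being searched for is 4

Assume the value being searched for is 4.

mid_value := (4 + 5) / 2 = 9 / 2 = 4.5, which becomes 4 (due to truncation)

When the number is checked, this is where the position of 4 should be returned, with no error. GuessNumber(4, mid_value - 1) is never called. This means that the case midVal < lowVal never happens.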

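Test case: the number being searched for is 5

Now assume the value is 5 and do the same calculation. On comparison, the algorithm makes the call GuessNumber(mid_value + 1, 5). This is where the position of 5 should be returned. Again, GuessNumber(5, mid_value - 1) is never called.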

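Test case: changing the range

If I try to widen the range, say with 4 and 7 as the limits, the function still never reaches midVal < lowVal when called as GuessNumber(low_value, mid_value - 1). Consider the mid-value of the range 4 to 7, which is 5 (due to truncation). If the number being searched for is 5, its position is returned immediately. If instead the number being searched for is 4 and the recursive call is GuessNumber(low_value, mid_value - 1), that is, GuessNumber(4, 5 - 1), the new mid-value is 4, and midVal < lowVal does not occur. The position of 4 is returned.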
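The hand-worked cases above can be extended with a brute-force check. This is only a sketch under stated assumptions: the target is known, every answer is truthful, and the loop is an iterative stand-in for the recursion. ReachesEmptyWindow and AnyInRangeTargetEmptiesWindow are hypothetical names, not from the book.

```cpp
#include <cassert>

// Hypothetical simulation: replays the guessing game for a known target with
// truthful answers, using midVal - 1 as the new high side.
// Returns true if the window ever becomes empty (highVal < lowVal).
bool ReachesEmptyWindow(int lowVal, int highVal, int target) {
    assert(lowVal <= highVal);                   // Start from a non-empty window
    while (lowVal <= highVal) {
        int midVal = (highVal + lowVal) / 2;
        if (target == midVal) return false;      // Found before the window emptied
        if (target < midVal) highVal = midVal - 1;
        else                 lowVal  = midVal + 1;
    }
    return true;                                 // Window emptied without a match
}

// Tries every window lo..hi inside 0..limit and every target inside that window.
bool AnyInRangeTargetEmptiesWindow(int limit) {
    for (int lo = 0; lo <= limit; ++lo)
        for (int hi = lo; hi <= limit; ++hi)
            for (int t = lo; t <= hi; ++t)
                if (ReachesEmptyWindow(lo, hi, t)) return true;
    return false;
}
```

Under these assumptions no in-range target produces an empty window, whereas a target below the range does: ReachesEmptyWindow(4, 5, 3) goes from 4..5 to 4..3, which is the window the book describes.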

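Some conclusions

I think this may be a logic error. The only way for this to happen is for the number being searched for to be out of range (specifically, below the lower limit), but the algorithm does not test for that case. Again, this seems to be a precondition. Still, the explanation given caught my attention: they took the time to say that midVal < lowVal could occur, and they gave the range 4..5 as an example.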

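Other findings

I checked the pseudocode in a discrete mathematics book, and they use the recursive_binary_search(lowVal, midVal - 1) form without worrying about the issue described above. I did notice, though, that they check whether the value is out of range.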

procedure binary_search(i, j, x: integer, 1 ≤ i ≤ j ≤ n)
m := ⌊(i + j)/2⌋
if x = a_m then
    return m
else if (x < a_m and i < m) then
    return binary_search(i, m - 1, x)
else if (x > a_m and j > m) then
    return binary_search(m + 1, j, x)
else return 0
{output is location of x in a_1, a_2, ..., a_n if it appears; otherwise it is 0}
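For comparison, here is one possible C++ rendering of the pseudocode above. It is a sketch, not the book's code: it assumes the sequence is a sorted std::vector<int> and keeps the pseudocode's 1-based positions, so the element at position m is stored at a[m - 1].

```cpp
#include <cassert>
#include <vector>

// Sketch of the pseudocode. Positions are 1-based; 0 means "not found".
// Assumes a is sorted ascending and 1 <= i <= j <= a.size().
int BinarySearch(const std::vector<int>& a, int i, int j, int x) {
    assert(1 <= i && i <= j && j <= static_cast<int>(a.size()));
    int m = (i + j) / 2;                  // Equals the floor because i, j > 0
    if (x == a[m - 1]) return m;
    // The guards i < m and j > m keep each recursive window non-empty,
    // so the precondition i <= j always holds in the recursive calls.
    if (x < a[m - 1] && i < m) return BinarySearch(a, i, m - 1, x);
    if (x > a[m - 1] && j > m) return BinarySearch(a, m + 1, j, x);
    return 0;                             // x is not in a_i, ..., a_j
}
```

Because of the two guards, the m - 1 call is never made with an empty window, which is how this version avoids the 4..3 situation.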

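I have also seen this implementation in another data structures book. It does not assume that the item being searched for is in the range, but they do check for that, and they still make the call with the limits lower (first in this case) and mid - 1 (loc - 1 in this case).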

void recBinarySearch(ArrayType a, int first, int last, ElementType item, bool &found, int &loc) {
    /*---------------------------
      Recursively search (sub)list a[first], ..., a[last] for item using binary search.

      Precondition: Elements of a are in ascending order; item has the same type as the array elements.
      Postcondition: found = true and loc = position of item if the search is successful; otherwise, found is false.
    -----------------------------*/

    if (first > last)
        found = false;
    else
    {
        loc = (first + last) / 2;
        if (item < a[loc])       // First half
            recBinarySearch(a, first, loc - 1, item, found, loc);
        else if (item > a[loc])  // Second half
            recBinarySearch(a, loc + 1, last, item, found, loc);
        else
            found = true;
    }
}

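Question

I searched Google and other Stack Overflow questions, but I could not find anything pointing me in the right direction (most results explain the overflow issue in the mid-value calculation, which is not the issue here). Is the book's explanation for using mid-value rather than mid-value - 1 as the upper limit correct? Is there an example that demonstrates it, or am I missing something?

Thanks in advance for your time and help!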

Answer 1

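You are right to be confused by this example. If the range is 4..5, the guess (midVal) will be 4. The only way for the line GuessNumber(lowVal, midVal-1); to be executed is if the user answers "low", which means either:

they are lying, or
their number is out of range.

The sample code does not account for search values outside the initial input range, whereas a binary search should.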
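One way to account for those two situations is sketched below. This is a hypothetical variant, not the book's code: GuessNumberChecked, the answer callback (which stands in for the interactive prompt), and the result parameter are all assumptions made for illustration. It uses midVal - 1 for the new high side and treats an empty window as an explicit base case.

```cpp
#include <cassert>
#include <functional>

// Hypothetical variant of GuessNumber. answer(guess) plays the user and
// returns 'l', 'h', or anything else for "yes".
// Returns true and sets result when a guess is confirmed. Returns false when
// the window becomes empty, which happens only if the number lies outside
// the initial range or the answers are inconsistent.
bool GuessNumberChecked(int lowVal, int highVal,
                        const std::function<char(int)>& answer, int& result) {
    assert(static_cast<bool>(answer));    // A callback must be supplied
    if (lowVal > highVal) return false;   // Base case: empty window
    int midVal = (highVal + lowVal) / 2;
    char userAnswer = answer(midVal);
    if (userAnswer == 'l')                // Guess in lower half
        return GuessNumberChecked(lowVal, midVal - 1, answer, result);
    if (userAnswer == 'h')                // Guess in upper half
        return GuessNumberChecked(midVal + 1, highVal, answer, result);
    result = midVal;                      // Base case: found number
    return true;
}
```

With a truthful callback for the number 3 and the initial window 4..5, the first guess is 4, the answer is 'l', the next window is 4..3, and the function returns false instead of guessing again.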
