How to do a binary search for a range of the same value?
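I have a sorted list of numbers, and I need it to return the range of indices in which a number appears. My list is:

```python
daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
```

If I searched for 0, I would need it to return (0, 3). Right now I can only get it to find the position of a single occurrence! I know how to do a binary search, but I don't know how to make it move up and down from that position to find the other equal values!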
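```python
def find_value(daysSick, value):
    low = 0
    high = len(daysSick) - 1
    while low <= high:
        mid = (low + high) // 2
        if value < daysSick[mid]:
            high = mid - 1
        elif value > daysSick[mid]:
            low = mid + 1
        else:
            return mid  # index of *one* occurrence of value
    return None  # value is not in the list
```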
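For reference, the idea described in the question (find one occurrence, then step outwards) can be sketched as below; `find_run` is a hypothetical name. The outward walk is linear in the length of the run, so this costs O(log n + k) for k equal values and can degrade to O(n) when one value fills most of the list.

```python
def find_run(days_sick, value):
    """Return (first, last) indices of value in a sorted list, or None if absent."""
    low, high = 0, len(days_sick) - 1
    while low <= high:
        mid = (low + high) // 2
        if value < days_sick[mid]:
            high = mid - 1
        elif value > days_sick[mid]:
            low = mid + 1
        else:
            # One occurrence found at mid: walk outwards to both ends of the run.
            first = last = mid
            while first > 0 and days_sick[first - 1] == value:
                first -= 1
            while last < len(days_sick) - 1 and days_sick[last + 1] == value:
                last += 1
            return first, last
    return None


daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(find_run(daysSick, 0))  # (0, 3)
```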
Why not use Python's bisection routines?
```python
>>> daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
>>> from bisect import bisect_left, bisect_right
>>> bisect_left(daysSick, 3)
6
>>> bisect_right(daysSick, 3)
9
>>> daysSick[6:9]
[3, 3, 3]
```
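A small wrapper around these two calls that returns the inclusive `(first, last)` pair the question asks for, and `None` for a missing value, might look like this (`index_range` is a hypothetical helper name). Passing `first` as the `lo` argument of `bisect_right` simply skips the part of the list already known to hold smaller values.

```python
from bisect import bisect_left, bisect_right


def index_range(a, x):
    """Return (first, last) indices of x in sorted list a, or None if x is absent."""
    first = bisect_left(a, x)        # leftmost position where x could be inserted
    if first == len(a) or a[first] != x:
        return None                  # x does not occur in a
    last = bisect_right(a, x, first) - 1  # rightmost occurrence of x
    return first, last


daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(index_range(daysSick, 0))  # (0, 3)
print(index_range(daysSick, 7))  # None
```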
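My proposed solution is faster than the raw, pure-Python functions taken from the bisect library's source (though not faster than the imported module, as the timings below show).

Solution

Using an optimised binary search: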
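```python
def search(a, x):
    # Phase 1 (same result as bisect_right): first index whose element is > x.
    right = 0
    h = len(a)
    while right < h:
        m = (right + h) // 2
        if x < a[m]:
            h = m
        else:
            right = m + 1
    # x is absent if no element is <= x, or the element just before `right` differs.
    if right == 0 or a[right - 1] != x:
        return None
    # Phase 2 (same result as bisect_left): search for the leftmost x,
    # but only over a[0:right-1], which shrinks the search range.
    left = 0
    h = right - 1
    while left < h:
        m = (left + h) // 2
        if x > a[m]:
            left = m + 1
        else:
            h = m
    return left, right - 1
```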
```python
>>> search(daysSick, 5)
(10, 12)
>>> search(daysSick, 2)
(5, 5)
```
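Comparison with bisect

Using the custom binary search...

```python
%timeit search(daysSick, 3)
1000000 loops, best of 3: 1.23 µs per loop
```

Copying the pure-Python source code of bisect into the session...

```python
%timeit bisect_left(daysSick, 1), bisect_right(daysSick, 1)
1000000 loops, best of 3: 1.77 µs per loop
```

Using the default import is by far the fastest, as I suspect it is optimised behind the scenes (CPython's bisect module uses the C implementation from `_bisect` when it is available)...

```python
from bisect import bisect_left, bisect_right
%timeit bisect_left(daysSick, 1), bisect_right(daysSick, 1)
1000000 loops, best of 3: 504 ns per loop
```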
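The timings above search for 3 with the custom function but for 1 with bisect, so they are not strictly comparable. A sketch that times both approaches on the same values with the standard `timeit` module is below; `with_bisect` is a hypothetical helper, the numbers will vary by machine and Python version, and both sides pay the same lambda-call overhead.

```python
import timeit
from bisect import bisect_left, bisect_right


def search(a, x):
    """Two-phase search from the answer above, returning (first, last) or None."""
    right, h = 0, len(a)
    while right < h:              # phase 1: same result as bisect_right
        m = (right + h) // 2
        if x < a[m]:
            h = m
        else:
            right = m + 1
    if right == 0 or a[right - 1] != x:
        return None
    left, h = 0, right - 1
    while left < h:               # phase 2: same result as bisect_left
        m = (left + h) // 2
        if x > a[m]:
            left = m + 1
        else:
            h = m
    return left, right - 1


def with_bisect(a, x):
    """Same (first, last) pair via the bisect module; assumes x is present."""
    return bisect_left(a, x), bisect_right(a, x) - 1


daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
N = 100_000
for value in (1, 3):
    for name, fn in (("search", search), ("bisect", with_bisect)):
        seconds = timeit.timeit(lambda: fn(daysSick, value), number=N)
        print(f"{name:>6}, value={value}: {seconds / N * 1e9:.0f} ns per call")
```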
Extra

Without external libraries, but not a binary search (this scans the whole list, so it is O(n)):
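```python
daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]

# using a function: collect every index whose element equals val
def idxL(val, lst):
    return [i for i, d in enumerate(lst) if d == val]

allVals = idxL(0, daysSick)          # [0, 1, 2, 3]
print((allVals[0], allVals[-1]))     # (0, 3)
```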
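Another library-free, linear-time variant avoids building the full list of indices by using the built-in `list.index`, which stops at the first match; the last match is found by searching a reversed copy. `index_range_linear` is a hypothetical name, and the two bounds only describe one contiguous run when the list is sorted.

```python
def index_range_linear(lst, val):
    """Return (first, last) indices of val in lst, or None if val is absent."""
    try:
        first = lst.index(val)       # scans forward, stops at the first match
    except ValueError:
        return None
    # lst[::-1] is a reversed copy; its first match is the last match in lst.
    last = len(lst) - 1 - lst[::-1].index(val)
    return first, last


daysSick = [0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 11, 15, 24]
print(index_range_linear(daysSick, 0))  # (0, 3)
```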
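OK, here is another way to do it: it first tries to narrow the range, and then runs bisect_left and bisect_right on the two halves of the narrowed range. I wrote this code because I thought it would be slightly more efficient than just calling bisect_left and bisect_right, even though it has the same time complexity.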
```python
def binary_range_search(s, x):
    # First we will reduce the low..high range if possible
    # by using symmetric binary search to find an index pointing to x
    low, high = 0, len(s)
    while True:
        if low >= high:
            return None
        mid = (low + high) // 2
        mid_element = s[mid]
        if x == mid_element:
            break
        elif x < mid_element:
            high = mid
        else:
            low = mid + 1
    xindex = mid
    # Now we have found an index pointing to x called xindex
    # and potentially reduced the low..high range
    # now we can run bisect_left on low..xindex + 1
    lo, hi = low, xindex + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if x > s[mid]: lo = mid + 1
        else: hi = mid
    first = lo
    # and also bisect_right on xindex..high
    lo, hi = xindex, high
    while lo < hi:
        mid = (lo + hi) // 2
        if x < s[mid]: hi = mid
        else: lo = mid + 1
    last = lo - 1
    return first, last
```
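I believe the time complexity is O(log n), just like the simple solution, but I think this is somewhat more efficient nonetheless. It's also worth noting that the second part, where you do the bisect_left and bisect_right, could be parallelised for large data sets, since they are independent operations that don't interact.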
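The same narrowing idea can also be sketched with the standard library: `bisect_left` and `bisect_right` both accept optional `lo` and `hi` bounds, so the second phase can run on the C-accelerated functions restricted to the narrowed window. This is an illustrative variant, not the answer's own code, and `binary_range_search_bisect` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def binary_range_search_bisect(s, x):
    """Return (first, last) indices of x in sorted list s, or None if absent."""
    low, high = 0, len(s)
    # Phase 1: ordinary binary search until we land on any element equal to x.
    # Invariant: everything before low is < x, everything from high on is > x.
    while low < high:
        mid = (low + high) // 2
        if x == s[mid]:
            break
        if x < s[mid]:
            high = mid
        else:
            low = mid + 1
    else:
        return None  # loop ended without a break: x is not in s
    # Phase 2: bisect only inside the narrowed window via the lo/hi arguments.
    first = bisect_left(s, x, low, mid + 1)
    last = bisect_right(s, x, mid, high) - 1
    return first, last
```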
- Find the floor: the index of the last number < key
- Find the ceiling: the index of the first number > key
- [floor + 1, ceiling - 1] gives you the range.
```python
# Binary search that returns the final (lo, hi) pointers rather than a value.
# isLessThan determines which way the pointers move:
#   with x < y,  lo ends at the first index whose element is > key (ceiling)
#   with x <= y, hi ends at the last index whose element is < key (floor)
def nextNumber(arr, key, isLessThan):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if isLessThan(key, arr[mid]):
            hi = mid - 1
        else:
            lo = mid + 1
    return (lo, hi)

def ceiling(arr, key):
    lo, _ = nextNumber(arr, key, lambda x, y: x < y)
    return lo

def floor(arr, key):
    _, hi = nextNumber(arr, key, lambda x, y: x <= y)
    return hi

def find_range(arr, key):
    fl = floor(arr, key)
    # key not in array
    if fl + 1 >= len(arr) or arr[fl + 1] != key:
        return [-1, -1]
    cl = ceiling(arr, key)
    return [fl + 1, cl - 1]
```
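The floor and ceiling above correspond to `bisect_left(arr, key) - 1` and `bisect_right(arr, key)`, so the same `find_range` can be sketched on top of the `bisect` module. The helper names below are hypothetical, and the `[-1, -1]` convention for a missing key is kept from the answer.

```python
from bisect import bisect_left, bisect_right


def floor_index(arr, key):
    # Index of the last element < key, or -1 if there is none.
    return bisect_left(arr, key) - 1


def ceiling_index(arr, key):
    # Index of the first element > key, or len(arr) if there is none.
    return bisect_right(arr, key)


def find_range_bisect(arr, key):
    fl = floor_index(arr, key)
    # key not in array
    if fl + 1 >= len(arr) or arr[fl + 1] != key:
        return [-1, -1]
    return [fl + 1, ceiling_index(arr, key) - 1]
```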