Digitizing a value to the "floor" or "ceiling" bin in Python
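I need to digitize some values so that the returned index is that of the "floor" or "ceiling" bin, whichever is nearer. For example, with bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0]) and the value 0.2, I want the index to be 0; for the value 0.26, the returned index should be 1, and so on.

I have the following ugly function that does what I want: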
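    import numpy

    def get_bin_index(value, bins):
        # Assumes evenly spaced bins; the width is taken from the first two edges.
        bin_diff = bins[1] - bins[0]
        # digitize returns i such that bins[i-1] <= value < bins[i].
        index = numpy.digitize(value, bins)
        # Step back to the lower edge when it is the closer one.
        if bins[index] - value > bin_diff/2.0:
            index -= 1
        return index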
Is there a neat (read: better/more efficient) way to do this?

Edit: including timings (to satisfy my curiosity!)
    In [1]: def get_bin_index(value, bins):
       ...:     bin_diff = bins[1]-bins[0]
       ...:     index = numpy.digitize(value, bins)
       ...:     if bins[index] - value > bin_diff/2.0:
       ...:         index -= 1
       ...:     return index
       ...:

    In [2]: def get_bin_index_c(value, bins):
       ...:     return numpy.rint((value-bins[0])/(bins[1]-bins[0]))
       ...:

    In [3]: def get_bin_index_mid_digitized(value, bins):
       ...:     return numpy.digitize(value, (bins[1:] + bins[:-1])/2.0)
       ...:

    In [4]: bin_halfs = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])

    In [5]: %timeit get_bin_index(0.9, bin_halfs)
    The slowest run took 5.71 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 4.93 µs per loop

    In [6]: %timeit get_bin_index_c(0.9, bin_halfs)
    The slowest run took 14.60 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 2.34 µs per loop

    In [7]: %timeit get_bin_index_mid_digitized(0.9, bin_halfs)
    The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.37 µs per loop
You can simply take the midpoints of the bins and use them with np.digitize:
    np.digitize(value, (bins[1:] + bins[:-1])/2.0)
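To see why the midpoints work: np.digitize counts how many midpoints lie at or below the value, and that count is the index of the nearest bin edge. Here is a minimal sketch using the bins from the question. The helper name nearest_bin_index is made up for illustration, and the bins are assumed to be sorted in increasing order.

```python
import numpy as np

def nearest_bin_index(value, bins):
    """Index of the bin edge closest to value (hypothetical helper name)."""
    bins = np.asarray(bins, dtype=float)
    # Midpoints between consecutive edges act as the decision boundaries.
    midpoints = (bins[1:] + bins[:-1]) / 2.0
    # digitize returns the number of midpoints that are <= value.
    return np.digitize(value, midpoints)

bins = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(nearest_bin_index(0.2, bins))   # 0
print(nearest_bin_index(0.26, bins))  # 1
print(nearest_bin_index(np.array([0.2, 0.26, 0.9, 2.0]), bins))  # [0 1 2 4]
```

This also works for unevenly spaced bins. A value exactly on a midpoint goes to the upper edge, and values outside the bin range land on the first or last index rather than raising an error.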
If the bin widths are all the same, you can do this in constant time with:
    def get_bin_index2(value, bins):
        # numpy.rint returns a float; cast it before using it as an index.
        return numpy.rint((value - bins[0])/(bins[1]-bins[0]))
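As a sketch of how the constant-time version might be used for indexing, the variant below casts the result to an integer and clamps it to the valid range. The name nearest_bin_index_uniform and the clamping are assumptions added here, not part of the answer above. It also assumes the bins are evenly spaced and increasing.

```python
import numpy as np

def nearest_bin_index_uniform(value, bins):
    """Nearest-edge index, assuming evenly spaced, increasing bins."""
    bins = np.asarray(bins, dtype=float)
    width = bins[1] - bins[0]
    # Distance from the first edge, measured in bin widths, then rounded.
    idx = np.rint((np.asarray(value, dtype=float) - bins[0]) / width).astype(int)
    # Clamp out-of-range values onto the first/last edge (an added assumption).
    return np.clip(idx, 0, len(bins) - 1)

bins = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(nearest_bin_index_uniform(np.array([0.2, 0.26, 0.9, 2.0]), bins))  # [0 1 2 4]
```

Note that numpy.rint rounds exact halves to the nearest even integer, so a value lying exactly on a midpoint (0.25 here) can get a different index than with the np.digitize approach.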