Python: Assigning # values in a list to bins, by rounding up
I want a function that takes a series and a set of bins, and basically rounds each value up to the nearest bin. For example:
my_series = [ 1, 1.5, 2, 2.3, 2.6, 3]
def my_function(my_series, bins):
    ...
my_function(my_series, bins=[1,2,3])
> [1,2,2,3,3,3]
This seems very close to what NumPy's digitize is meant to do, but it produces the wrong values (asterisks mark the wrong ones):
np.digitize(my_series, bins= [1,2,3], right=False)
> [1, 1*, 2, 2*, 2*, 3]
The reason for the wrong values is clear from the documentation:
Each index i returned is such that bins[i-1] <= x < bins[i] if bins is
monotonically increasing, or bins[i-1] > x >= bins[i] if bins is
monotonically decreasing. If values in x are beyond the bounds of
bins, 0 or len(bins) is returned as appropriate. If right is True,
then the right bin is closed so that the index i is such that
bins[i-1] < x <= bins[i] or bins[i-1] >= x > bins[i] if bins is
monotonically increasing or decreasing, respectively.
If I pass the bins in decreasing order and set "right" to True, I get closer to what I want...
np.digitize(my_series, bins= [3,2,1], right=True)
> [3, 2, 2, 1, 1, 1]
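But then I would have to come up with a way to systematically reverse the assignments, swapping the lowest index (1) with the highest (3). That is easy with only 3 bins, but it gets more complicated as the number of bins grows. There must be a more elegant way to do all this.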
I believe np.searchsorted will do what you want:
Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
In [1]: my_series = [1, 1.5, 2, 2.3, 2.6, 3]
In [2]: bins = [1,2,3]
In [3]: import numpy as np
In [4]: [bins[k] for k in np.searchsorted(bins, my_series)]
Out[4]: [1, 2, 2, 3, 3, 3]
(As of NumPy 1.10.0, digitize is implemented in terms of searchsorted.)
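The one-liner above can be wrapped into the function the question asks for. This is only a sketch under two assumptions that the answer does not state: bins is sorted in increasing order, and a value larger than the last bin should be treated as an error (searchsorted returns len(bins) for such a value, so indexing with it would otherwise fail). The name round_up_to_bins is made up for illustration.

```python
import numpy as np

def round_up_to_bins(values, bins):
    """Map each value to the smallest bin that is >= the value.

    Assumes `bins` is sorted in increasing order (hypothetical helper).
    """
    bins = np.asarray(bins)
    values = np.asarray(values)
    # side="left" returns i such that bins[i-1] < v <= bins[i]
    idx = np.searchsorted(bins, values, side="left")
    # idx == len(bins) means the value is above the last bin
    if np.any(idx == len(bins)):
        raise ValueError("some values are larger than the last bin")
    return bins[idx]

print(round_up_to_bins([1, 1.5, 2, 2.3, 2.6, 3], [1, 2, 3]).tolist())
# [1, 2, 2, 3, 3, 3]
```

Raising on out-of-range input is a design choice; clipping to the last bin would be another reasonable option.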
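Another way is the following (note that it relies on np.ceil, so it fits bins like these that sit on consecutive integers):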
In [25]: def find_nearest(array,value):
    ...:     idx = (np.abs(array-np.ceil(value))).argmin()
    ...:     return array[idx]
    ...:
In [26]: my_series = np.array([ 1, 1.5, 2, 2.3, 2.6, 3])
In [27]: bins = [1, 2, 3]
In [28]: [find_nearest(bins, x) for x in my_series]
Out[28]: [1, 2, 2, 3, 3, 3]
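If the bins are not consecutive integers, rounding with np.ceil first can pick the wrong bin (for example, with bins [0, 10, 20] the value 2.3 becomes 3, whose nearest bin is 0). A possible variant, offered only as a sketch, selects the smallest bin that is not below the value instead. The name smallest_bin_at_least is hypothetical, and raising when no bin is large enough is an assumption.

```python
import numpy as np

def smallest_bin_at_least(bins, value):
    """Return the smallest bin that is >= value (hypothetical helper)."""
    bins = np.asarray(bins)
    # Keep only the bins that are not below the value
    candidates = bins[bins >= value]
    if candidates.size == 0:
        raise ValueError("value is larger than every bin")
    return candidates.min()

my_series = [1, 1.5, 2, 2.3, 2.6, 3]
print([int(smallest_bin_at_least([1, 2, 3], x)) for x in my_series])
# [1, 2, 2, 3, 3, 3]
```

Unlike the searchsorted approach, this does not require the bins to be sorted, but it scans all bins for every value.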
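We can simply use np.digitize with its right option set to True to get the indices, and then bring in np.take to extract the corresponding elements from bins, like so (a being the input array):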
np.take(bins,np.digitize(a,bins,right=True))
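Spelled out with the data from the question, the intermediate indices look like this. The variable names idx, result and clipped are illustrative. The mode="clip" call is an optional extra, not part of the answer: by default np.take raises an IndexError when a value lies above the last bin, since digitize returns len(bins) for it.

```python
import numpy as np

a = np.array([1, 1.5, 2, 2.3, 2.6, 3])
bins = np.array([1, 2, 3])

# With right=True and increasing bins: bins[i-1] < x <= bins[i]
idx = np.digitize(a, bins, right=True)
print(idx.tolist())     # [0, 1, 1, 2, 2, 2]

result = np.take(bins, idx)
print(result.tolist())  # [1, 2, 2, 3, 3, 3]

# Assumption: values above the last bin should map to the last bin.
# mode="clip" clamps the out-of-range index instead of raising.
clipped = np.take(bins, np.digitize([3.5], bins, right=True), mode="clip")
print(clipped.tolist())  # [3]
```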