Python: define individual binning

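I am trying to define my own binning and compute the mean of some other columns of my dataframe over these bins. Unfortunately, it only works with integer inputs, as shown below. In this particular case, "step_size" defines the width of one bin, and I would like to use float values such as 0.109, which corresponds to 0.109 seconds. Do you have any idea how I can do this? I think the problem is in the definition of "create_bins", but I can't fix it... The goal is to get this: [(0, 0.109), (0.109, 0.218), (0.218, 0.327), ...]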

Regards

# =============================================================================
# Define parameters
# =============================================================================
seconds_min = 0 
seconds_max = 9000 
step_size = 1 
bin_number = int((seconds_max-seconds_min)/step_size)


# =============================================================================
# Define function to create your own individual binning

# lower_bound defines the lowest value of the binning interval
# width defines the width of the binning interval
# quantity defines the number of bins
# =============================================================================
def create_bins(lower_bound, width, quantity):
    bins = []
    for low in range(lower_bound, 
                      lower_bound + quantity * width + 1, width):
        bins.append((low, low+width))
    return bins


# =============================================================================
# Create binning list
# =============================================================================
bin_list = create_bins(lower_bound=seconds_min,
                    width=step_size,
                    quantity=bin_number)

print(bin_list)

The problem is that the range function does not accept float arguments.
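A quick way to see this, as a minimal sketch: range raises a TypeError as soon as the step is a float, which is what create_bins runs into once step_size is 0.109. The helper name range_accepts is made up for illustration.

```python
def range_accepts(step):
    """Return True if range() accepts this step value, False otherwise."""
    try:
        range(0, 10, step)
        return True
    except TypeError:
        # range() only takes integers, so a float step ends up here
        return False


print(range_accepts(1))      # integer step
print(range_accepts(0.109))  # float step
```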

You can use the numeric_range function from more_itertools:

from more_itertools import numeric_range

seconds_min = 0
seconds_max = 9
step_size = 0.109
bin_number = int((seconds_max - seconds_min) / step_size)


def create_bins(lower_bound, width, quantity):
    bins = []
    # No "+ 1" on the stop value: with a float width it would add
    # extra bins beyond the requested quantity.
    for low in numeric_range(lower_bound,
                             lower_bound + quantity * width, width):
        bins.append((low, low + width))
    return bins


bin_list = create_bins(lower_bound=seconds_min,
                       width=step_size,
                       quantity=bin_number)

print(bin_list)
# [(0.0, 0.109), (0.109, 0.218), (0.218, 0.327), ...]

Here is a simple way using zip and numpy's arange. I set the upper limit to 5, but you can of course choose another number.

import numpy as np

list(zip(np.arange(0, 5, .109), np.arange(.109, 5, .109)))

The result is:

[(0.0, 0.109),
 (0.109, 0.218),
 (0.218, 0.327),
 (0.327, 0.436),
 (0.436, 0.545),
 (0.545, 0.654),
 (0.654, 0.763),
 ... 
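NumPy's documentation notes that arange with a non-integer step can give a result of inconsistent length, and suggests linspace for such cases. A minimal sketch of that variant, assuming only full-width bins are wanted; the variable names are illustrative and not taken from the answer:

```python
import numpy as np

start, stop, step = 0.0, 5.0, 0.109

# Number of full-width bins that fit between start and stop
# (the small epsilon guards against float rounding in the division).
n_bins = int((stop - start) / step + 1e-9)

# n_bins + 1 evenly spaced edges, from start to start + n_bins * step
edges = np.linspace(start, start + n_bins * step, n_bins + 1)

# Pair each edge with the next one to get (low, high) tuples
bins = list(zip(edges[:-1].tolist(), edges[1:].tolist()))

print(len(bins))
print(bins[:3])
```

Because both halves of each tuple come from the same edges array, the upper bound of one bin is always exactly the lower bound of the next.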

Without numpy:

max_bin = 100
min_bin = 0
step_size = 0.109

# +1 to cover the whole interval
number_of_bins = int(1 + (max_bin - min_bin) / step_size)

bins = []
for a in range(number_of_bins):
    # offset by min_bin so a non-zero lower bound also works
    bins.append((min_bin + a * step_size, min_bin + (a + 1) * step_size))
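To get from bin edges to the per-bin means the question asks about, one option is pandas.cut followed by groupby. This is only a sketch: the column names "seconds" and "value" and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a time column in seconds and one measured column
df = pd.DataFrame({
    "seconds": [0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
    "value": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
})

step_size = 0.109
n_bins = 3

# Bin edges at approximately 0, 0.109, 0.218 and 0.327
edges = np.linspace(0.0, n_bins * step_size, n_bins + 1)

# Assign each row to a half-open interval [low, high)
df["bin"] = pd.cut(df["seconds"], bins=edges, right=False)

# Mean of the other column within each bin
means = df.groupby("bin", observed=False)["value"].mean()
print(means)
```

With right=False a value that falls exactly on an edge goes into the bin that starts there; rows outside all edges get NaN as their bin and are left out of the means.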