Dynamic bin per row in a dataset in Pandas

Question

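I'm having trouble dynamically binning my dataset for further calculations. My goal is to generate bins/labels for each row of my DataFrame with a function, and to write the resulting label to the column 'action'.

My dataset is: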

id  value1 value2 type  length  amount
1   0.9     1.0     X   10      ['A', 'B']
2   2.0     1.6     Y   80      ['A']
3   0.3     0.5     X   29      ['A', 'C']

The function is as follows:

    import numpy as np
    import pandas as pd

    def bin_label_generator(amount):
        # treat rows with fewer than 2 items as having 2
        if amount < 2:
            amount = 2
        lower_bound = 1.0 - (1.0 / amount)
        mid_bound = 1.0
        upper_bound = 1.0 + (1.0 / amount)
        thresholds = {
            'bins': [-np.inf, lower_bound, mid_bound, upper_bound, np.inf],
            'labels': [0, 1.0, 2.0, 3.0]
        }
        return thresholds
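
For reference, here is a quick check of what the function returns. This is only a sketch: the function is repeated so the snippet runs on its own, and the names `two_items` and `four_items` are made up for illustration.

```python
import numpy as np

def bin_label_generator(amount):
    # Same logic as the function above: fewer than 2 items counts as 2.
    if amount < 2:
        amount = 2
    lower_bound = 1.0 - (1.0 / amount)
    mid_bound = 1.0
    upper_bound = 1.0 + (1.0 / amount)
    return {
        'bins': [-np.inf, lower_bound, mid_bound, upper_bound, np.inf],
        'labels': [0, 1.0, 2.0, 3.0],
    }

two_items = bin_label_generator(2)
four_items = bin_label_generator(4)

print(two_items['bins'])   # [-inf, 0.5, 1.0, 1.5, inf]
print(four_items['bins'])  # [-inf, 0.75, 1.0, 1.25, inf]
```

The bins narrow around 1.0 as the number of items grows, so the same value can get a different label from one row to the next.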

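This is my current code, but it requires me to pick the bins of one specific row for the cut. I want this to happen automatically, using the dictionary generated for each row itself.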

    # filter on type
    filter_type_series = df['type'].str.contains('X')

    # get the number of items in each 'amount' list
    amount_series = df[filter_type_series]['amount'].str.len()

    # generate bins for each row in the series
    bins_series = amount_series.apply(bin_label_generator)

    # get the max absolute values to use for binning
    max_values = df[filter_type_series].loc[:, ['value1', 'value2']].abs().max(axis=1)

    # the following line requires a row index, which I do not want
    df['action'] = pd.cut(max_values, bins=bins_series[0]['bins'], labels=bins_series[0]['labels'])

Answer 1

I found a fix myself: loop over every row in the filtered series and write the resulting label to a column in the actual df.

    item_type = 'X'

    first_df = df.copy()
    type_series = first_df['type'].str.contains(item_type)

    # loop over every matching row so pd.cut can use that row's own bins/labels
    for index, row in first_df[type_series].iterrows():
        # get the max absolute value from the row
        max_val = row[['value1', 'value2']].abs().max()

        # get amount of cables
        amount = len(row['amount'])

        # get bins and labels for this specific row
        bins_label_dict = bin_label_generator(amount)
        bins = bins_label_dict['bins']
        labels = bins_label_dict['labels']

        # pd.cut returns a one-element Categorical; store its single label
        first_df.loc[index, 'action'] = pd.cut([max_val], bins=bins, labels=labels)[0]
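
If the loop becomes slow on a larger frame, the same labels can probably be computed without `pd.cut`. The sketch below is not from the original code: it assumes the bins always have the shape produced by `bin_label_generator` and that `pd.cut` keeps its default right-closed intervals, in which case the label equals the number of thresholds strictly below the value. The names `is_x`, `n_items`, `lower` and `upper` are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'value1': [0.9, 2.0, 0.3],
    'value2': [1.0, 1.6, 0.5],
    'type': ['X', 'Y', 'X'],
    'length': [10, 80, 29],
    'amount': [['A', 'B'], ['A'], ['A', 'C']],
})

is_x = df['type'].str.contains('X')

# Items per row, clamped to a minimum of 2 as in bin_label_generator.
n_items = df.loc[is_x, 'amount'].str.len().clip(lower=2)
lower = 1.0 - 1.0 / n_items
upper = 1.0 + 1.0 / n_items

max_values = df.loc[is_x, ['value1', 'value2']].abs().max(axis=1)

# Count how many of the three thresholds lie strictly below each value.
# With right-closed bins this gives the labels 0.0, 1.0, 2.0 or 3.0.
action = (
    (max_values > lower).astype(float)
    + (max_values > 1.0)
    + (max_values > upper)
)

# Assignment aligns on the index, so rows of other types stay NaN.
df['action'] = action
print(df[['id', 'type', 'action']])
```

This only works because the labels happen to be 0 to 3 in bin order; with arbitrary labels the per-row loop above is the safer choice.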
