Dynamic bins per row in a dataset in Pandas
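I'm having trouble dynamically binning my dataset for further calculations. My goal is to define the bins/labels for each row of the DataFrame with a function, and to assign the resulting label to the column 'action'.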
My dataset:
id  value1  value2  type  length  amount
1   0.9     1.0     X     10      ['A', 'B']
2   2.0     1.6     Y     80      ['A']
3   0.3     0.5     X     29      ['A', 'C']
The function:
import numpy as np

def bin_label_generator(amount):
    # clamp to at least 2 (this also avoids division by zero for an empty list)
    if amount < 2:
        amount = 2
    lower_bound = 1.0 - (1.0 / amount)
    mid_bound = 1.0
    upper_bound = 1.0 + (1.0 / amount)
    thresholds = {
        'bins': [-np.inf, lower_bound, mid_bound, upper_bound, np.inf],
        'labels': [0, 1.0, 2.0, 3.0]
    }
    return thresholds
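For reference, a quick check of what the function produces and how a single value is binned with it. This is a sketch that assumes pd.cut's default right=True, so each bin is a half-open interval (a, b].

```python
import numpy as np
import pandas as pd


def bin_label_generator(amount):
    # Same logic as the function above: clamp to 2, then build bounds
    # around 1.0 whose distance from 1.0 shrinks as the amount grows.
    if amount < 2:
        amount = 2
    lower_bound = 1.0 - (1.0 / amount)
    upper_bound = 1.0 + (1.0 / amount)
    return {
        'bins': [-np.inf, lower_bound, 1.0, upper_bound, np.inf],
        'labels': [0, 1.0, 2.0, 3.0],
    }


# amounts 1 and 2 give the same edges because of the clamp
for amount in (1, 2, 4):
    print(amount, bin_label_generator(amount)['bins'])

# Bin one value with the edges for a 2-item list (bounds 0.5, 1.0, 1.5).
spec = bin_label_generator(2)
label = pd.cut([1.0], bins=spec['bins'], labels=spec['labels'])[0]
# 1.0 falls in (0.5, 1.0], so the label is 1.0
print(label)
```

Note that a value exactly on a bound (here 1.0) goes into the lower bin because the intervals are closed on the right.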
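This is my current code, but it requires me to pick a specific row's bins to cut with. I want this to happen automatically, using the dictionary generated for each row itself.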
import pandas as pd

# filter on type
filter_type_series = df['type'].str.contains('X')
# get the number of items in each 'amount' list
amount_series = df[filter_type_series]['amount'].str.len()
# generate bins for each row in the series
bins_series = amount_series.apply(bin_label_generator)
# get the max absolute value per row to use for binning
max_values = df[filter_type_series].loc[:, ['value1', 'value2']].abs().max(axis=1)
# the following line requires a row index, which I do not want
df['action'] = pd.cut(max_values, bins=bins_series[0]['bins'], labels=bins_series[0]['labels'])
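pd.cut takes a single bins sequence for the whole Series, which is why the line above has to pick one row's bins. Because bin_label_generator always returns edges of the same shape (-inf, 1 - 1/n, 1, 1 + 1/n, inf), one possible vectorized workaround is to compute each row's bounds as Series and count how many finite edges the value is strictly greater than; with pd.cut's default right=True, that count is the position of the label in [0, 1.0, 2.0, 3.0]. This is a sketch under that assumption: the bound formulas are inlined rather than calling bin_label_generator, so any change to the function would have to be mirrored here, and the names mask, n, lower, upper and label_index are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'value1': [0.9, 2.0, 0.3],
    'value2': [1.0, 1.6, 0.5],
    'type': ['X', 'Y', 'X'],
    'length': [10, 80, 29],
    'amount': [['A', 'B'], ['A'], ['A', 'C']],
})

mask = df['type'].str.contains('X')
sub = df[mask]

# Per-row inputs: number of items (clamped to 2, like bin_label_generator)
# and the max absolute value of value1/value2.
n = sub['amount'].str.len().clip(lower=2)
max_values = sub[['value1', 'value2']].abs().max(axis=1)

# Per-row edges, mirroring lower_bound and upper_bound (mid_bound is 1.0).
lower = 1.0 - 1.0 / n
upper = 1.0 + 1.0 / n

# Bins are (a, b], so the label position is the number of edges exceeded.
label_index = (
    (max_values > lower).astype(int)
    + (max_values > 1.0).astype(int)
    + (max_values > upper).astype(int)
)

labels = np.array([0, 1.0, 2.0, 3.0])
df['action'] = np.nan  # rows that do not match the type filter stay NaN
df.loc[mask, 'action'] = labels[label_index.to_numpy()]
print(df[['id', 'action']])
```

Unlike pd.cut, this sketch maps a NaN max value to label 0 rather than NaN, since every comparison with NaN is False.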
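I found a fix myself: iterate over every matching row, cut that row's value with its own bins, and write the label into the column of the actual df.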
row_type = 'X'
first_df = df.copy()
type_series = df['type'].str.contains(row_type)
# loop over every matching row to use pd.cut with that row's own bins/labels
for index, row in df[type_series].iterrows():
    # get the max absolute value of the row
    max_val = row[['value1', 'value2']].astype(float).abs().max()
    # get the number of cables (items in the 'amount' list)
    amount = len(row['amount'])
    # get bins and labels for this specific row
    bins_label_dict = bin_label_generator(amount)
    bins = bins_label_dict['bins']
    labels = bins_label_dict['labels']
    # pd.cut returns a length-1 Categorical; store its single label
    first_df.loc[index, 'action'] = pd.cut([max_val], bins=bins, labels=labels)[0]
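Since the bins depend only on the number of items in 'amount', a middle ground between the loop and full vectorization is to group the selected rows by that count and call pd.cut once per group instead of once per row. This is a sketch, not benchmarked; it rebuilds the sample data from the question, and the names n_items and max_values are illustrative.

```python
import numpy as np
import pandas as pd


def bin_label_generator(amount):
    # Function from the question (mid_bound inlined as 1.0).
    if amount < 2:
        amount = 2
    lower_bound = 1.0 - (1.0 / amount)
    upper_bound = 1.0 + (1.0 / amount)
    return {
        'bins': [-np.inf, lower_bound, 1.0, upper_bound, np.inf],
        'labels': [0, 1.0, 2.0, 3.0],
    }


df = pd.DataFrame({
    'id': [1, 2, 3],
    'value1': [0.9, 2.0, 0.3],
    'value2': [1.0, 1.6, 0.5],
    'type': ['X', 'Y', 'X'],
    'length': [10, 80, 29],
    'amount': [['A', 'B'], ['A'], ['A', 'C']],
})

mask = df['type'].str.contains('X')
sub = df[mask]
n_items = sub['amount'].str.len()
max_values = sub[['value1', 'value2']].abs().max(axis=1)

df['action'] = np.nan  # rows that do not match the type filter stay NaN
# Rows with the same list length share the same bins, so cut each group once.
for amount, values in max_values.groupby(n_items):
    spec = bin_label_generator(amount)
    cut = pd.cut(values, bins=spec['bins'], labels=spec['labels'])
    # cut keeps the original row index, so .loc writes back to the right rows
    df.loc[values.index, 'action'] = cut.astype(float)

print(df[['id', 'action']])
```

The number of pd.cut calls then equals the number of distinct list lengths rather than the number of rows.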