Cutting continuous variable into categories (ValueError: Bin labels must be one fewer than the number of bin edges)

Question

I have a data column that I want to cut into discrete bins. My min is 1 and my max is 70.

df.total_value.describe()
count       37926.000000
mean        12.368138
std          7.385642
min          1.000000
25%          8.000000
50%         10.000000
75%         16.000000
max         70.000000
Name: total_value, dtype: float64

I tried

labels = ["{0} - {1}".format(i, i + 1) for i in range(1, 70, 1)]

cut_bins = range(1, 70)
df['total_value_bins'] = pd.cut(df['total_value'], bins= cut_bins, labels=labels)

I get this error

ValueError: Bin labels must be one fewer than the number of bin edges

If I use

df['total_value_bins'] = pd.cut(df['total_value'], bins= cut_bins)

I get the bins, but I would like the labels nicely formatted, e.g. 1-2.

Any suggestions would be appreciated.

Thanks in advance.

Answer 1

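As the error says, you need len(cut_bins) == len(labels) + 1, whereas right now the two have the same length (69 each). Also, to be able to bin both the value 1 and the value 70, you need to change the upper limit in the range for cut_bins to 71 (because range does not include its upper limit), and pass the parameter include_lowest=True to cut, since the bins are otherwise open on the left and would exclude 1.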
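To make the length mismatch concrete, here is a small sketch that only counts the sequences from the question. The names n_edges, n_bins, n_labels and fixed_bins are introduced here for illustration and are not part of the original post.

```python
# Definitions copied from the question
labels = ["{0} - {1}".format(i, i + 1) for i in range(1, 70, 1)]
cut_bins = range(1, 70)

n_edges = len(cut_bins)   # edges 1, 2, ..., 69
n_bins = n_edges - 1      # number of intervals between consecutive edges
n_labels = len(labels)    # labels '1 - 2' up to '69 - 70'
print(n_edges, n_bins, n_labels)

# Extending the range by one edge makes the interval count match the labels
fixed_bins = range(1, 71)
print(len(fixed_bins) - 1 == n_labels)
```

With the original range there is one label more than there are intervals, which is what the ValueError is complaining about.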

import pandas as pd

labels = ["{0} - {1}".format(i, i + 1) for i in range(1, 70, 1)]

cut_bins = range(1, 71)  # upper limit is now 71, so the edges run from 1 to 70

# dummy data
s = pd.Series([1, 4, 45, 70])

print(pd.cut(s, bins=cut_bins, labels=labels, include_lowest=True))
0      1 - 2
1      3 - 4
2    44 - 45
3    69 - 70
dtype: category
Categories (69, object): ['1 - 2' < '2 - 3' < '3 - 4' < '4 - 5' ... '66 - 67' < '67 - 68' < '68 - 69' < '69 - 70']
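One way to avoid this kind of mismatch in the first place is to build the labels from the bin edges themselves, so there is always exactly one label per interval. The following is a minimal sketch under that assumption. The names edges, with_lowest and without_lowest are hypothetical, and the second call is only there to show what happens to the value 1 when include_lowest is left at its default.

```python
import pandas as pd

# Bin edges 1, 2, ..., 70
edges = list(range(1, 71))

# One label per pair of consecutive edges, so len(labels) == len(edges) - 1
labels = [f"{lo} - {hi}" for lo, hi in zip(edges[:-1], edges[1:])]

s = pd.Series([1, 4, 45, 70])

# include_lowest=True closes the first interval on the left, so 1 is binned
with_lowest = pd.cut(s, bins=edges, labels=labels, include_lowest=True)

# Default behaviour: intervals are open on the left, so 1 falls outside every bin
without_lowest = pd.cut(s, bins=edges, labels=labels)

print(with_lowest.tolist())
print(without_lowest.isna().tolist())
```

If the range of the data changes, only edges needs to be updated and the labels follow automatically.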

python

cut

pandas

data-wrangling