Grouping values into custom bins

I have a DataFrame with an 'education' attribute. The values are discrete, 1-16. For cross-tabulation purposes, I want to bin this 'education' variable, but with custom bins (1:8, 9:11, 12, 13:15, 16).

I've been experimenting with pd.cut(), but I get an invalid syntax error:

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])

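A list literal cannot contain slices such as 1:8, which is why that call raises a syntax error; pd.cut() expects bins to be a sequence of numeric edges. Try placing the bin edges between the thresholds instead (intervals are right-closed by default, so 8.5 closes the first bin):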

bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels=['Middle School or less', 'Some High School', 
        'High School Grad', 'Some College', 'College Grad']

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)
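
To check where each value lands, it can help to leave out labels and look at the intervals pd.cut() builds from the edges. This is a minimal sketch on a stand-alone Series (the names education, intervals and counts are only for illustration, not from the question):

```python
import numpy as np
import pandas as pd

# Stand-alone sample: the 16 possible education codes.
education = pd.Series(np.arange(1, 17), name="education")

bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]

# Without labels, pd.cut returns the right-closed intervals themselves.
intervals = pd.cut(education, bins=bins)

# Count how many codes fall into each interval, in bin order.
counts = intervals.value_counts().sort_index()
print(counts)
```

Each of the five intervals should line up with one of the intended groups (1-8, 9-11, 12, 13-15, 16).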

Test:

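import numpy as np
import pandas as pd

# Sample frame with every education code; create it before running the pd.cut() call.
adult_df_educrace = pd.DataFrame({'education':np.arange(1,17)})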

Output:

    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
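
Since the values are integers and the intervals are right-closed, integer edges should give the same grouping, and the binned column can then go straight into a cross-tabulation. The sketch below assumes a second column called 'race' with made-up values 'A' and 'B'; that column, the name df, and the helper variables are hypothetical and only there to make the example runnable:

```python
import numpy as np
import pandas as pd

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

# Sample data; the 'race' column and its values are invented for illustration.
df = pd.DataFrame({
    'education': np.arange(1, 17),
    'race': ['A', 'B'] * 8,
})

# Integer edges: the intervals are (0, 8], (8, 11], (11, 12], (12, 15], (15, 16].
df['education_bins'] = pd.cut(df['education'],
                              bins=[0, 8, 11, 12, 15, 16],
                              labels=labels)

# Compare against the half-integer edges to confirm both give the same labels.
half = pd.cut(df['education'],
              bins=[0.5, 8.5, 11.5, 12.5, 15.5, 16.5],
              labels=labels)
same = df['education_bins'].astype(str).equals(half.astype(str))
print(same)

# Cross-tabulate the binned education against the hypothetical 'race' column.
table = pd.crosstab(df['education_bins'], df['race'])
print(table)
```

The half-integer edges make the intent more obvious to a reader, while the integer edges are shorter; either works as long as the data really are whole numbers.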