Grouping values into custom bins
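I have a dataframe with an 'education' attribute. The values are discrete, 1-16. For cross-tabulation, I'd like to bin this 'education' variable using custom bins (1:8, 9:11, 12, 13:15, 16).
I've been playing around with pd.cut(), but I get an invalid syntax error: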
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])
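Slice notation such as 1:8 is not valid inside a list literal, which is what raises the syntax error; bins expects a list of bin edges. Try placing the edges between the values: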
bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)
Test data (create this before running the code above):
adult_df_educrace = pd.DataFrame({'education': np.arange(1, 17)})
Output:
    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
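Since the goal is cross-tabulation, here is a minimal self-contained sketch that goes one step further. It is not the only way to do this: it uses integer edges instead of the half-step edges above, relying on pd.cut's default right-closed intervals, and it assumes a made-up 'race' column with placeholder values "A" and "B" purely for illustration. The dataframe name df is also hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data: 'education' codes 1..16.
# The 'race' column and its values are invented for this example.
df = pd.DataFrame({
    "education": np.arange(1, 17),
    "race": ["A", "B"] * 8,
})

labels = ["Middle School or less", "Some High School",
          "High School Grad", "Some College", "College Grad"]

# With the default right=True, these integer edges produce the intervals
# (0, 8], (8, 11], (11, 12], (12, 15], (15, 16], which match the
# requested groups 1-8, 9-11, 12, 13-15, 16 for integer data.
df["education_bins"] = pd.cut(df["education"],
                              bins=[0, 8, 11, 12, 15, 16],
                              labels=labels)

# Cross-tabulate the binned education level against the hypothetical column.
table = pd.crosstab(df["education_bins"], df["race"])
print(table)
```

For integer-valued data, both choices of edges assign every value to the same bin; the half-step edges simply make the boundaries unambiguous at a glance.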