Binning with pd.cut beyond the range (replacing NaN with "<min_val" or ">max_val")

Question

import pandas as pd

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
df['range'] = pd.cut(df.days, [0, 30, 60])
df

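The code above is a reproducible example in which pd.cut is used to convert a numeric column into a categorical column. pd.cut assigns categories based on the list passed, [0,30,60]. Here, rows 0, 5 and 6 are categorized as NaN because they fall outside [0,30,60]. What I want is for 0 to be categorized as <0, 70 as >60, and likewise 80 as >60. If possible, I would also like dynamic text labels such as A, B, C, D, E, depending on the categories created.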

Answer 1

For the first part, adding -np.inf and np.inf to the bins ensures that every value gets a bin:

In [5]: df= pd.DataFrame({'days': [0,31,45,35,19,70,80]})
   ...: df['range'] = pd.cut(df.days, [-np.inf, 0, 30, 60, np.inf])
   ...: df
   ...:
Out[5]:
   days         range
0     0   (-inf, 0.0]
1    31  (30.0, 60.0]
2    45  (30.0, 60.0]
3    35  (30.0, 60.0]
4    19   (0.0, 30.0]
5    70   (60.0, inf]
6    80   (60.0, inf]
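
If readable text such as ">60" is wanted instead of the interval notation, one option is to pass a labels list to pd.cut. The following is only a minimal sketch: the label format and the names edges and labels are illustrative assumptions, not part of the original answer. Because the bins are closed on the right by default, 0 falls in (-inf, 0], so the sketch labels that bin "<=0" rather than "<0".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
edges = [-np.inf, 0, 30, 60, np.inf]

# Build one label per bin from its edges (illustrative naming scheme).
labels = []
for lo, hi in zip(edges[:-1], edges[1:]):
    if np.isinf(lo):
        labels.append(f'<={hi}')      # open-ended lower bin
    elif np.isinf(hi):
        labels.append(f'>{lo}')       # open-ended upper bin
    else:
        labels.append(f'({lo}, {hi}]')

# pd.cut needs exactly len(edges) - 1 labels, one for each bin.
df['range'] = pd.cut(df['days'], bins=edges, labels=labels)
print(df)
```

Deriving the labels from edges keeps them in sync if the bin boundaries change later.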

For the second part, you can use .cat.codes to get the bin index and do some tweaking from there:

In [8]: df['range'].cat.codes.apply(lambda x: chr(x + ord('A')))
Out[8]:
0    A
1    C
2    C
3    C
4    B
5    D
6    D
dtype: object
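
As an alternative sketch, the letters can be handed to pd.cut directly through its labels parameter, which keeps the result categorical. The names edges and letters are assumptions made for illustration, and the approach as written only covers up to 26 bins.

```python
import string

import numpy as np
import pandas as pd

df = pd.DataFrame({'days': [0, 31, 45, 35, 19, 70, 80]})
edges = [-np.inf, 0, 30, 60, np.inf]

# One uppercase letter per bin, generated from the number of bins.
letters = list(string.ascii_uppercase[:len(edges) - 1])

df['label'] = pd.cut(df['days'], bins=edges, labels=letters)
print(df)
```

One difference worth noting: .cat.codes is -1 for missing values, so the chr-based mapping would turn a NaN into '@', whereas passing labels leaves missing values as NaN.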

binning

python-3.x

pandas