如何用分类值替换列中的一系列数字,假设数字范围是 float 类型

How to replace a range of numbers in a column with a categorical value, given that range of numbers are of the type float

df['ratio_usage'] = np.where(df['ratio_usage'].between(0.9,0.1), 'Excellent', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.8,0.89), 'Very Good', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.7,0.79), 'Good', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.6,0.69), 'Fair', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.5,0.59), 'Satisfactory', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.4,0.49), 'Poor', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(0.3,0.0), 'Very Poor', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(1.01,2), 'Fatal', df['ratio_usage'])
df['ratio_usage'] = np.where(df['ratio_usage'].between(2.1,1000), 'Outliers', df['ratio_usage'])

它执行并替换了第一行代码,但会产生如下错误:

TypeError                                 Traceback (most recent call last)
<ipython-input-269-7ad3204ddca1> in <module>()
      1 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.9,0.1), 'Excellent', df['ratio_usage'])
----> 2 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.8,0.89), 'Very Good', df['ratio_usage'])
      3 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.7,0.79), 'Good', df['ratio_usage'])
      4 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.6,0.69), 'Fair', df['ratio_usage'])
      5 df['ratio_usage'] = np.where(df['ratio_usage'].between(0.5,0.59), 'Satisfactory', df['ratio_usage'])

~\Anaconda\lib\site-packages\pandas\core\series.py in between(self, left, right, inclusive)
   3654         """
   3655         if inclusive:
-> 3656             lmask = self >= left
   3657             rmask = self <= right
   3658         else:

~\Anaconda\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1251 
   1252             with np.errstate(all='ignore'):
-> 1253                 res = na_op(values, other)
   1254             if is_scalar(res):
   1255                 raise TypeError('Could not compare {typ} type with Series'

~\Anaconda\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1138 
   1139         elif is_object_dtype(x.dtype):
-> 1140             result = _comp_method_OBJECT_ARRAY(op, x, y)
   1141 
   1142         elif is_datetimelike_v_numeric(x, y):

~\Anaconda\lib\site-packages\pandas\core\ops.py in _comp_method_OBJECT_ARRAY(op, x, y)
   1117         result = libops.vec_compare(x, y, op)
   1118     else:
-> 1119         result = libops.scalar_compare(x, y, op)
   1120     return result
   1121 

pandas\_libs\ops.pyx in pandas._libs.ops.scalar_compare()

TypeError: '>=' not supported between instances of 'str' and 'float'

这是一个使用 pd.cut 的解决方案,它被简化了,因为我看不到你的数据,也因为你有重叠的 bin,你需要协调。

设置

df = pd.DataFrame({'ratio_usage': [0.05, 0.8, 0.64, 0.59, 0.31]})

   ratio_usage
0         0.05
1         0.80
2         0.64
3         0.59
4         0.31

pd.cut 带有垃圾箱和标签

bins = [0.0, 0.2, 0.5, 0.7, 0.9, 1.0]    
labels = ["bad", "kinda bad", "average", "kinda good", "good"]    
pd.cut(df.ratio_usage, bins=bins, labels=labels)

0           bad
1    kinda good
2       average
3       average
4     kinda bad