Scikit-learn LabelEncoder: IndexError: arrays used as indices must be of integer (or boolean) type
I am trying to preprocess the Adult dataset for classification. I use scikit-learn to handle the categorical attributes.
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X[:,0] = labelencoder.fit_transform(X[:,0])
labelencoder.classes_
Output:
array(['Federal-gov', 'Local-gov', 'Private', 'Self-emp-inc',
'Self-emp-not-inc', 'State-gov', 'Without-pay'], dtype=object)
The new contents of X:
X[:3]
array([[5, 'Bachelors', 'Under-Graduate', 'Never-married',
'Adm-clerical', 'Not-in-family', 'White', 'Male',
'United-States', 39.0, 77516.0, 13.0, 2174.0, 0.0, 40.0],
[4, 'Bachelors', 'Under-Graduate', 'Married-civ-spouse',
'Exec-managerial', 'Husband', 'White', 'Male', 'United-States',
50.0, 83311.0, 13.0, 0.0, 0.0, 13.0],
[2, 'HS-grad', 'HS-grad', 'Divorced', 'Handlers-cleaners',
'Not-in-family', 'White', 'Male', 'United-States', 38.0,
215646.0, 9.0, 0.0, 0.0, 40.0]], dtype=object)
Everything is fine up to this point. But I need to see the original attribute values, so I tried to recover them like this:
original = labelencoder.inverse_transform(X[:,0])
I get this error:
IndexError Traceback (most recent call last)
<ipython-input-78-f8cf404b255a> in <module>
----> 1 original = labelencoder.inverse_transform(X[:,0])
D:\Anaconda\lib\site-packages\sklearn\preprocessing\label.py in inverse_transform(self, y)
281 "y contains previously unseen labels: %s" % str(diff))
282 y = np.asarray(y)
--> 283 return self.classes_[y]
284
285
IndexError: arrays used as indices must be of integer (or boolean) type
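The error occurs because your array has dtype "object". Even when you extract the first column, its dtype is still "object" (check X[:,0].dtype). As the traceback shows, inverse_transform indexes self.classes_ with the values you pass in, and NumPy only accepts integer (or boolean) arrays as indices. So, to use inverse_transform, you need to cast the vector to int like this: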
original = labelencoder.inverse_transform(X[:,0].astype(int))
Output:
array(['a', 'b', 'c'], dtype=object)
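To make the dtype issue concrete, here is a minimal sketch of the round trip. The three-row array is a hypothetical stand-in for the Adult data (it is not taken from the question); only the LabelEncoder calls and the astype(int) cast come from the discussion above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the Adult dataset. Mixing strings
# and floats forces NumPy to store everything with dtype=object.
X = np.array(
    [["State-gov", 39.0], ["Private", 50.0], ["Local-gov", 38.0]],
    dtype=object,
)

labelencoder = LabelEncoder()
# The encoded integers are written back into an object array, so the
# column keeps dtype=object even though every value is an integer.
X[:, 0] = labelencoder.fit_transform(X[:, 0])
column_dtype = X[:, 0].dtype
print(column_dtype)  # object

# Casting the column to int gives inverse_transform usable index values.
original = labelencoder.inverse_transform(X[:, 0].astype(int))
print(list(original))  # ['State-gov', 'Private', 'Local-gov']
```

If you would rather avoid the cast altogether, one option is to keep the result of fit_transform in its own integer array instead of writing it back into the object array X.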