Scikit-Learn: How to deal with an unorderable types error?
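I am using Python 3.5 to predict some values in test.csv based on the data in train.csv.
While processing the data, I transformed the rows and columns of train.csv and it worked fine. But when I do the same with test.csv, I get:
TypeError: unorderable types: float() > str()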
import pandas as pd
from sklearn import preprocessing
train = pd.read_csv('train.csv', header=0, parse_dates=True, low_memory=False)
test = pd.read_csv('test.csv', header=0, parse_dates=True, low_memory=False)
le = preprocessing.LabelEncoder()
train.Category = le.fit_transform(train.Category)
train.DayOfWeek = le.fit_transform(train.DayOfWeek)
train.PdDistrict = le.fit_transform(train.PdDistrict)
The part that raises the error:
test.DayOfWeek = le.fit_transform(test.DayOfWeek)
test.PdDistrict = le.fit_transform(test.PdDistrict)
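The message itself is Python 3 refusing to compare a float with a str. My reading, which is an assumption and not checked against every scikit-learn version, is that LabelEncoder sorts the distinct values of the column, so a column that mixes strings with NaN (pandas stores an empty cell as NaN, which is a float) fails exactly this way. A stdlib-only sketch; the helper `sorted_classes` and the sample column are hypothetical:

```python
def sorted_classes(values):
    """Hypothetical helper: sort the distinct values the way an encoder would,
    returning the TypeError message instead of raising it."""
    try:
        return sorted(set(values))
    except TypeError as err:
        # Python 3.5 says "unorderable types: ...";
        # 3.6+ says "'<' not supported between instances of ..."
        return "TypeError: " + str(err)


# A column as read from a CSV: strings, plus NaN (a float) where a cell was empty.
column = ["Monday", "Tuesday", float("nan"), "Monday"]
print(sorted_classes(column))  # a TypeError message

# Once every value is a str, the values can be ordered.
print(sorted_classes([str(v) for v in column]))  # ['Monday', 'Tuesday', 'nan']
```

The last line also shows why missing values are worth filling before converting: `str(float("nan"))` simply becomes the label `'nan'`.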
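Two problems. First, you should not reuse the same LabelEncoder for multiple columns; each fit replaces the previous mapping, so you lose it and cannot transform your test data. Second, fit each encoder on the training data only and call transform (not fit_transform) on the test data, so both sets share the same mapping: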
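import numpy as np
from sklearn import preprocessing
# One encoder per column, each fitted on the training data only
category_le = preprocessing.LabelEncoder()
day_of_week_le = preprocessing.LabelEncoder()
pd_district_le = preprocessing.LabelEncoder()
train_category = category_le.fit_transform(train.Category)
train_day_of_week = day_of_week_le.fit_transform(train.DayOfWeek)
train_pd_district = pd_district_le.fit_transform(train.PdDistrict)
# column_stack puts each encoded 1-D array in its own column
# (np.hstack would join them end to end into one long vector)
train_X = np.column_stack([train_category, train_day_of_week, train_pd_district])
# Reuse the fitted encoders on the test data: transform only, no refitting
test_category = category_le.transform(test.Category)
test_day_of_week = day_of_week_le.transform(test.DayOfWeek)
test_pd_district = pd_district_le.transform(test.PdDistrict)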
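To see why both points matter, here is a pure-Python sketch. `ColumnCodes` is a hypothetical stand-in for a LabelEncoder, assuming (as LabelEncoder does) that codes are positions in the sorted list of distinct values and that transform rejects labels never seen during fit. Refitting on the test column silently changes the codes, while keeping one fitted encoder per column in a dict keeps them consistent. With scikit-learn you would store `preprocessing.LabelEncoder()` objects in the dict instead; the day and district values are made-up samples.

```python
class ColumnCodes:
    """Hypothetical stand-in for one fitted LabelEncoder."""

    def fit(self, values):
        # Classes are the distinct values in sorted order; a code is a position.
        self.classes_ = sorted(set(values))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Reject labels that were not present when the encoder was fitted.
        unseen = sorted(set(values) - set(self._index))
        if unseen:
            raise ValueError("previously unseen labels: %s" % unseen)
        return [self._index[v] for v in values]


train_table = {"DayOfWeek": ["Monday", "Friday", "Monday"],
               "PdDistrict": ["MISSION", "BAYVIEW", "CENTRAL"]}
test_table = {"DayOfWeek": ["Friday", "Monday"],
              "PdDistrict": ["MISSION", "CENTRAL"]}

# Wrong: refitting on the test column learns a different mapping.
refit = ColumnCodes().fit(test_table["PdDistrict"])
print(refit.transform(test_table["PdDistrict"]))  # [1, 0]: MISSION is 1 here

# Right: fit one encoder per column on train, then only transform test.
encoders = {col: ColumnCodes().fit(values) for col, values in train_table.items()}
test_codes = {col: encoders[col].transform(values) for col, values in test_table.items()}
print(test_codes)  # {'DayOfWeek': [0, 1], 'PdDistrict': [2, 1]}: MISSION stays 2
```

The dict keyed by column name scales to any number of categorical columns without one variable per encoder.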
Here is a quick code snippet for anyone searching for a fix to the unorderable types error.
The cause (which you already found) is summed up in this quote from another forum post: "because there were essentially mixed types within the column I was trying to encode. I was finally able to get around it by converting each of the 'object' type columns to 'str' type and that stopped the error."
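After handling missing data, you can use this code to iterate over the columns that match a set of dtypes and convert each one to strings with the .astype(str) method.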
# REPLACE NAN WITH 0
X_train.fillna(0.0, inplace=True)
# GET LIST OF COLUMNS TO ENCODE
cols_to_enc = list(X_train.select_dtypes(include=['category', 'object']))
for feature in cols_to_enc:
    try:
        # CONVERT VALUES TO STRING (TO AVOID UNORDERABLE TYPE ERRORS)
        X_train[feature] = X_train[feature].astype(str)
    except Exception as err:
        print('cannot convert: %s' % feature)
        print(err)
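The same fill-then-stringify step should be applied identically to the training and test columns, otherwise their labels will not line up. A stdlib sketch of that step on a plain list; the helper name `to_str_labels` is hypothetical, and the default fill value `"0.0"` is an assumption chosen to match what `fillna(0.0)` followed by `astype(str)` produces in the snippet above:

```python
import math


def to_str_labels(values, fill="0.0"):
    """Hypothetical helper: replace missing values (None or NaN) with `fill`,
    then convert everything to str so the labels can be sorted."""
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            v = fill
        out.append(str(v))
    return out


raw = ["Monday", float("nan"), 3, None]
labels = to_str_labels(raw)
print(labels)               # ['Monday', '0.0', '3', '0.0']
print(sorted(set(labels)))  # ['0.0', '3', 'Monday']
```

Note that converting to str also merges values such as the number `3` and the string `'3'` into a single label, which is usually what you want for a categorical column but is worth knowing.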