TypeError: argument must be a string or number on column with strings that are numbers

Question

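I have a dataset with categorical columns. In column 4 there are two values, "two" and "four", both stored as strings. Do you know why I get this error and how to fix it? TypeError: argument must be a string or number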

Traceback (most recent call last):

  File "C:..".py", line 112, in _encode
    res = _encode_python(values, uniques, encode)

  File "C:...py", line 60, in _encode_python
    uniques = sorted(set(values))

TypeError: '<' not supported between instances of 'str' and 'float'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C...".py", line 35, in <module>
    X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])

  File "C:...py", line 252, in fit_transform
    self.classes_, y = _encode(y, encode=True)

  File "C:....py", line 114, in _encode
    raise TypeError("argument must be a string or number")

TypeError: argument must be a string or number

Code:

import numpy as np #mathematical tools
import matplotlib.pyplot as plt #plot nice charts
import pandas as pd #import and manage data sets

# Making a list of missing value types
missing_values = ["?"]
df = pd.read_csv(r'D:\data.csv', na_values=missing_values)

#print the new table with the missing values 
# print (df)
# print (df.isnull())


X = df.iloc[:, :-1].values #Matrix - independent variables (features)
y = df.iloc[:, 24].values #dependent variables vectors


from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X2 = LabelEncoder()
X[:, 2] = labelencoder_X2.fit_transform(X[:, 2]) #gas=0, fuel=1 

labelencoder_X3 = LabelEncoder()
X[:, 3] = labelencoder_X3.fit_transform(X[:, 3])

# I get an error here
labelencoder_X4 = LabelEncoder()
X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])

labelencoder_X5 = LabelEncoder()
X[:, 5] = labelencoder_X5.fit_transform(X[:,5])

labelencoder_X6 = LabelEncoder()
X[:, 6] = labelencoder_X6.fit_transform(X[:, 6])

labelencoder_X7 = LabelEncoder()
X[:, 7] = labelencoder_X7.fit_transform(X[:, 7])

labelencoder_X13 = LabelEncoder()
X[:, 13] = labelencoder_X13.fit_transform(X[:, 13])

labelencoder_X14 = LabelEncoder()
X[:, 14] = labelencoder_X14.fit_transform(X[:, 14])

labelencoder_X15 = LabelEncoder()
X[:, 16] = labelencoder_X15.fit_transform(X[:, 16])

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, 1:24])  
X[:, 1:24]=imputer.transform(X[:, 1:24])

Thanks for your help!

Answer 1

This error usually occurs when a column that contains strings also contains NaN values. NaN is of type float, which is why you get:

TypeError: '<' not supported between instances of 'str' and 'float'
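The comparison failure can be reproduced without scikit-learn. The following is a minimal sketch, assuming the column holds the strings "two" and "four" plus one missing value; the list and the "missing" placeholder are made up for illustration.

```python
import math

# Hypothetical column contents: two strings plus one missing value (NaN is a float)
values = ["two", "four", float("nan")]

try:
    sorted(set(values))  # the same call that appears in the traceback
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # a "'<' not supported between instances of ..." message

# Replacing NaN with a placeholder string makes every value comparable
filled = ["missing" if isinstance(v, float) and math.isnan(v) else v for v in values]
print(sorted(set(filled)))  # ['four', 'missing', 'two']
```

Sorting needs to compare the float NaN with at least one string, and Python 3 refuses to order a str against a float.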

You should replace the missing values first. One way:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Making a list of missing value types
missing_values = ["?"]
df = pd.read_csv(r'D:\data.csv', na_values=missing_values)

# Work on a copy so the assignments below do not touch df
X = df.iloc[:, :-1].copy()
y = df.iloc[:, 24]

# Replace NaN (a float) with a placeholder string so every value is a str
X.iloc[:, 4] = X.iloc[:, 4].fillna('NaN') # <-- add this line

X.iloc[:, 4] = LabelEncoder().fit_transform(X.iloc[:, 4])

Label encoding should no longer cause any problems. You have to do the same replacement for every column that contains strings.
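Applying the replacement to every string column could look like the sketch below. The small DataFrame, its column names, and the "missing" placeholder are assumptions made for illustration, not part of the original dataset; only fillna and LabelEncoder come from the answer itself.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real CSV: two string columns with missing values
df = pd.DataFrame({
    "fuel": ["gas", "diesel", np.nan, "gas"],
    "doors": ["two", "four", np.nan, "two"],
    "price": [13495.0, 16500.0, 13950.0, np.nan],
})

string_cols = ["fuel", "doors"]  # columns that hold strings (assumed names)
encoders = {}                    # one fitted encoder per column

for col in string_cols:
    # Fill NaN with a placeholder string so the column contains only str values
    filled = df[col].fillna("missing")
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(filled)

print(df)
```

Keeping one encoder per column means each column's labels can be recovered later with inverse_transform. The numeric price column is left untouched here, so its NaN can still be handled by an imputer.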

python

machine-learning

python-3.x

scikit-learn

one-hot-encoding