TypeError: argument must be a string or number on column with strings that are numbers
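I have a dataset with categorical columns. Column 4 contains two values, "two" and "four", both strings. Do you know why I am getting this error and how to fix it?
TypeError: argument must be a string or number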
Traceback (most recent call last):
File "C:..".py", line 112, in _encode
res = _encode_python(values, uniques, encode)
File "C:...py", line 60, in _encode_python
uniques = sorted(set(values))
TypeError: '<' not supported between instances of 'str' and 'float'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C...".py", line 35, in <module>
X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])
File "C:...py", line 252, in fit_transform
self.classes_, y = _encode(y, encode=True)
File "C:....py", line 114, in _encode
raise TypeError("argument must be a string or number")
TypeError: argument must be a string or number
Code:
import numpy as np #mathematical tools
import matplotlib.pyplot as plt #plot nice charts
import pandas as pd #import and manage data sets
# Making a list of missing value types
missing_values = ["?"]
df= pd.read_csv('D:\data.csv',na_values = missing_values)
#print the new table with the missing values
# print (df)
# print (df.isnull())
X = df.iloc[:, :-1].values #Matrix - independent variables (features)
y = df.iloc[:, 24].values #dependent variables vectors
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X2 = LabelEncoder()
X[:, 2] = labelencoder_X2.fit_transform(X[:, 2]) #gas=0, fuel=1
labelencoder_X3 = LabelEncoder()
X[:, 3] = labelencoder_X3.fit_transform(X[:, 3])
# I get an error here
labelencoder_X4 = LabelEncoder()
X[:, 4] = labelencoder_X4.fit_transform(X[:, 4])
labelencoder_X5 = LabelEncoder()
X[:, 5] = labelencoder_X5.fit_transform(X[:,5])
labelencoder_X6 = LabelEncoder()
X[:, 6] = labelencoder_X6.fit_transform(X[:, 6])
labelencoder_X7 = LabelEncoder()
X[:, 7] = labelencoder_X7.fit_transform(X[:, 7])
labelencoder_X13 = LabelEncoder()
X[:, 13] = labelencoder_X13.fit_transform(X[:, 13])
labelencoder_X14 = LabelEncoder()
X[:, 14] = labelencoder_X14.fit_transform(X[:, 14])
labelencoder_X15 = LabelEncoder()
X[:, 16] = labelencoder_X14.fit_transform(X[:, 16])
from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values="NaN", strategy='mean')
imputer.fit(X[:, 1:24])
X[:, 1:24]=imputer.transform(X[:, 1:24])
Thanks for your help!
This error usually occurs when a column that contains strings also contains NaN values. NaN is of type float, which is why you get:
TypeError: '<' not supported between instances of 'str' and 'float'
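The failing call in the traceback is `sorted(set(values))`. As a minimal sketch of why it breaks, the snippet below reproduces the same comparison error in plain Python. The list contents are hypothetical stand-ins for what column 4 might hold after `"?"` has been read as a missing value.

```python
# Hypothetical column contents: two string categories plus one missing value.
# pandas represents a missing value as NaN, which is a Python float.
values = ["two", "four", float("nan")]

print(type(values[2]).__name__)  # the missing entry is a float, not a str

try:
    # The same operation the traceback shows inside _encode_python.
    sorted(set(values))
except TypeError as exc:
    message = str(exc)
    print(message)
```

Sorting has to compare the float with at least one of the strings, and Python 3 does not define an ordering between `str` and `float`, so the sort cannot succeed.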
You should replace the missing values first. One way to do it:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Making a list of missing value types
missing_values = ["?"]
df = pd.read_csv(r'D:\data.csv', na_values=missing_values)  # raw string so the backslash is not read as an escape
X = df.iloc[:, :-1]
y = df.iloc[:, 24]
X.iloc[:, 4] = X.iloc[:, 4].fillna('NaN') # <-- add this line
X.iloc[:, 4] = LabelEncoder().fit_transform(X.iloc[:, 4])
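Label encoding should no longer cause any problems for that column. You will have to replace the missing values with a string in the same way for every column you encode.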
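If several columns need the same treatment, the replacement can be done in a loop. The following is only a sketch under stated assumptions: the small DataFrame and the column names `fuel`, `doors`, and `price` are made up to stand in for the CSV, and the placeholder string `"missing"` is an arbitrary choice (the `'NaN'` string used above works equally well).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the CSV; np.nan marks what "?" became on read.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas", np.nan],
    "doors": ["two", "four", np.nan, "four"],
    "price": [13495.0, 16500.0, 13950.0, 17450.0],
})

# Assumed list of string-valued columns to encode.
categorical_cols = ["fuel", "doors"]

encoders = {}
for col in categorical_cols:
    # Replace NaN (a float) with a string so every value in the column is a str.
    filled = df[col].fillna("missing")
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(filled.to_numpy(dtype=object))
    encoders[col] = encoder  # keep one fitted encoder per column

print(df)
print(encoders["doors"].classes_)
```

Keeping a separate fitted encoder per column makes it possible to map the integer codes back to the original labels later with `inverse_transform`. Note that the placeholder becomes a category of its own, which may or may not be what you want for the model.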