Getting "ValueError: Expected 2D array, got 1D array instead" error in python-sklearn
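Please help. I can't resolve an error I'm getting. I'm new to machine learning in Python, and any advice would be appreciated.
Below is the code I wrote to predict which type of transport a company's employees are likely to prefer, based on their gender, qualifications and license: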
Gender = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Gender'])
Engineer = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Engineer'])
MBA = preprocessing.LabelEncoder().fit_transform(df.loc[:,'MBA'])
License = preprocessing.LabelEncoder().fit_transform(df.loc[:,'license'])
Transport = preprocessing.LabelEncoder().fit_transform(df.loc[:,'Transport'])
x,y = Gender.reshape(-1,1), Transport
print("\n\nGender:", Gender, "\n\nEngineer:", Engineer, "\n\nMBA:", MBA, "\n\nLicense:", license, "\n\nTransport:", Transport)
model = GaussianNB().fit(x,y)
a1 = input("\n\n Choose Gender : Male:1 or Female:0 = ")
b1 = input("\n\n Are you an Engineer? : Yes:1 or No:0 = ")
c1 = input("\n\n Have you done MBA? : Yes:1 or No:0 = ")
d1 = input("\n\n Do you have license? : Yes:1 or No:0 = ")
#store the output in y_pred
y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])
#for loop to predict customizable output
if y_pred == [1]:
    print("\n\n You prefer Public Transport")
else:
    print("\n\n You prefer Private Transport")
This is the error I get at the final step:
ValueError Traceback (most recent call last)
<ipython-input-104-a14f86182731> in <module>
6 #store the output in y_pred
7
----> 8 y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])
9
10 #for loop to predict customizable output
~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in predict(self, X)
63 Predicted target values for X
64 """
---> 65 jll = self._joint_log_likelihood(X)
66 return self.classes_[np.argmax(jll, axis=1)]
67
~\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in _joint_log_likelihood(self, X)
428 check_is_fitted(self, "classes_")
429
--> 430 X = check_array(X)
431 joint_log_likelihood = []
432 for i in range(np.size(self.classes_)):
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
519 "Reshape your data either using array.reshape(-1, 1) if "
520 "your data has a single feature or array.reshape(1, -1) "
--> 521 "if it contains a single sample.".format(array))
522
523 # in the future np.flexible dtypes will be handled like object dtypes
ValueError: Expected 2D array, got 1D array instead:
array=[1 1 0 1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Below is the structure of my dataset:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 444 entries, 28 to 39
Data columns (total 8 columns):
Gender 444 non-null object
Engineer 444 non-null int64
MBA 444 non-null int64
Work Exp 444 non-null int64
Salary 444 non-null float64
Distance 444 non-null float64
license 444 non-null int64
Transport 444 non-null object
dtypes: float64(2), int64(4), object(2)
memory usage: 31.2+ KB
The error message is quite verbose. It tells you that you passed a 1D array where a 2D array was expected:
Expected 2D array, got 1D array instead
The stack trace points to this line:
y_pred = model = model.predict([int(a1),int(b1),int(c1),int(d1)])
It also tells you how to fix the problem:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Since you are trying to predict a single sample, you should use the latter:
import numpy as np
y_pred = model.predict(np.array([int(a1),int(b1),int(c1),int(d1)]).reshape(1, -1))
Note that I removed the pointless double assignment y_pred = model = ..., which would also overwrite model with the prediction result.
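To make the two reshape options from the error message concrete, here is a small sketch that only looks at array shapes. The values in sample are hypothetical stand-ins for the four answers a1 to d1; nothing here depends on the fitted model.

```python
import numpy as np

# Hypothetical answers standing in for int(a1), int(b1), int(c1), int(d1).
sample = np.array([1, 1, 0, 1])

# One row (a single sample) with four columns (features).
as_one_sample = sample.reshape(1, -1)

# Four rows (four samples) with one column (a single feature).
as_one_feature = sample.reshape(-1, 1)

print(sample.shape)          # (4,)
print(as_one_sample.shape)   # (1, 4)
print(as_one_feature.shape)  # (4, 1)

# Wrapping the list in another list builds the same single-sample array.
as_nested_list = np.array([[1, 1, 0, 1]])
print(np.array_equal(as_nested_list, as_one_sample))  # True
```

scikit-learn estimators read rows as samples and columns as features, so only the (1, 4) form describes one person with four attributes.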
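Additional note
Unrelated to this particular error, but probably not what you want: you are fitting the model on the Gender feature only. See these lines: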
x,y = Gender.reshape(-1,1), Transport
...
model = GaussianNB().fit(x,y)
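This will break your code, because you fit the model on a single feature and then try to predict a sample with four features. You should fix that as well. A solution could look like this: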
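from sklearn.preprocessing import OrdinalEncoder, LabelEncoder
from sklearn.naive_bayes import GaussianNB

X = OrdinalEncoder().fit_transform(df.loc[:,['Gender', 'Engineer', 'MBA', 'license']])
y = LabelEncoder().fit_transform(df.loc[:,'Transport'])
model = GaussianNB()
model.fit(X, y)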
Note that I used OrdinalEncoder for the features, because LabelEncoder is only meant for encoding the target y (compare the documentation).
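Putting both fixes together, a minimal end-to-end sketch might look like the following. The toy DataFrame, its Transport labels and the names feature_columns, feature_encoder and target_encoder are assumptions made for illustration; the real dataset has 444 rows and its label values are not shown in the question. The sketch also keeps the fitted encoders around so that a new sample is encoded the same way as the training data, which the original snippet does not do.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

feature_columns = ["Gender", "Engineer", "MBA", "license"]

# Hypothetical toy data standing in for the real 444-row DataFrame.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Engineer": [1, 0, 1, 1, 0, 0],
    "MBA": [0, 1, 0, 0, 1, 1],
    "license": [1, 0, 1, 0, 1, 0],
    "Transport": [
        "Private Transport", "Public Transport", "Private Transport",
        "Public Transport", "Private Transport", "Public Transport",
    ],
})

# Encode all four features at once; the result is a 2D array (n_samples, 4).
feature_encoder = OrdinalEncoder()
X = feature_encoder.fit_transform(df[feature_columns])

# Encode the target separately and keep the encoder to decode predictions.
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df["Transport"])

model = GaussianNB().fit(X, y)

# A one-row DataFrame is already 2D: one sample with four features.
new_sample = pd.DataFrame([["Male", 1, 0, 1]], columns=feature_columns)
new_X = feature_encoder.transform(new_sample)

y_pred = model.predict(new_X)
predicted_label = target_encoder.inverse_transform(y_pred)[0]
print("Predicted transport:", predicted_label)
```

Decoding with inverse_transform avoids having to remember which integer stands for which transport type, which the check y_pred == [1] in the question relies on.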