ValueError while implementing train_test_split
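I am following a machine learning tutorial on Kaggle, and although I followed it line by line, I get a ValueError. I am trying to practice data validation by splitting the data. Here is my code: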
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
main_file_path = '../input/train.csv'
data = pd.read_csv(main_file_path)
y = data.SalePrice
data_predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
x = data[data_predictors]
train_x, val_x, train_y, val_x = train_test_split(x, y,random_state = 0)
data_model = DecisionTreeRegressor()
data_model.fit(train_x,train_y)
data_prediction = data_model.predict(val_x)
print(mean_absolute_error(val_y, data_prediction))
The error points to this line:
data_prediction = data_model.predict(val_x)
I am a beginner in ML, so I compared my code with the author's code, and the implementation looks the same.
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-48f37072f996> in <module>()
17 data_model.fit(train_x,train_y)
18
---> 19 data_prediction = data_model.predict(val_x)
20 print(mean_absolute_error(val_y, data_prediction))
/opt/conda/lib/python3.6/site-packages/sklearn/tree/tree.py in predict(self, X, check_input)
410 """
411 check_is_fitted(self, 'tree_')
--> 412 X = self._validate_X_predict(X, check_input)
413 proba = self.tree_.predict(X)
414 n_samples = X.shape[0]
/opt/conda/lib/python3.6/site-packages/sklearn/tree/tree.py in _validate_X_predict(self, X, check_input)
371 """Validate X whenever one tries to predict, apply, predict_proba"""
372 if check_input:
--> 373 X = check_array(X, dtype=DTYPE, accept_sparse="csr")
374 if issparse(X) and (X.indices.dtype != np.intc or
375 X.indptr.dtype != np.intc):
/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
439 "Reshape your data either using array.reshape(-1, 1) if "
440 "your data has a single feature or array.reshape(1, -1) "
--> 441 "if it contains a single sample.".format(array))
442 array = np.atleast_2d(array)
443 # To ensure that array flags are maintained
ValueError: Expected 2D array, got 1D array instead:
Although the error is raised on the line you pointed out, the actual problem is on this line:
train_x, val_x, train_y, val_x = train_test_split(x, y,random_state = 0)
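Note that you have two val_x. The second val_x should be val_y. What happens is that val_x, which should be a 2D array of inputs, gets overwritten with the y values, which are a 1D array of targets. Hence the ValueError saying that you passed a 1D array where a 2D array was expected.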
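For reference, here is a minimal sketch of the corrected unpacking. It is not the tutorial's code: the DataFrame is synthetic stand-in data (an assumption, since the Kaggle train.csv is not available here), only three of the original column names are reused, and mean_absolute_error is imported from sklearn.metrics because the posted snippet does not show that import.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for '../input/train.csv' (assumption, for illustration only).
rng = np.random.default_rng(0)
n_rows = 200
data = pd.DataFrame({
    "LotArea": rng.integers(1000, 20000, size=n_rows),
    "YearBuilt": rng.integers(1900, 2010, size=n_rows),
    "FullBath": rng.integers(1, 4, size=n_rows),
})
data["SalePrice"] = data["LotArea"] * 10 + rng.normal(0, 1000, size=n_rows)

y = data["SalePrice"]
x = data[["LotArea", "YearBuilt", "FullBath"]]

# train_test_split returns the splits in this order:
# train inputs, validation inputs, train targets, validation targets.
# Each one needs its own name; the last one is val_y, not val_x.
train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)

# Quick sanity check: inputs are 2D (rows, columns), targets are 1D.
print(val_x.ndim, val_y.ndim)

data_model = DecisionTreeRegressor(random_state=0)
data_model.fit(train_x, train_y)
data_prediction = data_model.predict(val_x)
mae = mean_absolute_error(val_y, data_prediction)
print(mae)
```

Printing ndim (or shape) right after the split is a cheap way to catch this kind of mix-up before it surfaces inside predict.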