Found input variables with inconsistent numbers of samples: [164, 41]
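I am trying to build a prediction model with a random forest. CarName is the target variable to predict, and the features are gas, rear, and two.
CarName is a categorical variable; the rest are numeric.
I get this error when I try to run the code below. Can anyone help me solve it? Thanks in advance. Here is my code.
snippets...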
from sklearn.model_selection import train_test_split
X=df6[['gas','rear','two']] #these are all in int form
y=df6[['CarName']].values.reshape(-1,1) # this is in object form
X_train,X_test,y_test,y_train=train_test_split(X,y,test_size=0.2)
from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X_train,y_train)
This raises the following error.
ValueError Traceback (most recent call last)
<ipython-input-54-4c45187c84b2> in <module>
1 from sklearn.ensemble import RandomForestClassifier
2 clf=RandomForestClassifier(n_estimators=100)
----> 3 clf.fit(X_train,y_train)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
302 "sparse multilabel-indicator for y is not supported."
303 )
--> 304 X, y = self._validate_data(X, y, multi_output=True,
305 accept_sparse="csc", dtype=DTYPE)
306 if sample_weight is not None:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
431 y = check_array(y, **check_y_params)
432 else:
--> 433 X, y = check_X_y(X, y, **check_params)
434 out = X, y
435
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
829 y = y.astype(np.float64)
830
--> 831 check_consistent_length(X, y)
832
833 return X, y
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
260 uniques = np.unique(lengths)
261 if len(uniques) > 1:
--> 262 raise ValueError("Found input variables with inconsistent numbers of"
263 " samples: %r" % [int(l) for l in lengths])
264
ValueError: Found input variables with inconsistent numbers of samples: [164, 41]
The shapes of my training data:
X_train.shape,y_train.shape
Out[53]:
((164, 3), (41, 1)) # I guess this mismatch is what causes the error, but I can't work out how to fix it
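You get this error because of this line:
X_train,X_test,y_test,y_train=train_test_split(X,y,test_size=0.2)
train_test_split returns its values in this order:
X_train,X_test,y_train,y_test
That is, y_train comes before y_test. Your code swaps the two names, so the variable you call y_train actually holds the 41 test labels while X_train holds 164 rows, hence the shape mismatch. Swap y_test and y_train on the left-hand side and it will work.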
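To make the fix concrete, here is a minimal sketch with the unpacking order corrected. The data is a hypothetical stand-in for df6 (random integers and made-up car names, 205 rows to match the 164/41 split in the question), and random_state is added only to make the run repeatable. The target is also kept one-dimensional here, on the assumption that the reshape(-1,1) in the question is not needed; scikit-learn generally expects a 1-D y for single-output classification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df6: 205 rows, three integer features, string labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(205, 3))                # plays the role of df6[['gas','rear','two']]
y = rng.choice(["audi", "bmw", "toyota"], size=205)  # plays the role of df6['CarName'], kept 1-D

# train_test_split returns X_train, X_test, y_train, y_test, in that order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, y_train.shape)  # (164, 3) (164,)
print(X_test.shape, y_test.shape)    # (41, 3) (41,)

# With matching lengths, fit no longer raises the ValueError.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```

Printing the shapes right after the split, as above, is a quick way to catch this kind of swapped-name mistake before calling fit.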