How to fix "ValueError: Found input variables with inconsistent numbers of samples: [10000, 60000]"?
I'm having trouble training my code with stochastic gradient descent on the MNIST dataset.
from sklearn.datasets import fetch_mldata
from sklearn.linear_model import SGDClassifier
mnist = fetch_mldata('MNIST original')
X, y = mnist["data"], mnist["target"]
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)
X_train, X_train, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
The process ends with an error (I think the last piece of code is wrong):
ValueError: Found input variables with inconsistent numbers of samples: [10000, 60000]
You have a typo here: you assigned X_train twice:
X_train, X_train, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
The correct version is:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
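Why the message reports [10000, 60000] can be reproduced on a small scale. The sketch below is only an illustration: it uses a synthetic 70-row array in place of MNIST (an assumption, so the split is 60/10 instead of 60000/10000) and calls sklearn.utils.check_consistent_length, the validation helper that raises this ValueError.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Synthetic stand-in for MNIST (assumption): 70 samples, 4 features.
X = np.zeros((70, 4))
y = np.arange(70) % 10

# Buggy unpacking: X_train appears twice, so the second value (the
# 10-row test slice) overwrites the first (the 60-row training slice).
X_train, X_train, y_train, y_test = X[:60], X[60:], y[:60], y[60:]
buggy_rows = X_train.shape[0]

try:
    check_consistent_length(X_train, y_train)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Corrected unpacking: each slice gets its own name.
X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]
check_consistent_length(X_train, y_train)  # passes silently
check_consistent_length(X_test, y_test)    # passes silently
```

Calling check_consistent_length (or simply comparing X_train.shape[0] with y_train.shape[0]) right after the split catches this kind of mistake before fit is reached.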
By the way, fetch_mldata is deprecated (it was removed in scikit-learn 0.22), so it is better to use:
from sklearn.datasets import fetch_openml
X, y = fetch_openml("mnist_784", version=1, return_X_y=True)
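One thing to watch for when switching loaders: fetch_openml returns the mnist_784 labels as strings, so a comparison such as y_train == 5 would match nothing until the labels are cast to integers. The sketch below shows the effect on a tiny hand-made label array (an assumption standing in for the real download); load_mnist_arrays is a hypothetical helper name, and as_frame=False assumes a scikit-learn version that supports that parameter.

```python
import numpy as np


def load_mnist_arrays():
    """Hypothetical helper: download MNIST and return NumPy arrays with
    integer labels. Not called here because it needs network access."""
    from sklearn.datasets import fetch_openml

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    return X, y.astype(np.uint8)


# Tiny stand-in for the string labels returned by fetch_openml (assumption).
y_raw = np.array(["5", "0", "4", "5"], dtype=object)

mask_before = (y_raw == 5)      # strings never equal the integer 5
y_int = y_raw.astype(np.uint8)  # cast labels to integers
mask_after = (y_int == 5)       # now the 5s are found
```

With the cast in place, the rest of the question's code (y_train_5 = (y_train == 5)) behaves as it did with fetch_mldata.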
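I suggest using a stratified split between the training and test datasets, because otherwise some classes may be misrepresented in the training set.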
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)
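What a stratified split buys you can be checked on a small imbalanced dataset. This is a sketch under assumptions: the data are synthetic (900 samples of class 0 and 100 of class 1, not MNIST), and stratification is requested through the stratify parameter of train_test_split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption): 10% of labels are 1.
rng = np.random.RandomState(42)
X = rng.rand(1000, 5)
y = np.array([0] * 900 + [1] * 100)

# stratify=y keeps the class proportions (almost) identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

train_share = y_train.mean()  # fraction of class 1 in the training part
test_share = y_test.mean()    # fraction of class 1 in the test part
```

Both shares stay close to the overall 10%, whereas an unstratified random split can drift away from it, which matters most for rare classes.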