Pandas and scikit-learn - train_test_split dimensions of X, y

Question

I have a pandas DataFrame with the following info:

RangeIndex: 920 entries, 0 to 919 Data columns (total 41 columns)

X = df[df.columns[:-1]]
Y = df['my_Target']   
train_X,train_y,test_X, test_y =train_test_split(X,Y,test_size=0.33,shuffle = True, random_state=45)

The last column is the target and the rest are the data. The shapes are as follows:

print(train_X.shape,train_y.shape,test_X.shape, test_y.shape)

(616, 40) (304, 40) (616,) (304,)

But when I train the model:

model=svm.SVC(kernel='linear',C=0.1,gamma=0.1)
model.fit(train_X,train_Y)
prediction2=model.predict(test_X)
print('Accuracy for linear SVM is',metrics.accuracy_score(prediction2,test_Y))

It gives the following error:

model.fit(train_X,train_Y)

ValueError: Found input variables with inconsistent numbers of samples: [616, 2]

Does anyone know what is going on?

Answer 1

Your variables are in the wrong order. train_test_split returns the splits like this:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

Per the docs, the return order is X_train, then X_test, then y_train, then y_test.

You have:

train_X,train_y,test_X,test_y
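
As a rough illustration of the fix, here is a minimal sketch that keeps the question's variable names but unpacks them in the order train_test_split returns them. The DataFrame is synthetic stand-in data (920 rows, 40 made-up feature columns f0..f39 plus a made-up my_Target), not the asker's real data, and gamma is left out on the assumption that it is not needed, since SVC ignores it for the linear kernel.

```python
import numpy as np
import pandas as pd
from sklearn import metrics, svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data: 920 rows, 40 features + 1 target.
# The column names f0..f39 and the target rule are assumptions for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(920, 40)), columns=[f"f{i}" for i in range(40)])
df["my_Target"] = (df["f0"] + df["f1"] > 0).astype(int)

X = df[df.columns[:-1]]  # every column except the last one
y = df["my_Target"]      # the last column is the target

# For each input array, train_test_split returns its train part, then its test part:
# X train, X test, y train, y test.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.33, shuffle=True, random_state=45
)
shapes = (train_X.shape, test_X.shape, train_y.shape, test_y.shape)
print(shapes)

# gamma is omitted because it has no effect with kernel='linear'.
model = svm.SVC(kernel="linear", C=0.1)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
accuracy = metrics.accuracy_score(test_y, prediction)
print("Accuracy for linear SVM is", accuracy)
```

With the unpacking in this order, the feature splits keep 40 columns and the target splits are one-dimensional, so fit receives matching sample counts for X and y.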
