Pandas and scikit-learn: train_test_split dimensions of X, y
I have a pandas DataFrame with the following info:
RangeIndex: 920 entries, 0 to 919 Data columns (total 41 columns)
X = df[df.columns[:-1]]
Y = df['my_Target']
train_X,train_y,test_X, test_y =train_test_split(X,Y,test_size=0.33,shuffle = True, random_state=45)
The last column is the target; the rest are the features.
The shapes are as follows:
print(train_X.shape,train_y.shape,test_X.shape, test_y.shape)
(616, 40) (304, 40) (616,) (304,)
But when I train the model:
model=svm.SVC(kernel='linear',C=0.1,gamma=0.1)
model.fit(train_X,train_Y)
prediction2=model.predict(test_X)
print('Accuracy for linear SVM is',metrics.accuracy_score(prediction2,test_Y))
It gives the following error:
model.fit(train_X,train_Y)
ValueError: Found input variables with inconsistent numbers of samples: [616, 2]
Does anyone know what is going on?
Your variables are in the wrong order:
X_train, X_test, y_train, y_test = train_test_split(
... X, y, test_size=0.33, random_state=42)
Per the docs, the return order is X_train, then X_test, then y_train, then y_test.
You have:
train_X,train_y,test_X,test_y
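For reference, here is a minimal sketch of the corrected unpacking. The DataFrame is a synthetic stand-in (random features named `f0`..`f39` and a made-up binary `my_Target`), since the original data isn't shown; only the row and column counts match the question. The question's snippet also mixes `train_y` and `train_Y`, so the sketch uses lowercase throughout, and it drops `gamma`, which has no effect with `kernel='linear'`.

```python
import numpy as np
import pandas as pd
from sklearn import metrics, svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's data (assumption): 920 rows,
# 40 feature columns plus a binary 'my_Target' as the last column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(920, 40)),
                  columns=[f"f{i}" for i in range(40)])
df["my_Target"] = (df["f0"] + df["f1"] > 0).astype(int)

X = df[df.columns[:-1]]
Y = df["my_Target"]

# train_test_split returns the pieces input by input:
# X train, X test, then y train, y test.
train_X, test_X, train_y, test_y = train_test_split(
    X, Y, test_size=0.33, shuffle=True, random_state=45)

print(train_X.shape, test_X.shape, train_y.shape, test_y.shape)
# (616, 40) (304, 40) (616,) (304,)

model = svm.SVC(kernel="linear", C=0.1)
model.fit(train_X, train_y)
prediction2 = model.predict(test_X)

# accuracy_score takes the true labels first, then the predictions.
accuracy = metrics.accuracy_score(test_y, prediction2)
print("Accuracy for linear SVM is", accuracy)
```

With the order fixed, `train_X` and `train_y` both have 616 rows, so `fit` no longer complains about inconsistent sample counts.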