Can't get train and test sets
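I applied k-fold cross-validation to split my data into training and test sets.
But when I try to get the training and test sets, I run into these errors:
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
Thanks for your help.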
y = df_dummies['Churn'].values
X = df_dummies.drop(columns = ['Churn'])

from sklearn.preprocessing import MinMaxScaler
features = X.columns.values
scaler = MinMaxScaler(feature_range = (0,1))
scaler.fit(X)
X = pd.DataFrame(scaler.transform(X))
X.columns = features

from sklearn.model_selection import KFold
kf=KFold(n_splits=5,shuffle=True)
for train,test in kf.split(X):
    print("%s %s" % (train,test))

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

from sklearn.linear_model import LogisticRegression
CLF = LogisticRegression().fit(X_train, y_train)
print('Accuracy of Logistic regression classifier on training set: {:.2f}'
      .format(CLF.score(X_train, y_train)))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'
      .format(CLF.score(X_test, y_test)))
NameError: name 'y_train' is not defined
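The problem is that df_dummies['Churn'].values returns a NumPy array, not a pandas object, so you are trying to access an attribute that the array does not have. iloc is an indexer of pandas.DataFrame and pandas.Series.
Use y = df_dummies['Churn'] instead.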
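As a rough illustration of the fix, here is a minimal sketch of the same pipeline with y kept as a Series. The contents of df_dummies are not shown in the question, so the small synthetic frame below (columns tenure and MonthlyCharges, and the rule used to derive Churn) is purely a hypothetical stand-in. The random_state values and the choice to fit and score inside the loop are also my additions, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for df_dummies; the real data is not shown in the question.
rng = np.random.default_rng(0)
df_dummies = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=100),
    "MonthlyCharges": rng.uniform(20, 120, size=100),
})
df_dummies["Churn"] = (df_dummies["MonthlyCharges"] > 70).astype(int)

# Keep y as a pandas Series (no .values) so that .iloc is available.
y = df_dummies["Churn"]
X = df_dummies.drop(columns=["Churn"])

# Scale the features to [0, 1] and keep the column names and index.
scaler = MinMaxScaler(feature_range=(0, 1))
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
test_scores = []
for train_index, test_index in kf.split(X):
    # kf.split yields positional indices, which is what .iloc expects.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # Fit and evaluate on each fold rather than only on the last one.
    clf = LogisticRegression().fit(X_train, y_train)
    test_scores.append(clf.score(X_test, y_test))

print("Mean test accuracy over folds: {:.2f}".format(np.mean(test_scores)))
```

If you would rather keep y as a NumPy array, plain indexing also works: y[train_index] and y[test_index]. The NameError is most likely just a follow-on effect: the line that assigns y_train raised the AttributeError, so y_train was never created.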
PS: I don't know how to migrate this kind of question to a sister site. Perhaps someone who knows how can migrate it to Cross Validated.