Python error: too many indices for array
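My input is a CSV file that I imported into a PostgreSQL database. I then built a CNN with Keras. The code below gives the following error: "IndexError: too many indices for array". I'm very new to machine learning, so I don't know how to fix this. Any suggestions?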
X = dataframe1[['Feature1','Feature2','Feature3','Feature4','Feature5','Feature6','Feature7','Feature8','Feature9','Feature10','Feature11','Feature12','Feature13','Feature14']]
Y = result[['label']]
# evaluate model with standardized dataset
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-50-0e5d0345015f> in <module>()
2 estimator = KerasClassifier(build_fn=create_baseline, nb_epoch=100, batch_size=5, verbose=0)
3 kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
----> 4 results = cross_val_score(estimator, X, Y, cv=kfold)
5 print("Results: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
C:\Anacondav3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
129
130 cv = check_cv(cv, y, classifier=is_classifier(estimator))
--> 131 cv_iter = list(cv.split(X, y, groups))
132 scorer = check_scoring(estimator, scoring=scoring)
133 # We clone the estimator to make sure that all the folds are
C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
320 n_samples))
321
--> 322 for train, test in super(_BaseKFold, self).split(X, y, groups):
323 yield train, test
324
C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
89 X, y, groups = indexable(X, y, groups)
90 indices = np.arange(_num_samples(X))
---> 91 for test_index in self._iter_test_masks(X, y, groups):
92 train_index = indices[np.logical_not(test_index)]
93 test_index = indices[test_index]
C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in _iter_test_masks(self, X, y, groups)
608
609 def _iter_test_masks(self, X, y=None, groups=None):
--> 610 test_folds = self._make_test_folds(X, y)
611 for i in range(self.n_splits):
612 yield test_folds == i
C:\Anacondav3\lib\site-packages\sklearn\model_selection\_split.py in _make_test_folds(self, X, y, groups)
595 for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)):
596 for cls, (_, test_split) in zip(unique_y, per_cls_splits):
--> 597 cls_test_folds = test_folds[y == cls]
598 # the test split can be too big because we used
599 # KFold(...).split(X[:max(c, n_splits)]) when data is not 100%
IndexError: too many indices for array
Should I declare the array or DataFrame differently?
Note that the examples in the User Guide show X as 2-dimensional and y as 1-dimensional:
>>> X_train.shape, y_train.shape
((90, 4), (90,))
Some programmers use uppercase variable names for 2-dimensional arrays and lowercase names for 1-dimensional arrays.
So use
Y = result['label']
instead of
Y = result[['label']]
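A minimal end-to-end sketch of the fix, assuming scikit-learn, pandas and NumPy are installed. The DataFrame contents are made-up stand-in data (features and label are taken from one hypothetical `result` frame for brevity), and LogisticRegression is swapped in for the question's KerasClassifier/create_baseline so the example runs without Keras. The only point it illustrates is passing Y as a 1-D Series:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical stand-in data: 40 rows, 3 random features, a balanced binary label.
rng = np.random.default_rng(0)
result = pd.DataFrame(rng.normal(size=(40, 3)),
                      columns=["Feature1", "Feature2", "Feature3"])
result["label"] = np.tile([0, 1], 20)

X = result[["Feature1", "Feature2", "Feature3"]]  # list of columns -> 2-D DataFrame, shape (40, 3)
Y = result["label"]                               # single column name -> 1-D Series, shape (40,)

# LogisticRegression is a stand-in for the KerasClassifier used in the question.
estimator = LogisticRegression()
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))
```

With a Keras model the same change applies: only the estimator line differs.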
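I'm assuming result is a pandas DataFrame. When you index a DataFrame with a list of columns such as ['label'], you get back a 2-dimensional sub-DataFrame. When you index a DataFrame with a single string, you get back a 1-dimensional Series.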
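A small pandas sketch of that difference; the DataFrame here is a hypothetical stand-in for result. It also shows DataFrame.squeeze, one way to turn an existing one-column DataFrame back into a Series:

```python
import pandas as pd

# Hypothetical stand-in for the `result` DataFrame in the question.
result = pd.DataFrame({"Feature1": [0.1, 0.5, 0.9], "label": [0, 1, 1]})

two_d = result[["label"]]  # list of column names -> DataFrame
one_d = result["label"]    # single column name   -> Series

print(type(two_d).__name__, two_d.shape)  # DataFrame (3, 1)
print(type(one_d).__name__, one_d.shape)  # Series (3,)

# Squeezing the column axis of a one-column DataFrame yields a Series.
flattened = two_d.squeeze(axis=1)
print(type(flattened).__name__, flattened.shape)  # Series (3,)
```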
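Finally, note that the error
IndexError: too many indices for array
is raised on this line:
cls_test_folds = test_folds[y == cls]
This happens because y is 2-dimensional, so y == cls is a 2-dimensional boolean array, while test_folds is 1-dimensional. It is analogous to the following: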
In [72]: test_folds = np.zeros(5, dtype=int)
In [73]: y_eq_cls = np.array([(True, ), (False,)])
In [74]: test_folds[y_eq_cls]
IndexError: too many indices for array
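The same failure can be reproduced with plain NumPy, and flattening y with ravel() makes the mask 1-dimensional again. This is a sketch with made-up values that mimics the failing sklearn line rather than calling sklearn itself:

```python
import numpy as np

test_folds = np.zeros(5, dtype=int)          # 1-D, like test_folds inside sklearn
y_2d = np.array([[0], [1], [0], [1], [1]])   # column vector, shape (5, 1), like result[['label']]
cls = 1

mask_2d = y_2d == cls                        # 2-D boolean mask, shape (5, 1)
raised = False
try:
    test_folds[mask_2d]                      # 2-D mask on a 1-D array
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

y_1d = y_2d.ravel()                          # flatten to shape (5,), like result['label']
mask_1d = y_1d == cls                        # 1-D boolean mask, shape (5,)
print(test_folds[mask_1d])                   # selects the positions where y == 1
```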