Python sklearn Value Error Target Variable
I am running the following code:
from sklearn.model_selection import train_test_split
X_train,X_test, y_train, y_test=train_test_split(X,y,stratify=y,test_size=0.3)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-b5740f8ae579> in <module>()
1 from sklearn.model_selection import train_test_split
2
----> 3 X_train,X_test, y_train, y_test=train_test_split(X,y,stratify=y,test_size=0.3)
/Applications/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
2054 random_state=random_state)
2055
-> 2056 train, test = next(cv.split(X=arrays[0], y=stratify))
2057
2058 return list(chain.from_iterable((safe_indexing(a, train),
/Applications/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
1202 """
1203 X, y, groups = indexable(X, y, groups)
-> 1204 for train, test in self._iter_indices(X, y, groups):
1205 yield train, test
1206
/Applications/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in _iter_indices(self, X, y, groups)
1544 class_counts = np.bincount(y_indices)
1545 if np.min(class_counts) < 2:
-> 1546 raise ValueError("The least populated class in y has only 1"
1547 " member, which is too few. The minimum"
1548 " number of groups for any class cannot"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
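When I run exactly the same lines in another machine learning project with different data, it works fine. What am I doing wrong?
Additional information about the shapes of the data frames involved: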
print(data.shape)
print(X.shape)
print(y.shape)
Output:
(3047, 33)
(3047, 32)
(3047, 1)
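Because you are stratifying, the samples of each class have to be distributed proportionally between the training set and the test set. But your data contains a class with only one sample. That sample can go into either the training set or the test set, but not both, so the class cannot be represented in both splits and the stratify option breaks. Hence the error.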
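To find out which class is responsible, you can count the members of each class in y before splitting. The snippet below is only a sketch: the small `y` frame and its column name `target` are made-up stand-ins for your real target, which has shape (3047, 1).

```python
import pandas as pd

# Hypothetical stand-in for the real target: a single-column DataFrame
# in which class "c" occurs only once.
y = pd.DataFrame({"target": ["a"] * 6 + ["b"] * 5 + ["c"]})

# y has shape (n, 1), so flatten it to one dimension before counting.
class_counts = pd.Series(y.to_numpy().ravel()).value_counts()

# Any class with fewer than 2 members cannot appear in both train and test.
rare_classes = class_counts[class_counts < 2]
print(rare_classes)
```

Any class listed in `rare_classes` will trigger the ValueError when passed through `stratify`.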
See my other answer, which describes an example of a similar situation.
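One possible workaround, if losing those rows is acceptable for your problem, is to drop the classes that have fewer than 2 members and then stratify on what remains. This is a sketch under assumptions, not the only fix: the toy `X` and `y`, the column names and `random_state=0` are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 12 samples, 2 features, class "c" has a single member.
X = pd.DataFrame(np.arange(24).reshape(12, 2), columns=["f1", "f2"])
y = pd.Series(["a"] * 6 + ["b"] * 5 + ["c"], name="target")

# Keep only the rows whose class has at least 2 members.
counts = y.value_counts()
keep = y.isin(counts[counts >= 2].index)
X_kept, y_kept = X[keep], y[keep]

# The stratified split now succeeds because every remaining class
# can be represented in both the training set and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_kept, y_kept, stratify=y_kept, test_size=0.3, random_state=0
)
```

If you need to keep every row, the alternatives are to call `train_test_split` without `stratify`, or to merge the rare classes into a larger one before splitting.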