How can I do a train/test split in scikit-learn?

Question

Does anyone know what the problem is?

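import numpy as np
from sklearn.model_selection import train_test_split

x = np.linspace(-3, 3, 100)
rng = np.random.RandomState(42)
y = np.sin(4 * x) + x + rng.uniform(size=len(x))  # continuous (regression) target
X = x[:, np.newaxis]  # reshape 1-D x into a 2-D feature matrix
# This call raises the ValueError shown below
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)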

I get this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
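To see why scikit-learn says this: stratify=y treats every distinct value of y as a separate class. The sketch below simply rebuilds the target from the question and counts how often each value occurs with np.unique; with a continuous target, the smallest count is 1, which is exactly what the error message reports.

```python
import numpy as np

# Rebuild the same continuous target as in the question
x = np.linspace(-3, 3, 100)
rng = np.random.RandomState(42)
y = np.sin(4 * x) + x + rng.uniform(size=len(x))

# stratify=y would treat each distinct value as its own "class"
classes, counts = np.unique(y, return_counts=True)
print("distinct values:", len(classes))
print("smallest class size:", counts.min())
```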

Answer 1

Try removing stratify=y; you shouldn't be using it here. Also, take a look here.
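A minimal sketch of the question's code with stratify=y removed. For a regression target like this one, a plain shuffled split is the usual choice, and test_size=0.25 of 100 samples gives 25 test rows and 75 training rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.linspace(-3, 3, 100)
rng = np.random.RandomState(42)
y = np.sin(4 * x) + x + rng.uniform(size=len(x))
X = x[:, np.newaxis]

# Continuous target: plain random split, no stratify argument
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```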

Answer 2

From the documentation:

3.1.2.2. Cross-validation iterators with stratification based on class labels.

Some classification problems can exhibit a large imbalance in the distribution of the target classes: for instance there could be several times more negative samples than positive samples. In such cases it is recommended to use stratified sampling as implemented in StratifiedKFold and StratifiedShuffleSplit to ensure that relative class frequencies is approximately preserved in each train and validation fold.
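As a small illustration of the quoted passage, here is a sketch using StratifiedShuffleSplit on hypothetical imbalanced class labels (90 negatives, 10 positives; the data is made up for the example). Each 20-sample test fold keeps the 90/10 ratio, which only makes sense because y holds a few repeated class values rather than continuous numbers:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced class labels: 90 negatives (0), 10 positives (1)
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not affect how the split is made

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    # Count of each class in the test fold: [negatives, positives]
    print(np.bincount(y[test_idx]))
```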

Answer 3

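The argument stratify=y in train_test_split is what gives you the error. Stratification is meant for labels that take a small set of repeated values, i.e. class labels. For example, suppose your label column contains only 0s and 1s. Passing stratify=y then preserves the original label proportions in the split: if 60% of your labels are 1 and 40% are 0, your training sample will have the same proportions. In your code, y is a continuous target in which every value is distinct, so each value counts as its own class with a single member, which is why scikit-learn complains that the least populated class has only 1 member.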
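A sketch of the 60/40 example above, using hypothetical binary labels. With stratify=y, both the 75-sample training set and the 25-sample test set keep roughly 60% ones:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical binary labels: 60 ones and 40 zeros
y = np.array([1] * 60 + [0] * 40)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
# Fraction of ones in each split
print(y_train.mean(), y_test.mean())
```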
