How to split a dataset into train, validation, and test sets correctly, in a simple and clear way?
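I have a dataset with 100 samples, and I want to split it into 75%, 25%, 25% for train, validation, and test respectively. Then I want to do that again with different ratios, such as 80%, 10%, 10%.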
For this, I used the code below, but I think it does not split the data correctly in the second step, because it splits the remaining 85% into (85% x 85%) and (85% x 15%).
My question is:
Is there a clear way to do the split correctly for any given ratios?
from sklearn.model_selection import train_test_split
# Split Train Test Validate
X_, X_val, Y_, Y_val = train_test_split(X, Y, test_size=0.15, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X_, Y_, test_size=0.15, random_state=42)
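As a quick check of that suspicion, the sizes produced by the two calls above can be printed. This is a minimal sketch, assuming placeholder arrays `X` and `Y` of 100 samples (made up here for illustration, not the real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (an assumption for illustration): 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
Y = np.arange(100)

# The same two calls as in the question.
X_, X_val, Y_, Y_val = train_test_split(X, Y, test_size=0.15, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X_, Y_, test_size=0.15, random_state=42)

# The second test_size is a fraction of the 85 remaining samples,
# not a fraction of the original 100.
sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)
```

Because the second `test_size=0.15` applies only to what is left after the first call, the split comes out at roughly 72 / 15 / 13 rather than 70 / 15 / 15.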
You can always do it manually. It is a bit messy, but you can write a function:
import numpy as np

def my_train_test_split(X, y, ratio_train, ratio_val, seed=42):
    # Shuffle the sample indices reproducibly.
    idx = np.arange(X.shape[0])
    np.random.seed(seed)
    np.random.shuffle(idx)
    # Cut points: end of the training block and end of the validation block.
    limit_train = int(ratio_train * X.shape[0])
    limit_val = int((ratio_train + ratio_val) * X.shape[0])
    idx_train = idx[:limit_train]
    idx_val = idx[limit_train:limit_val]
    idx_test = idx[limit_val:]  # whatever is left over becomes the test set
    # X and y are expected to be NumPy arrays so they can be indexed by idx.
    X_train, y_train = X[idx_train], y[idx_train]
    X_val, y_val = X[idx_val], y[idx_val]
    X_test, y_test = X[idx_test], y[idx_test]
    return X_train, X_val, X_test, y_train, y_val, y_test
The test ratio is assumed to be 1 - (ratio_train + ratio_val).
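If you would rather stay with `train_test_split`, one option is to pass absolute sample counts instead of fractions, so the second call no longer depends on how much data the first call left over. The following is only a sketch under that assumption: the helper name `split_by_counts` and the placeholder arrays are made up for illustration, and it expects every part to end up non-empty.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_by_counts(X, y, ratio_train, ratio_val, seed=42):
    """Hypothetical helper: two chained train_test_split calls using integer counts."""
    if ratio_train + ratio_val >= 1:
        raise ValueError("ratio_train + ratio_val must be below 1")
    n = len(X)
    # Convert ratios of the FULL dataset into sample counts once, up front.
    n_train = round(ratio_train * n)
    n_val = round(ratio_val * n)
    n_test = n - n_train - n_val
    # An integer test_size is read as an absolute number of samples.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=n_val, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test

# Placeholder data (an assumption for illustration): 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

parts = split_by_counts(X, y, ratio_train=0.8, ratio_val=0.1)
lengths = [len(p) for p in parts[:3]]
print(lengths)  # [80, 10, 10]
```

Working with counts also sidesteps the rounding that happens when a fraction is applied to an already reduced dataset.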